From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results


[1]

Xiang Zhang

x.zhang@seu.edu.cn

http://cse.seu.edu.cn/PersonalPage/x.zhang/

From Web Search to Semantic Search


[2]

Contents

The history of Web and Search

Two faces of Semantic Web

Falcons

Watson

Wolfram Alpha

Knowledge Graph


[3]

As We May Think

"A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory."


[4]

As We May Think

“Our wearable data collection system lets users collect their experiences into a continually growing and adapting multimedia diary. The system—called inSense—uses the patterns in sensor readings from a camera, microphone, and accelerometers to classify the user's activities and automatically collect multimedia clips when the user is in an "interesting" situation.”


[5]

As We May Think

Memex is internally complex in mechanics

dry photography, microphotography…

photocells, thermionic tubes, cathode ray tubes…

Memex is externally simple for human usage

associative indexing

tying two items together and naming the link

linking numerous items into a trail and naming it
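Associative indexing can be modeled as a tiny data structure: named ties between two items, and named trails that chain many items in order. The sketch below is a toy model under stated assumptions: items are plain strings, ties are symmetric, and the class and method names (`Memex`, `tie`, `add_trail`, `follow`) are hypothetical illustrations, not anything described in Bush's essay. The sample items are made up.

```python
from collections import defaultdict


class Memex:
    """Toy model of associative indexing: named ties and named trails."""

    def __init__(self) -> None:
        # item -> set of (link name, other item); ties work from either end
        self.ties: dict[str, set[tuple[str, str]]] = defaultdict(set)
        # trail name -> ordered list of items
        self.trails: dict[str, list[str]] = {}

    def tie(self, a: str, b: str, name: str) -> None:
        """Tie two items together under a name."""
        self.ties[a].add((name, b))
        self.ties[b].add((name, a))

    def add_trail(self, name: str, items: list[str]) -> None:
        """Link many items into a named trail by tying consecutive items."""
        self.trails[name] = list(items)
        for a, b in zip(items, items[1:]):
            self.tie(a, b, name)

    def follow(self, name: str) -> list[str]:
        """Replay a trail in its recorded order (empty if unknown)."""
        return self.trails.get(name, [])


m = Memex()
m.add_trail("bow-and-arrow", ["Turkish bow", "English longbow", "elasticity"])
print(m.follow("bow-and-arrow"))  # ['Turkish bow', 'English longbow', 'elasticity']
```

Storing the trail as an ordered list, in addition to the pairwise ties, is what lets a trail be replayed later as a whole rather than only traversed one link at a time.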


[6]

Building the Internet in the Cold War

[7]

Building the Internet in the Cold War

The idea of packet switching (early 1960s)

ARPANET project launched (1960s) by the US government

robust / fault-tolerant / distributed

ARPANET with 4 nodes (university nodes) (1969)

The idea of OSI (Open Systems Interconnection) (1970s)

TCP/IP (1974-1982)

NSFNET (National Science Foundation Network) (1986)

ARPANET decommissioned (1990)

>1 million hosts in Internet (1992)


[8]

Building the Internet in the Cold War

Why the Internet became a successful creation

divide-and-conquer: packet switching

abstraction: OSI

scalability / reliability > speed / efficiency: TCP/IP
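Packet switching is the divide-and-conquer idea in its simplest form: a message is cut into small numbered packets that may travel independently and arrive out of order, and the receiver reassembles them by sequence number. The sketch below is a minimal illustration. The 4-byte packet size and the `Packet` field layout are assumptions for readability, not any real protocol's header format.

```python
import random
from typing import NamedTuple


class Packet(NamedTuple):
    seq: int        # position of this chunk in the original message
    total: int      # how many packets make up the message
    payload: bytes


def packetize(message: bytes, size: int = 4) -> list[Packet]:
    """Split a message into fixed-size, numbered packets."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message from packets that may arrive in any order."""
    if not packets:
        return b""
    total = packets[0].total
    by_seq = {p.seq: p.payload for p in packets}
    missing = [s for s in range(total) if s not in by_seq]
    if missing:
        # a reliable transport such as TCP would recover these by retransmission
        raise ValueError(f"missing packets: {missing}")
    return b"".join(by_seq[s] for s in range(total))


packets = packetize(b"hello, ARPANET")
random.shuffle(packets)      # simulate independent routes and reordering
print(reassemble(packets))   # b'hello, ARPANET'
```

No packet depends on a fixed path, so a failed link only affects the packets that were on it. This is the robustness the ARPANET design aimed for.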


[9]

Ted Nelson and His Hypertext

I don’t like Tim Berners-Lee and his WWW. Too Simple!!!

“HTML is precisely what we were trying to PREVENT— ever-breaking links, links going outward only, quotes you can't follow to their origins, no version management, no rights management.”


[10]

Ted Nelson and His Hypertext

he first worked as a photographer and film editor

he coined the term “hypertext” in 1963

based on files

data was stored once

no deletion

information was accessible by a link from anywhere


[11]

Ted Nelson and His Hypertext

Other achievements of Ted

Project Xanadu (1960)

founded the HTTP Company

produced the first bag for the first laptop

now produces 47% of the world's laptop bags


[12]

Doug Engelbart and His… (in 1968)

[13]

Doug Engelbart

Other achievements of Doug (Turing Award)

mouse (commercially implemented in the Apple Lisa, 1983; does anybody know the Lisa?)

Windows

shared-screen teleconferencing

hypermedia

groupware


[14]

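LISA was the name of Steve Jobs's daughter, and Jobs also used it to name a graphical-interface computer and the operating system that ran on it. Without the Lisa there would have been no Macintosh: in the early days of Mac development, much of the system software was designed on the Lisa. It had a 16-bit CPU, a mouse, a hard disk, and an operating system supporting a graphical user interface and multitasking, and it came bundled with 7 commercial applications. The Lisa sold for $9,995.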


[15]

Windows 1.0 (1985)

Windows 3.1 (1992)


[16]

Ted and Doug

they were both truly avant-garde

every enabling technology was in place by the 1970s

but it took the PC revolution and a widespread Internet to inspire the WWW

the process lasted almost 20 years!


[17]

Apple II and the Zhonghua Learning Machine (中华学习机)


[18]

Sir Tim Berners-Lee and His Web Dream


[19]

In March 1989, Tim Berners-Lee submitted a proposal for an information management system to his boss, Mike Sendall. ‘Vague, but exciting’, were the words that Sendall wrote on the proposal, allowing Berners-Lee to continue.

http://info.cern.ch/Proposal.html


[20]

Tim Berners-Lee's original World Wide Web browser: 1990


[21]

Tim and His Web Dream

Why do we use HTML for hypertext representation?

HTML vs. XML


[22]

Tim and His Web Dream

HTTP merits

simple (GET or POST)

flexible (Content-Type)

loose coupling (no persistent connection / no state)
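The three merits are visible in the raw text of an HTTP exchange. The sketch below builds an HTTP/1.1 GET request by hand and parses a canned response. Each request carries everything the server needs (no session state), `Connection: close` asks for a single exchange, and the `Content-Type` header tells the client how to interpret the body. The host, path and canned response are hypothetical, and no network connection is made. It is an illustration of the message format, not a complete HTTP parser (no chunked encoding, no header folding).

```python
def build_get(host: str, path: str) -> bytes:
    """Build a self-contained HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: text/html",
        "Connection: close",  # one request/response, then the server may close
        "",                   # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status, headers, body


request = build_get("example.org", "/index.html")
canned = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html; charset=utf-8\r\n"
          b"Content-Length: 13\r\n\r\n"
          b"<p>Hello</p>\n")
status, headers, body = parse_response(canned)
print(status, headers["content-type"])  # 200 text/html; charset=utf-8
```

Swapping the body for JSON or an image only changes the `Content-Type` value, not the protocol. That is the flexibility the slide refers to.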


[23]

TED Talk: Tim Berners-Lee: A Magna Carta for the Web (2014)

http://www.ted.com/talks/tim_berners_lee_a_magna_carta_for_the_web


[24]

CCTV Documentary: 互联网时代 (The Internet Age, 2014)

https://movie.douban.com/subject/25772505/


[25]

Semantic Web: A Glance

http://ws.nju.edu.cn/falcons

[26]

Architecture of Semantic Web

[27]

Why Semantic Web?

Web Intelligence

Machine-readable

Reasoning

Web of Data

Identification and Interlinks

Programmable Web


[28]

Ontology and RDF

[29]

Instantiation of Ontology

[30]

[31]

XML Representation of Ontology

[32]

The Logic Face of Semantic Web

[33]

Description Logic and Reasoning


[34]

The Data Face of Semantic Web


[35]

Applications of Semantic Web

Biomedicine

Transportation Engineering

Homeland Security

Software Design

Travel Planning

Job Finding

Online Dating…


[36]

TED Talk: Tim Berners-Lee: The Next Web (2009)

http://www.ted.com/talks/tim_berners_lee_on_the_next_web


[37]

TED Talk: Tim Berners-Lee: The Year Open Data Went Worldwide (2010)

http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide


[38]

Semantic Web in a Nutshell

Similar to, but different from:

Relational Database

Object-oriented System

Shifted from logic to data

To keep it simple

To grow to a large scale


[39]

Web Search

1994: WebCrawler

1994: Yahoo

1994: Lycos

1995: AltaVista

1998: Google


[40]


[41]

Book: How Google Works

https://book.douban.com/subject/26008422/


[42]

Google

Larry Page and Sergey Brin from Stanford

From “What Box” to 10^100 (a googol)

Refused by InfoSeek

100,000 dollars from Sun

Yahoo, Excite.com, InfoSeek, Lycos

Marissa Mayer

Went to Yahoo in 2012


[43]

The two co-founders of Google working in the garage with a monthly rent of $1,700


[44]

Google Search Stats

Searches per second: 2.3M

Unique searchers per month: 1.17B

Number of pages indexed: 60T

Lines of code for all Google services: 2B

Alphabet market capitalization: $570B

http://expandedramblings.com/index.php/by-the-numbers-a-gigantic-list-of-google-stats-and-facts/


[45]

More about Google

PageRank

Average response time: 0.25 s

Google-designed servers

Servers in containers
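PageRank, which Swoogle and SWSE later adapted as "PageRank-like" ranking, can be sketched as a power iteration over a link graph. A page's score is a baseline share plus damped contributions from the pages that link to it. The sketch below is the textbook formulation, not Google's production algorithm. It assumes the commonly cited damping factor 0.85 and spreads the rank of pages with no out-links evenly over all pages. The four-page graph is made up.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a dict mapping page -> pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # rank held by dangling pages (no out-links) is shared by every page
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for p, targets in links.items():
            for q in targets:
                new[q] += d * rank[p] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Total rank stays at 1 after every iteration, so the scores can be read as the share of time a "random surfer" spends on each page. Page C ranks highest because three pages point to it.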


[47]

Google to Alphabet

[48]

23andMe.com

by Sergey Brin's wife, Anne Wojcicki

DNA analysis for $99 (compared with $100M in 2000)

Angelina Jolie: a genetic test helped her reduce her breast cancer risk

An introduction: http://36kr.com/p/5035340.html


[49]

Web Search

Challenges of Web Search

Distributed Data

Volatile Data: changed or dead data

Large Volume

Unstructured and Redundant Data

Data Quality

Heterogeneous Data


[50]

Semantic Search

What is Semantic Search?

Natural Language or Pseudo-natural Language search

Finding answers (data, facts, information, knowledge)

Exploiting domain knowledge to process search requests


[51]

Semantic Search

Differences from Web Search

RDF Graphs vs. Bag of Words

Ontology, RDF document, Entity(object) vs. Pages

Ranking

Summarization / Snippet / Recommendation

Similarities with Web Search

Keyword-based


[52]

Semantic Search

Figure: an RDF graph from Tim.foaf.xml with the nodes Tim, Jack, Mary, Southeast University and Nanjing, connected by hasWife, worksAt, knows and located edges

What can we get when searching “tim”?


[53]

More examples: Who is Bob Marley? From Falcons


[54]

More examples: Who is Bob Marley? From Falcons


[55]

More examples: Who is Bob Marley? From Google and Baidu


[56]

Bob Marley:

No Woman, No Cry

Bob Marley:

Three Little Birds


[57]

More examples: Who is the tallest player in the NBA? From Falcons


[58]

More examples: Who is the tallest player in the NBA? From Google and Baidu

[59]

[60]

[61]

More examples: Who is [email protected]? Whom does he know? From Falcons

[62]

More examples: Who knows Chris Bizer? From Falcons


[63]

Semantic Search

2003: TAP (Guha, McCool, & Miller, 2003)

The earliest Semantic Web search engine

Keyword query

Returns an object and its surrounding subgraph using label information

Selection is based on popularity, user profile, and search context


[64]

Semantic Search

2005: Swoogle (Ding et al., 2005)

One of the most popular Semantic Web search engines

Class/property search and ontology search

PageRank-like ranking algorithm

Provides statistical metadata of results


[65]

[66]

Ontology Search of Swoogle

[67]

Document Search of Swoogle


[68]

Term Search of Swoogle

[69]

Metadata of foaf:knows in Swoogle


[70]

Semantic Search

2007: SWSE (Harth et al., 2007)

Keyword-based

PageRank-like ranking

Combining RDF graph ranks with data source ranks

Filtering results by specifying a class


[71]

Semantic Search

2007: Watson (d’Aquin et al., 2007)

Organizes resulting objects by document / ontology

Users can specify the search scope:

Local name

Labels

All literals
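Choosing a search scope means choosing which part of an entity's description the keyword is matched against: its local name, its labels, or all of its literal values. The sketch below is a minimal model of that idea. The `Entity` record, the scope strings and the example URI are hypothetical, not Watson's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    uri: str
    labels: list[str] = field(default_factory=list)
    literals: list[str] = field(default_factory=list)  # all literal values, labels included

    @property
    def local_name(self) -> str:
        # the part of the URI after the last '#' or '/'
        return self.uri.rstrip("/").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def matches(entity: Entity, keyword: str, scope: str) -> bool:
    """Match a keyword against 'local_name', 'labels' or 'literals'."""
    kw = keyword.lower()
    if scope == "local_name":
        texts = [entity.local_name]
    elif scope == "labels":
        texts = entity.labels
    elif scope == "literals":
        texts = entity.literals
    else:
        raise ValueError(f"unknown scope: {scope}")
    return any(kw in text.lower() for text in texts)


jobs = Entity("http://example.org/people#SteveJobs",
              labels=["Steve Jobs"],
              literals=["Steve Jobs", "Co-founder of Apple"])
print(matches(jobs, "apple", "labels"))          # False
print(matches(jobs, "apple", "literals"))        # True
print(matches(jobs, "stevejobs", "local_name"))  # True
```

A narrow scope such as local name gives precise but brittle matches, while "all literals" finds more entities at the cost of noisier results.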


[72]

Searching Steve Jobs in Watson


[73]

Another Watson

An open-domain QA system

IBM DeepQA Project

MIT: Adaptive View-Based Appearance Model

UT: Reasoning and General Knowledge

USC: Information Extraction and Analysis

RPI: Virtualization Tools

University at Albany: Assurance of capacity for massive systems

University of Trento: Self-study, Conversation

University of Massachusetts: Information Retrieval

Carnegie Mellon: Basic QA algorithms


[74]

How Watson Works


[75]

Another Watson

Features:

Unstructured text;

Searches more than 200 million pages within 5 seconds;

Understanding human language, subtle meaning, riddles, humor, satire…

Determining the certainty of the answer

No help from the WWW or from engineers

Beat Ken Jennings and Brad Rutter on Jeopardy! in 2011;


[76]

Jeopardy 2011


[77]

Ken Jennings's TED Talk


[78]

Another Watson

Future

Business Decision-Making

Medical Treatment


[79]

Semantic Search

2008: Sindice (Oren et al., 2008)

Property-value pair lookup when a property of an object is known

Keyword-based RDF document and Microformat search

Not focused on linked objects
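Property-value lookup amounts to an inverted index keyed on (property, value) pairs that points back to the documents containing them. The sketch below assumes documents have already been parsed into triples. The document URLs and `foaf:` values are hypothetical, and this is an illustration of the lookup idea, not Sindice's actual index design.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def build_pv_index(docs: dict[str, list[Triple]]) -> dict[tuple[str, str], set[str]]:
    """Map each (property, value) pair to the set of documents containing it."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for url, triples in docs.items():
        for _subject, prop, value in triples:
            index[(prop, value)].add(url)
    return index


def lookup(index: dict[tuple[str, str], set[str]], prop: str, value: str) -> set[str]:
    """Documents with at least one triple using this property and value."""
    return set(index.get((prop, value), set()))


docs = {
    "http://example.org/a.rdf": [
        ("_:p1", "foaf:name", "Tim"),
        ("_:p1", "foaf:mbox", "mailto:tim@example.org"),
    ],
    "http://example.org/b.rdf": [("_:p2", "foaf:name", "Jack")],
    "http://example.org/c.rdf": [("_:p3", "foaf:mbox", "mailto:tim@example.org")],
}
index = build_pv_index(docs)
print(sorted(lookup(index, "foaf:mbox", "mailto:tim@example.org")))
```

Because the key is the exact pair, such a lookup answers "which documents say X has mailbox Y" directly, but it does not follow links between objects.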


[80]

Searching documents in Sindice


[81]

Searching documents using property-value pairs in Sindice

[82]

SPARQL Query in Sindice


[83]

Semantic Search

2009: Falcons (Cheng et al., 2009)

http://iws.seu.edu.cn/services/falcons/

Searching entities / linked objects / RDF documents

Keyword-based

Searching based on virtual documents

Ranking results based on relevance and popularity

Class-based query refinement


[84]

Falcons – Scenario 1

“Who is the demo chair of ISWC 2008?”

query with “ISWC2008” AND “demo chair”


[85]

Falcons – Scenario 2

“Find relations between Chris Bizer and Tom Heath”

query with “Chris Bizer” AND “Tom Heath”


[86]

Falcons – Scenario 2

After type refinement


[87]

Falcons – System Architecture


[88]

Falcons – Data Crawling

Seeding

Searching filetype:rdf / filetype:owl in Google with keywords from the Open Directory Project

Sampling from pingthesemanticweb.com / schemaweb.info / …

Sampling from the Linked Open Data Project

Crawling

300,000 RDF documents per day

20M RDF documents in total (55% well-formed) as of August 2008

600M quadruples


[89]

Falcons – Data Crawling

The distribution of the number of RDF documents on pay-level domains


[90]

Falcons – Constructing Virtual Documents

Virtual document of an object

local name

rdfs:label

rdfs:comment

other annotations

Virtual documents of neighbors
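A virtual document turns an RDF object into a bag of words that ordinary text-retrieval machinery can index. The sketch below rests on simplifying assumptions. An object's own text is its local name plus every literal attached to it (standing in for `rdfs:label`, `rdfs:comment` and other annotations). Each neighbor, one hop away in either direction, contributes its own text unweighted. Literals are recognized by a leading double quote. The `eg1:` triples are hypothetical, chosen so the result contains the same three words as the VD(eg1:Journal) example. Falcons' actual construction may weight these parts differently.

```python
import re

Triple = tuple[str, str, str]


def local_name(term: str) -> str:
    """Local part of a URI or prefixed name, e.g. 'eg1:Journal' -> 'Journal'."""
    return re.split(r"[#/:]", term)[-1]


def is_literal(term: str) -> bool:
    # assumption: literals are written with surrounding double quotes
    return term.startswith('"')


def own_text(obj: str, graph: list[Triple]) -> list[str]:
    """Local name plus all literal annotations attached to the object."""
    texts = [local_name(obj)]
    texts += [o.strip('"') for s, _p, o in graph if s == obj and is_literal(o)]
    return texts


def virtual_document(obj: str, graph: list[Triple]) -> list[str]:
    """Lowercased tokens of the object's own text plus its 1-hop neighbours' text."""
    texts = own_text(obj, graph)
    for s, _p, o in graph:
        if s == obj and not is_literal(o):
            texts += own_text(o, graph)   # outgoing neighbour
        elif o == obj:
            texts += own_text(s, graph)   # incoming neighbour
    return [tok.lower() for text in texts for tok in re.findall(r"\w+", text)]


graph = [
    ("eg1:Journal", "rdfs:label", '"journal"'),
    ("eg1:Journal", "rdfs:comment", '"A periodical magazine"'),
    ("eg1:Article", "eg1:publishedIn", "eg1:Journal"),
]
print(virtual_document("eg1:Journal", graph))
# ['journal', 'journal', 'a', 'periodical', 'magazine', 'article']
```

Pulling in neighbors' text lets a query such as "article" reach eg1:Journal even though no literal of eg1:Journal itself contains that word.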


[91]

Falcons – Virtual Documents

VD(eg1:Journal) = “journal” + “article” + “magazine”


[92]

Falcons – Ranking Objects

Query Relevance

Cosine similarity between query and virtual document

Popularity

How often the object occurs in the document set
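The two ranking signals can be sketched as follows. Relevance is the cosine similarity between term-frequency vectors of the query and of each virtual document. Popularity is the number of documents an object occurs in. How Falcons combines the two is not stated here, so the score below (relevance multiplied by a log-damped popularity) is purely an assumption for illustration. The entity names, virtual documents and counts are hypothetical.

```python
import math
from collections import Counter


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two bags of words (raw term frequencies)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def rank(query: str, vds: dict[str, list[str]],
         occurrences: dict[str, int]) -> list[tuple[str, float]]:
    """Score = relevance * (1 + log(1 + popularity)); the combination is assumed."""
    q = query.lower().split()
    scored = []
    for obj, vd in vds.items():
        rel = cosine(q, vd)
        if rel > 0:  # drop objects that do not match the query at all
            pop = occurrences.get(obj, 0)
            scored.append((obj, rel * (1 + math.log1p(pop))))
    return sorted(scored, key=lambda x: x[1], reverse=True)


vds = {
    "ex:TimBL": ["tim", "berners", "lee", "w3c"],
    "ex:TimCook": ["tim", "cook", "apple"],
    "ex:W3C": ["w3c", "consortium"],
}
occurrences = {"ex:TimBL": 120, "ex:TimCook": 15, "ex:W3C": 300}
print(rank("tim", vds, occurrences))
```

Here ex:TimCook has slightly higher relevance (shorter virtual document), but ex:TimBL is ranked first because it occurs in many more documents. This shows how popularity can break near-ties in relevance.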


[93]

Falcons – Other Features

Query-relevant Snippet Generation

PD-Thread

Query Refinement with Class Hierarchies

Filtering the resulting objects with class restrictions

Discovering implicit typing information by class-inclusion reasoning

Recommending subclasses for incremental refinement
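Class-based refinement relies on the implicit types that follow from class inclusion. An object typed `ex:Professor` also counts as an `ex:Person` if `ex:Professor` is (transitively) a subclass of `ex:Person`. The sketch below assumes explicit types and `rdfs:subClassOf` axioms are given as Python dicts. All class and object names are hypothetical. It shows filtering by a class restriction with inferred types, plus recommending direct subclasses for the next refinement step.

```python
def superclasses(cls: str, sub_of: dict[str, set[str]]) -> set[str]:
    """All classes that cls is (transitively) included in, including itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in sub_of.get(stack.pop(), set()):
            if parent not in seen:  # 'seen' also protects against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_by_class(results: list[str], types: dict[str, set[str]],
                    sub_of: dict[str, set[str]], target: str) -> list[str]:
    """Keep results whose explicit or inferred types include target."""
    return [r for r in results
            if any(target in superclasses(t, sub_of) for t in types.get(r, set()))]


def subclass_suggestions(target: str, sub_of: dict[str, set[str]]) -> list[str]:
    """Direct subclasses of target, offered for incremental refinement."""
    return sorted(c for c, parents in sub_of.items() if target in parents)


sub_of = {"ex:Professor": {"ex:Academic"}, "ex:Academic": {"ex:Person"},
          "ex:Student": {"ex:Person"}}
types = {"ex:tim": {"ex:Professor"}, "ex:jack": {"ex:Student"},
         "ex:SEU": {"ex:University"}}
results = ["ex:tim", "ex:jack", "ex:SEU"]
print(filter_by_class(results, types, sub_of, "ex:Person"))  # ['ex:tim', 'ex:jack']
print(subclass_suggestions("ex:Person", sub_of))             # ['ex:Academic', 'ex:Student']
```

Without the transitive closure, ex:tim would be missed by an ex:Person restriction because it is only explicitly typed as ex:Professor.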


[94]

Wolfram Alpha

www.wolframalpha.com

Computational Knowledge Engine

Launched on May 18, 2009

“Wolfram Alpha: Semantic Search Is Born” (InfoToday)

The knowledge engine behind Siri


[95]

Wolfram Alpha

Chat with Siri:

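Somebody: My kids keep asking me how many days are left until Christmas.

Siri: Let me check… one moment… I found the answer for you.

Siri: 82 days, which is 2 months and 21 days, or 11 weeks and 5 days, or 58 working days, or 0.22 years.

How does Siri know all this?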
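Each of Siri's answers is a different view of the same calendar arithmetic. The sketch below reproduces them under stated assumptions. The question is asked on 4 October, 82 days before 25 December. Months are counted by stepping the same day-of-month forward. Working days are Monday to Friday, counted from today up to but not including Christmas, with no public holidays. A year is taken as 365 days. The year 2013 is chosen only for illustration, since the working-day count depends on where the weekends fall.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to that month's last day."""
    y, m = divmod(d.month - 1 + months, 12)
    year, month = d.year + y, m + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def describe_interval(today: date, target: date) -> dict[str, object]:
    """Express the gap between two dates the way the Siri answer does."""
    days = (target - today).days
    months = 0
    while add_months(today, months + 1) <= target:
        months += 1
    rest = (target - add_months(today, months)).days
    weeks, extra = divmod(days, 7)
    # Monday..Friday from today up to (not including) target; holidays ignored
    workdays = sum(1 for i in range(days) if (today + timedelta(i)).weekday() < 5)
    return {
        "days": days,
        "months_days": (months, rest),
        "weeks_days": (weeks, extra),
        "working_days": workdays,
        "years": round(days / 365, 2),
    }


print(describe_interval(date(2013, 10, 4), date(2013, 12, 25)))
# {'days': 82, 'months_days': (2, 21), 'weeks_days': (11, 5), 'working_days': 58, 'years': 0.22}
```

Eleven full weeks always contribute 55 working days. The remaining 5 days contribute 3 to 5 more depending on the weekday the count starts on, which is why the "working days" figure is the only one that varies by year.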


[96]


[97]


[98]


[99]

[100]

[101]

[102]

Page 103: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[10

3]

Page 104: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[10

4]

Page 105: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[10

5]

Page 106: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[10

6]

106

Page 107: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[10

7]

Page 108: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[10

8]

108

Page 109: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[10

9]

Page 110: From Web Search to Semantic Search - cse.seu.edu.cncse.seu.edu.cn/PersonalPage/x.zhang/web_science/web_science.pdf · data was stored once ... Provides statistical metadata of results

[110]

Wolfram Alpha

WA is computation + knowledge

Holds a vast collection of knowledge bases

Uses computational linguistics

Stephen Wolfram was a quantum physicist!

First version of WA

15,000,000 lines of Mathematica code

10,000 CPUs

The Technology behind Wolfram|Alpha
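The "computation + knowledge" idea can be illustrated with a toy Python sketch. It has three parts: a tiny curated fact table, a single regular-expression template that crudely stands in for computational linguistics, and a rule that computes a derived quantity from stored facts on demand. The entities, numbers, and question grammar are all hypothetical placeholders. This is not how Wolfram Alpha is actually implemented.

```python
import re

# Hypothetical curated knowledge base: entity -> property -> (value, unit).
# The entities and numbers are invented placeholders, not real data.
KNOWLEDGE = {
    "exampleville": {"population": (120_000, "people"), "area": (48.0, "km^2")},
    "sampletown": {"population": (30_000, "people"), "area": (12.5, "km^2")},
}

# Derived properties are computed from stored facts at query time instead of being stored.
DERIVED = {
    "population density": lambda f: (f["population"][0] / f["area"][0], "people/km^2"),
}

# A single question template stands in for real natural-language understanding.
QUESTION = re.compile(
    r"^\s*what is the (?P<prop>[a-z ]+?) of (?P<entity>[a-z ]+?)\s*\??\s*$",
    re.IGNORECASE,
)


def answer(question: str):
    """Return (value, unit) for a recognised question, or None if it cannot be answered."""
    m = QUESTION.match(question)
    if m is None:
        return None  # the toy grammar did not understand the question
    prop, entity = m.group("prop").lower(), m.group("entity").lower()
    facts = KNOWLEDGE.get(entity)
    if facts is None:
        return None  # unknown entity
    if prop in facts:
        return facts[prop]  # knowledge: a stored fact
    if prop in DERIVED:
        return DERIVED[prop](facts)  # computation: derived from stored facts
    return None


print(answer("What is the population density of Exampleville?"))
```

The design choice is that derived answers are never stored. They are computed from a small set of curated facts, so new kinds of questions only need new rules.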

[111]

Stephen Wolfram: Computing a Theory of Everything

[112]

Knowledge Graph

“The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources.”

– From Wikipedia

[115]

Introducing the Knowledge Graph

[116]

Satori

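Adopted by Microsoft for Bing.com

Since 2013.12.12:

TED talks: when a user searches for a person's name and that person has given a TED talk, the user can play the TED video directly in the Snapshot pane of the search results.

Famous speeches and national anthems: beyond TED, speeches by other famous people are also shown directly in the search results, as is content such as national anthems; users can likewise play these directly in the Snapshot pane.

Online courses: when a user searches for a university's name, Bing shows that university's popular online courses directly in the search results.

Scientific knowledge: when a user searches for a scientific term, Bing highlights the basic content of that term's Wikipedia entry.

Historical events: Bing directly provides a summary preview of a historical event, including a brief description, start date, and end date.

Related people: when a user searches for an event or thing, Bing shows related people and gives the reason each person is connected to the search term.

Animal species: when a user searches for an animal, Bing shows more specific related kinds so the user can search further. For example, searching for "tiger" shows the Bengal tiger, the Siberian tiger, and other kinds in the Snapshot pane, each of which can be searched in turn.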
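Satori's internals are not described here. As a rough illustration only, the Python sketch below assumes the knowledge is a plain list of (subject, relation, object) edges. It shows how a Snapshot-style pane (tiger kinds, an event's start and end dates) and a "related people" reason could both be read off the same graph. The function names, relation labels, and the choice of which relations count as "people" relations are hypothetical. The facts are hand-picked and well known.

```python
from collections import defaultdict

# Illustrative entity graph as (subject, relation, object) edges.
EDGES = [
    ("Tiger", "kind", "Bengal tiger"),
    ("Tiger", "kind", "Siberian tiger"),
    ("Apollo 11", "start date", "1969-07-16"),
    ("Apollo 11", "end date", "1969-07-24"),
    ("Apollo 11", "crew member", "Neil Armstrong"),
    ("Apollo 11", "crew member", "Buzz Aldrin"),
    ("Apollo 11", "crew member", "Michael Collins"),
]

# Relations whose objects are people (an assumption of this sketch).
PERSON_RELATIONS = {"crew member"}


def snapshot(entity: str) -> dict:
    """Group an entity's outgoing edges by relation, like an info pane next to the results."""
    pane = defaultdict(list)
    for subj, rel, obj in EDGES:
        if subj == entity:
            pane[rel].append(obj)
    return dict(pane)


def related_people(entity: str) -> list:
    """Return (person, reason) pairs; the reason is simply the connecting relation."""
    return [
        (obj, f"{rel} of {entity}")
        for subj, rel, obj in EDGES
        if subj == entity and rel in PERSON_RELATIONS
    ]


print(snapshot("Tiger"))
print(related_people("Apollo 11"))
```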

[117]

Satori

[118]

Graph DB

The fundamental component of most KGs

Graph Storage

CRUD

Create a graph

Read a graph

Update a graph

Delete a graph
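The four CRUD operations map directly onto a graph's nodes and edges. Below is a toy in-memory Python sketch that applies them to individual nodes and edges. The class and method names are made up for illustration. Real graph databases such as Neo4j or Virtuoso expose these operations through their own query languages (Cypher and SPARQL, respectively). They also add persistence, indexing, and transactions.

```python
class Graph:
    """Toy in-memory property graph with CRUD operations on nodes and edges."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.edges = {}  # (source, label, target) -> dict of properties

    # Create
    def add_node(self, node_id, **props):
        if node_id in self.nodes:
            raise KeyError(f"node {node_id!r} already exists")
        self.nodes[node_id] = dict(props)

    def add_edge(self, src, label, dst, **props):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before an edge can be created")
        self.edges[(src, label, dst)] = dict(props)

    # Read
    def get_node(self, node_id):
        return self.nodes.get(node_id)

    def neighbors(self, node_id, label=None):
        """Targets of outgoing edges, optionally restricted to one edge label."""
        return [dst for (src, lbl, dst) in self.edges
                if src == node_id and (label is None or lbl == label)]

    # Update
    def update_node(self, node_id, **props):
        self.nodes[node_id].update(props)

    # Delete
    def delete_edge(self, src, label, dst):
        del self.edges[(src, label, dst)]

    def delete_node(self, node_id):
        """Remove a node together with every edge that touches it."""
        del self.nodes[node_id]
        self.edges = {k: v for k, v in self.edges.items()
                      if node_id not in (k[0], k[2])}


g = Graph()
g.add_node("Wolfram Alpha", launched="2009-05-18")
g.add_node("Siri")
g.add_edge("Wolfram Alpha", "knowledge engine behind", "Siri")
print(g.neighbors("Wolfram Alpha"))
```

Deleting a node also removes its incident edges, so the graph never keeps edges that point at missing nodes.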

[119]

Graph DB

Recommendation:

Neo4j

Virtuoso

[122]

Wikipedia

[123]

DBpedia

Extracted from Wikipedia

RDF / SPARQL

125 languages, more than 3 billion triples in total, all freely downloadable

English version: 4.58M entities, including:

1,445,000 Persons

735,000 Places

123,000 Music Albums

87,000 Movies

241,000 Species

6,000 Diseases
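DBpedia publishes its data as RDF triples that can be queried with SPARQL. The core of a SPARQL `SELECT` is the basic graph pattern, and it can be sketched in a few lines of Python. Each pattern is a triple in which terms starting with `?` are variables, and a solution is one variable binding that satisfies all patterns at once. The triples are hand-written in DBpedia's `dbr:`/`dbo:` naming style for illustration and are not an extract. A real SPARQL endpoint adds indexing, `FILTER`, `OPTIONAL`, and much more.

```python
# Hand-written triples in DBpedia's naming style (dbr: resources, dbo: ontology terms).
TRIPLES = [
    ("dbr:Berlin", "rdf:type", "dbo:Place"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Ulm", "rdf:type", "dbo:Place"),
    ("dbr:Ulm", "dbo:country", "dbr:Germany"),
    ("dbr:Albert_Einstein", "rdf:type", "dbo:Person"),
    ("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple; return None if they conflict."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def select(patterns, triples):
    """Evaluate a basic graph pattern: all patterns must hold under one shared binding."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?person ?place WHERE {
#   ?person rdf:type dbo:Person . ?person dbo:birthPlace ?place . ?place dbo:country dbr:Germany }
print(select([
    ("?person", "rdf:type", "dbo:Person"),
    ("?person", "dbo:birthPlace", "?place"),
    ("?place", "dbo:country", "dbr:Germany"),
], TRIPLES))
```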

[124]

YAGO

Wikipedia+WordNet+GeoNames

10M entities + 120M facts

Features

Manually evaluated accuracy of 95%; every relation is annotated with its confidence value

Combines the taxonomies of Wikipedia and WordNet

With temporal and spatial dimensions
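One way to picture YAGO's features is as a fact record carrying a confidence value plus optional time and place fields. The loose Python sketch below takes that approach. The field names, year-level granularity, and confidence numbers are illustrative assumptions, and YAGO's actual storage format differs. The relation names mimic YAGO-style naming, and the Einstein facts themselves are well known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    confidence: float               # estimated probability that the fact is correct
    start: Optional[int] = None     # first year the fact holds (temporal dimension)
    end: Optional[int] = None       # last year it holds; None means open-ended
    location: Optional[str] = None  # where the fact holds (spatial dimension)

    def holds_in(self, year: int) -> bool:
        """True if the fact's time span, when known, covers the given year."""
        if self.start is not None and year < self.start:
            return False
        if self.end is not None and year > self.end:
            return False
        return True


# Well-known facts; the confidence values are invented for illustration.
FACTS = [
    Fact("Albert_Einstein", "wasBornIn", "Ulm", 0.97, 1879, 1879, "Ulm"),
    Fact("Albert_Einstein", "worksAt", "Institute_for_Advanced_Study", 0.95, 1933, 1955, "Princeton"),
    Fact("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics", 0.98, 1921, 1921),
]


def query(facts: List[Fact], subject: str, year: Optional[int] = None,
          min_confidence: float = 0.0) -> List[Fact]:
    """Facts about subject that meet the confidence threshold and, if given, hold in year."""
    return [
        f for f in facts
        if f.subject == subject
        and f.confidence >= min_confidence
        and (year is None or f.holds_in(year))
    ]


print([f.obj for f in query(FACTS, "Albert_Einstein", year=1940)])
```

With confidence and time attached to every fact, a query can ask "what was true in 1940?" or "show only facts above a confidence threshold" without any extra tables.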

[125]

Freebase

Knowledge Graph behind Google

A Structured Wikipedia

Crowdsourcing

[126]

WordNet and BabelNet

[127]

WordNet and BabelNet

BabelNet = Wikipedia + WordNet

[129]

Thanks http://cse.seu.edu.cn/PersonalPage/x.zhang/teaching.html
