Knowledge Base Building Project 7 th meeting 2008. 09. 21 Intelligent Database Systems Lab School of Computer Science & Engineering Seoul National University, Seoul, Korea


Knowledge Base Building Project
7th meeting

2008. 09. 21

Intelligent Database Systems Lab
School of Computer Science & Engineering

Seoul National University, Seoul, Korea


Copyright 2008 by CEBT

Role

Role of K-Base Project

[Diagram: role of the K-Base project. Labels: PPS Product Ontology, Knowledge Base, Collective Intelligence, Product Classification, Data Navigation, Data Inference, Outer Knowledge Resource, Product Manager, General User, Requested Knowledge, Product Information, Service Layer, Data Layer]


Goals

Goals of Project

target users

– Product Manager : “PRODUCT ENCYCLOPEDIA”

Managing and providing general product information

Navigating product relationships with some conditions

Matching product information with existing standard classification scheme

Classifying new products into product database

Extracting product information from outer resources (e.g. PPS ontology)

– General User : “INFORMATION MAP”

Extracting the general information from outer resource (e.g. Wikipedia)

Storing general information (e.g. documents) by collective intelligence

Visualizing the information connection

Linking the knowledge with user-provided properties and semantics


Coverage of Data Source

1st phase : product information

General product information : PPS Ontology

– Product, category, attribute, UOM

Standard classification scheme : G2B, UNSPSC

– Segment, Family, Class, Commodity

– So far, no usable classification scheme source other than UNSPSC has been found

2nd phase : web resource from outer service

Wikipedia, Freebase, Upper ontology, person-name search (DBLP)

3rd phase : collective intelligence

User-defined general information, properties and semantics
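The four UNSPSC levels named above (Segment, Family, Class, Commodity) correspond to successive two-digit pairs of an eight-digit commodity code. The following is a minimal sketch of that decomposition, assuming codes arrive as 8-digit strings; the function name `split_unspsc` and the result type are hypothetical and not part of the project.

```python
from typing import NamedTuple


class UnspscLevels(NamedTuple):
    segment: str
    family: str
    klass: str      # "class" is a reserved word in Python
    commodity: str


def split_unspsc(code: str) -> UnspscLevels:
    """Split an 8-digit UNSPSC code into its four hierarchy levels."""
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    # Each level adds two digits; higher levels are zero-padded to 8 digits.
    return UnspscLevels(
        segment=code[:2] + "000000",
        family=code[:4] + "0000",
        klass=code[:6] + "00",
        commodity=code,
    )


levels = split_unspsc("43211503")
```

Keeping every level as a full 8-digit string means a product stored with a commodity code can be matched against any ancestor level with a plain equality test.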


Scenario

For product manager

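Entering product information
– Enter detailed information about a product
– Enter the attributes needed to describe the product

Searching product information
– Keyword search over the fields contained in product information
– Keyword search and expanded search over the attributes associated with a product

Navigating product information
– Compare attributes and attribute values across two or more products
– Graph-based navigation using product-attribute relationships

Classifying product information
– Map existing products to several standard classifications
– Probabilistically match entries of different classification schemes using product-classification relationships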
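The slides do not say how the probabilistic matching between classification schemes is computed. One simple possibility, shown here only as a sketch, is to estimate the conditional probability of a class in one scheme given a class in the other from products that carry both classifications. The function name `match_schemes` and the class identifiers are made up for illustration.

```python
from collections import Counter, defaultdict


def match_schemes(products):
    """Estimate P(target class | source class) from co-classified products.

    `products` is an iterable of (source_class, target_class) pairs,
    one pair per product that is classified under both schemes.
    """
    pair_counts = defaultdict(Counter)
    for source, target in products:
        pair_counts[source][target] += 1

    probabilities = {}
    for source, counter in pair_counts.items():
        total = sum(counter.values())
        probabilities[source] = {t: n / total for t, n in counter.items()}
    return probabilities


# Hypothetical sample: three products under G-100, one under G-200.
sample = [
    ("G-100", "U-43"),
    ("G-100", "U-43"),
    ("G-100", "U-44"),
    ("G-200", "U-50"),
]
table = match_schemes(sample)
best = max(table["G-100"], key=table["G-100"].get)
```

With this estimate, the most likely counterpart of a class is simply the entry with the highest relative frequency; a real system would probably also want a minimum-support threshold before trusting a match.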


Scenario (cont’d)

For general user

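Entering knowledge
– Enter the details of a knowledge item
– Enter the properties needed for the knowledge item

Searching knowledge
– Keyword search over the details of knowledge items
– Navigate the information map using relationships between knowledge items
– Keyword search over knowledge properties, property values, and knowledge related to a property
– Search for related knowledge through inference

Classifying knowledge
– Link classifications that can be associated with a knowledge item to existing classification schemes

Other
– Knowledge data export / import
– Extract knowledge data as structured data files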
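Navigating the information map through knowledge-to-knowledge relationships can be read as a bounded graph traversal. The sketch below uses a breadth-first search over a directed adjacency map; the function name `related_within`, the hop limit, and the sample data are assumptions for illustration, not the project's design.

```python
from collections import deque


def related_within(graph, start, max_hops):
    """Return {item: hops} for items reachable from `start` in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the starting item is not "related to itself"
    return seen


# Hypothetical information map: item -> directly related items.
info_map = {
    "Seoul": ["Korea", "Seoul National University"],
    "Korea": ["Asia"],
    "Seoul National University": ["Intelligent Database Systems Lab"],
}
nearby = related_within(info_map, "Seoul", 2)
```

The hop count doubles as a crude relevance score: items found at one hop are direct relations, while items at two hops are the kind of indirectly related knowledge an inference step might surface.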


Function Overview

[Diagram: function overview of the KB System. Labels: Storage, Data, API, View, Data Exchange, Service Mediation, Pivot, Product, Attribute, UOM, Display, Physical IO, Sort, Query Building, Structuring, Graph, Navigation, Add, Delete, Modify, Format Convert, Service Management, Page Select, Update Analysis, Log Analysis, Retrieve, Load Analysis, Monitoring, Validation]


Module overview

[Diagram: module overview. Labels: User Interface, View, Navigator, Visualizer, Data Engine, Page Builder, Management, Load Analyzer, Log Analyzer, Monitoring Tool, API, Format Converter, Service Mediator, Logger, Storage, KB, DBMS, Log, User management Module, Session Manager, History Manager, Personalization, Permission Manager, Web Server, External Service, Data Filter, Query Builder, Editor]


Functions

Target Data: Product, Attribute, UOM, Class, Classification, Relation, Log, User

Action: Add, Modify, Delete


Functions - Storage (Retrieve)

Name               Function                                     Target data
get_Product        Reads one Product                            Product
get_Attribute      Reads one Attribute                          Attribute
get_UOM            Reads one UOM                                UOM
get_Rel_Prod_Attr  Reads all Attributes related to a Product    Product, Attribute
get_Rel_Attr_UOM   Reads all UOMs related to an Attribute       Attribute, UOM
get_Rel_Prod_g2b   Reads the g2b classes related to a Product   Product, g2b Class
get_Rel_Prod_Prod  Reads all Products related to a Product      Product
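As an illustration of how two of the retrieval functions might look, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the MySQL database named under Issues. The table and column names (`product`, `attribute`, `rel_prod_attr`) and the sample rows are assumptions; the slides define only the function names and their purpose.

```python
import sqlite3


def make_db():
    """Build a small in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE rel_prod_attr (product_id INTEGER, attribute_id INTEGER);
        INSERT INTO product VALUES (1, 'notebook'), (2, 'monitor');
        INSERT INTO attribute VALUES (10, 'weight'), (11, 'screen size');
        INSERT INTO rel_prod_attr VALUES (1, 10), (1, 11), (2, 11);
    """)
    return conn


def get_Product(conn, product_id):
    """Read one Product row, or None if it does not exist."""
    return conn.execute(
        "SELECT id, name FROM product WHERE id = ?", (product_id,)
    ).fetchone()


def get_Rel_Prod_Attr(conn, product_id):
    """Read every Attribute related to the given Product."""
    return conn.execute(
        "SELECT a.id, a.name FROM attribute a "
        "JOIN rel_prod_attr r ON r.attribute_id = a.id "
        "WHERE r.product_id = ? ORDER BY a.id",
        (product_id,),
    ).fetchall()


conn = make_db()
```

Storing each relation in its own link table keeps the `get_Rel_*` functions to a single join, and the same pattern would extend to the Attribute-UOM, Product-g2b, and Product-Product relations.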


Functions - Storage (Modify)

Name               Function                                       Target data
mod_Product        Changes one Product                            Product
mod_Attribute      Changes one Attribute                          Attribute
mod_UOM            Changes one UOM                                UOM
mod_Rel_Prod_Attr  Changes the Attributes related to a Product    Product, Attribute
mod_Rel_Attr_UOM   Changes the UOMs related to an Attribute       Attribute, UOM
mod_Rel_Prod_g2b   Changes the g2b classes related to a Product   Product, g2b Class
mod_Rel_Prod_Prod  Changes the Products related to a Product      Product
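The modify functions could be sketched as follows, again with the built-in `sqlite3` module standing in for MySQL. The schema and the reading of `mod_Rel_Prod_Attr` as "re-point an existing relation to another Attribute" are assumptions; the slides do not define parameters or return values.

```python
import sqlite3


def mod_Product(conn, product_id, new_name):
    """Change one Product; return True if exactly one row was updated."""
    cur = conn.execute(
        "UPDATE product SET name = ? WHERE id = ?", (new_name, product_id)
    )
    conn.commit()
    return cur.rowcount == 1


def mod_Rel_Prod_Attr(conn, product_id, old_attr_id, new_attr_id):
    """Re-point one Product-Attribute relation to a different Attribute."""
    cur = conn.execute(
        "UPDATE rel_prod_attr SET attribute_id = ? "
        "WHERE product_id = ? AND attribute_id = ?",
        (new_attr_id, product_id, old_attr_id),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rel_prod_attr (product_id INTEGER, attribute_id INTEGER);
    INSERT INTO product VALUES (1, 'notebook');
    INSERT INTO rel_prod_attr VALUES (1, 10);
""")
renamed = mod_Product(conn, 1, "laptop")
relinked = mod_Rel_Prod_Attr(conn, 1, 10, 11)
```

Returning whether a row was actually updated lets the caller distinguish a successful change from a request that named a non-existent Product or relation.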


Functions - Storage (Delete)

Name               Function                                                              Target data
del_Product        Deletes one Product                                                   Product
del_Attribute      Deletes one Attribute                                                 Attribute
del_UOM            Deletes one UOM                                                       UOM
del_Rel_Prod_Attr  Deletes the Attributes related to a Product                           Product, Attribute
del_Rel_Attr_UOM   Deletes the relation information for UOMs related to an Attribute     Attribute, UOM
del_Rel_Prod_g2b   Deletes the relation information for g2b classes related to a Product Product, g2b Class
del_Rel_Prod_Prod  Deletes the relation information for Products related to a Product    Product
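Several of the `del_Rel_*` functions are described as deleting only the relation information, which suggests that the related entities themselves remain. The sketch below illustrates that distinction with the built-in `sqlite3` module; the schema is hypothetical, and the cascade inside `del_UOM` is an assumption the slides neither confirm nor rule out.

```python
import sqlite3


def del_Rel_Attr_UOM(conn, attribute_id, uom_id):
    """Delete the Attribute-UOM link only; both entities stay in place."""
    cur = conn.execute(
        "DELETE FROM rel_attr_uom WHERE attribute_id = ? AND uom_id = ?",
        (attribute_id, uom_id),
    )
    conn.commit()
    return cur.rowcount


def del_UOM(conn, uom_id):
    """Delete one UOM and (by assumption) any links that point at it."""
    conn.execute("DELETE FROM rel_attr_uom WHERE uom_id = ?", (uom_id,))
    cur = conn.execute("DELETE FROM uom WHERE id = ?", (uom_id,))
    conn.commit()
    return cur.rowcount


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE uom (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rel_attr_uom (attribute_id INTEGER, uom_id INTEGER);
    INSERT INTO attribute VALUES (10, 'weight');
    INSERT INTO uom VALUES (100, 'kg'), (101, 'lb');
    INSERT INTO rel_attr_uom VALUES (10, 100), (10, 101);
""")
removed_links = del_Rel_Attr_UOM(conn, 10, 101)
```

Removing dangling links when an entity is deleted avoids relations that point at nothing; whether the project wants that behavior or prefers to reject the delete is a design decision left open here.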


Functions - View

Name        Function                                                    Target data
pivot_Prod  Rebuilds the View around Product                            -
pivot_Attr  Rebuilds the View around Attribute                          -
pivot_rel   Rebuilds the View around Relation information               -
getinfo     Reads the information of the requested target from Storage  -
sort_Prod   Sorts Products in asc or desc order                         Product
sort_Attr   Sorts Attributes in asc or desc order                       Attribute
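Pivoting and sorting a view can be sketched with plain Python data structures. The example assumes a view is a list of (product, attribute) pairs, which is an illustrative choice rather than the project's data model; the function signatures are likewise hypothetical.

```python
def pivot_Attr(relations):
    """Regroup (product, attribute) pairs so each attribute lists its products."""
    view = {}
    for product, attribute in relations:
        view.setdefault(attribute, []).append(product)
    return view


def sort_Prod(products, order="asc"):
    """Return the products sorted in ascending or descending order."""
    if order not in ("asc", "desc"):
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    return sorted(products, reverse=(order == "desc"))


# Hypothetical relation data.
relations = [
    ("notebook", "weight"),
    ("notebook", "screen size"),
    ("monitor", "screen size"),
]
by_attr = pivot_Attr(relations)
ascending = sort_Prod(["notebook", "monitor"])
```

`pivot_Prod` would be the mirror image, grouping attributes under each product, and `pivot_rel` could group both sides under a relation type.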


Milestones

Important milestones (from 9/22)

Date    Milestone
9/27    Project Specification (Role, Goals, Coverage, Scenario, Function, Module, Framework)
10/10   Project Documentation; Initial Data Model Building
10/30   Module Designing (UML Diagram, Process and Data Flow Chart); Initial Data Crawling
11/20   Module Developing: interim review
12/10   Module Developing: final review; Service Building (Module Connecting)
12/15   Debugging and Testing


Issues

Development

Using open sources : MySQL

Resource

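Which standard classification schemes can be covered right away?

Ask whether other classification schemes can be obtained

Concrete scope of information

Design

Rewrite the design documents (with reference to the functionality)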

– Class diagram, process and data flow chart

– Function and module list


Appendix


Scope of information

What kind of information do we have to handle?

Basic information source

– Product from PPSONTO

Additional information source for general information

– Upper Ontology


Available Information Source

SUMO

YAGO Ontology

GoodRelations

DBpedia


Suggested Upper Merged Ontology (SUMO)

• Defines a hierarchy of SUMO classes and related rules and relationships

• Mapped by hand to all WordNet synsets

• Formulated in a version of the language SUO-KIF which has a LISP-like syntax

• Formally defined, not dependent on a particular implementation

• Organized for interoperability of automated reasoning engines


SUMO Structure

[Diagram: SUMO structure. Labels: Structural Ontology; Base Ontology; Set/Class Theory, Numeric, Temporal, Mereotopology; Graph, Measure, Processes, Objects; Qualities]

Source: http://www.ontologyportal.org/


SUMO hierarchy

Entity
  Physical
    Object: SelfConnectedObject, Region, Collection
    Process: DualObjectProcess, InternalChange, ShapeChange, ...
  Abstract: SetOrClass, Relation, Quantity, Attribute, ...
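A class hierarchy like this one supports simple subsumption checks by walking parent links up to Entity. The sketch below encodes part of the hierarchy as a child-to-parent map; it assumes a single parent per class, which is a simplification (SUMO itself permits multiple superclasses), and the function names are hypothetical.

```python
# Child -> parent links for part of the hierarchy on this slide.
PARENT = {
    "Physical": "Entity",
    "Abstract": "Entity",
    "Object": "Physical",
    "Process": "Physical",
    "SelfConnectedObject": "Object",
    "Region": "Object",
    "Collection": "Object",
    "SetOrClass": "Abstract",
    "Relation": "Abstract",
    "Quantity": "Abstract",
    "Attribute": "Abstract",
}


def ancestors(cls):
    """Return the chain of superclasses from the nearest up to the root."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_subclass(child, parent):
    """True if `child` is `parent` or lies below it in the hierarchy."""
    return child == parent or parent in ancestors(child)


region_chain = ancestors("Region")
```

A knowledge base that maps its own classes onto such an upper ontology could use the same check to answer questions like "is every Region a Physical thing?" without storing the transitive closure.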


YAGO

A Core of Semantic Knowledge Unifying WordNet and Wikipedia

A light-weight and extensible ontology with high coverage and quality

Able to express relations between facts and relations


YAGO (cont.)

All objects are represented as entities

– e.g. numbers, strings, other literals, and even URLs

Similar entities are grouped into classes

Each entity is an instance of at least one class

Classes and relations are entities as well


YAGO (Cont.)

Where does YAGO get its ontology from?

Previous approaches

Assemble the ontology manually

(WordNet, SUMO, GeneOntology)

Problems: Usually low coverage

YAGO approach

Assemble the ontology from Wikipedia (=> good coverage)

Use the category system of Wikipedia (=> good accuracy)


YAGO ontology

[Diagram: example YAGO graph for Elvis Presley. Node labels: 1935, American_singer, Singer#1, Person#3, "singer", "Elvis Presley". Edge labels: born, is a, subclass, subclass, means, means]


YAGO ontology

SubClassOf relation

Exploit the category system of Wikipedia

Use WordNet to establish the hierarchy of classes

Means relation

Exploiting WordNet Synsets

– e.g. ("metropolis", means, city)

Exploiting Wikipedia Redirects

– e.g. ("Einstein, Albert", means, Albert Einstein)
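The relations on these slides can be represented as plain (subject, relation, object) triples. The sketch below stores the Elvis Presley example and the "metropolis" example that way and derives an entity's classes by following `subclass` edges. The relation labels are taken from the slides; the entity identifier `Elvis_Presley`, the edge directions, and the helper names are assumptions made for illustration.

```python
# Facts as (subject, relation, object) triples.
FACTS = [
    ("Elvis_Presley", "born", "1935"),
    ("Elvis_Presley", "is a", "American_singer"),
    ("American_singer", "subclass", "Singer#1"),
    ("Singer#1", "subclass", "Person#3"),
    ('"singer"', "means", "Singer#1"),
    ('"Elvis Presley"', "means", "Elvis_Presley"),
    ('"metropolis"', "means", "city"),
]


def objects(subject, relation):
    """Return every object o such that (subject, relation, o) is a fact."""
    return [o for s, r, o in FACTS if s == subject and r == relation]


def classes_of(entity):
    """Direct classes of an entity plus all classes reachable via 'subclass'."""
    result = []
    frontier = objects(entity, "is a")
    while frontier:
        cls = frontier.pop(0)
        if cls not in result:
            result.append(cls)
            frontier.extend(objects(cls, "subclass"))
    return result


def resolve(word):
    """Look up what a quoted word 'means'."""
    return objects(f'"{word}"', "means")


elvis_classes = classes_of("Elvis_Presley")
```

Because words, classes, and relations are all stored in the same triple form, the `means` lookup and the class inference use the same `objects` helper.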


DBpedia

Extract structured information from Wikipedia

Make sophisticated queries against Wikipedia

Use RDF as a flexible data model

Interlinked at the RDF level with various other Open Data datasets on the Web


Required Information

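Attribute (Property)         Description
Attribute ID                 Attribute identifier
Attribute Name               Attribute name
Attribute Description        Detailed description of the attribute
Attribute Value Type         Type of the attribute value
Attribute Max Value          Maximum value of the attribute
Attribute Min Value          Minimum value of the attribute
Attribute Type               Attribute type
Attribute Group ID           Attribute group identifier
Attribute Group Name         Attribute group name
Attribute Group Description  Detailed description of the attribute group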
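The attribute fields listed above map naturally onto a record type, and the maximum and minimum values suggest a range check on attribute values. The following sketch uses a dataclass; the field names, defaults, and the `accepts` method are assumptions, since the slides list only the required information and not how it is stored or validated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    attribute_id: str
    name: str
    description: str = ""
    value_type: str = "string"
    max_value: Optional[float] = None
    min_value: Optional[float] = None
    attribute_type: str = ""
    group_id: Optional[str] = None

    def accepts(self, value: float) -> bool:
        """Check a numeric value against the optional min/max bounds."""
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


# Hypothetical attribute: weight between 0 and 50.
weight = Attribute("A001", "weight", value_type="number",
                   min_value=0.0, max_value=50.0)
```

Group name and group description are left out of the record on the assumption that they would live in a separate attribute-group table keyed by `group_id`.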


Required Information

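Classification              Description
Classification ID           Classification scheme identifier
Classification Name         Classification scheme name
Classification Description  Detailed description of the classification scheme

• There can be several kinds of Classification:
  • G2B classification
  • Group/class (군급) classification
  • UNSPSC
  • E-OTD
  • …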


Required Information

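Product                              Description
Product ID                           Product identifier
Product Company Name                 Name of the product's company
Product Company Registration Number  Registration number of the product's company
Product Name                         Product name
Product Model Number                 Product model number
Product Classification ID            Indicates which classification the product belongs to
Product Type                         Type or form of the product
Product Keyword                      Keywords that represent the product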


Required Information

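Product                        Description
Product Brand                  Product brand
Product UOM                    Product unit of measure
Product Registration Date      Date the product was registered
Product Description            Detailed description of the product
Product Image                  Image showing what the product actually looks like
Related Product ID             Identifier of a related product
Related Product Relation Type  Type of relationship with the related product

• Related Product Relation Type
  • Substitute: a similar product or another product of the same kind (e.g. honey and sugar)
  • Complement: products that produce a synergy effect when consumed together (e.g. coffee and sugar)
  • Part: a product that is a component of another product (e.g. computer and CPU, car and tire)
  • Others …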
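The relation types between products can be modeled as an enumeration attached to each product-to-product link. The sketch below uses the example pairs from the slide; the class names, the `OTHER` member, and the use of product names as identifiers are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    SUBSTITUTE = "substitute"   # similar product or same kind of product
    COMPLEMENT = "complement"   # synergy when consumed together
    PART = "part"               # component of another product
    OTHER = "other"


@dataclass(frozen=True)
class ProductRelation:
    product_id: str
    related_product_id: str
    relation_type: RelationType


# Example pairs from the slide; names stand in for real product IDs.
RELATIONS = [
    ProductRelation("honey", "sugar", RelationType.SUBSTITUTE),
    ProductRelation("coffee", "sugar", RelationType.COMPLEMENT),
    ProductRelation("computer", "CPU", RelationType.PART),
    ProductRelation("car", "tire", RelationType.PART),
]


def related(product_id, relation_type):
    """List products linked to `product_id` by the given relation type."""
    return [
        r.related_product_id
        for r in RELATIONS
        if r.product_id == product_id and r.relation_type is relation_type
    ]


computer_parts = related("computer", RelationType.PART)
```

Typing the relation lets navigation distinguish, for example, "show me substitutes for this product" from "show me its components", which an untyped Related Product ID alone cannot do.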


LinkingOpenData

• Goal
  • To extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources.

• The basic principles of Linked Data
  • Use the RDF data model to publish structured data on the Web
  • Use RDF links to interlink data from different data sources

• Project homepage link
  • http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
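An RDF link is a triple whose subject belongs to one data source and whose object belongs to another. The sketch below filters such links out of a small triple list and serializes one as an N-Triples line. The `kb.example.org` namespace and its resources are hypothetical placeholders for the project's own data; `owl:sameAs` is the standard OWL property commonly used for identity links.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def ntriple(subject, predicate, obj):
    """Serialize one URI-only triple as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def rdf_links(triples, home_prefix):
    """Keep triples whose subject is local and whose object lives elsewhere."""
    return [
        t for t in triples
        if t[0].startswith(home_prefix) and not t[2].startswith(home_prefix)
    ]


# Hypothetical local dataset: one internal triple, one outbound link.
triples = [
    ("http://kb.example.org/product/42",
     "http://kb.example.org/schema/category",
     "http://kb.example.org/category/7"),
    ("http://kb.example.org/place/Seoul",
     OWL_SAME_AS,
     "http://dbpedia.org/resource/Seoul"),
]
links = rdf_links(triples, "http://kb.example.org/")
line = ntriple(*links[0])
```

Publishing the knowledge base with even a handful of such outbound links would let general information (for example from DBpedia) be pulled in by reference rather than copied.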


LinkingOpenData

• The datasets consist of over two billion RDF triples, which are interlinked by around 3 million RDF links