
ailab.ijs.si

Machine Learning and Knowledge Discovery for Semantic Web

Dunja Mladenić

Artificial Intelligence Laboratory,

J. Stefan Institute,

Slovenia


Jožef Stefan Institute, Artificial Intelligence Laboratory

Selection of FP6 & FP7 Projects (Integrated Projects and Networks of Excellence only):

FP7 IP ACTIVE – Enabling the Knowledge Powered Enterprise

FP7 IP COIN – COllaboration and INteroperability for networked enterprises

FP7 IP EURIDICE – Inter-Disciplinary Research on Intelligent Cargo for Efficient, Safe and Environment-friendly Logistics

FP7 NoE PASCAL2 – Pattern Analysis, Statistical Modeling and Computational Learning

FP7 NoE MetaNet – Machine Translation & Multilingual Information Retrieval

FP7 NoE Multilingual Web

FP6 IP NeOn – Lifecycle Support for Networked Ontologies

FP6 IP ECOLEAD – European Collaborative Networked Organizations Leadership Initiative

FP6 IP SEKT – Semantically-Enabled Knowledge Technologies

Jožef Stefan Institute (JSI) is the leading Slovene research institution for natural sciences (900+ people), in the areas of computer science, physics and chemistry

Artificial Intelligence Laboratory has over 30 people working in various areas of artificial intelligence (machine learning, data mining, semantic technologies, computational linguistics, logic)

Spin-offs: Quintelligence, Cyc-Europe, LiveNetLife, ModroOko, Envigence

Selection of Portals and Products:

Text-Garden (http://www.textmining.net)

Enrycher (http://enrycher.ijs.si/)

VideoLectures.NET (http://videolectures.net/)

IST-World (http://www.ist-world.org/)

Project Intelligence (http://pi.ijs.si/)

Search-Point (http://searchpoint.ijs.si/)

OntoGen (http://ontogen.ijs.si/)

Document-Atlas (http://docatlas.ijs.si/)

AnswerArt (http://answerart.net/)

Contextify (http://contextify.net/)

(Screenshots: Document-Atlas, VideoLectures.NET)

Business Clients: Accenture Labs, Bloomberg, British Telecom, Google Labs, Microsoft Research, New York Times, Siemens, Wikipedia

Academic Partners: Carnegie Mellon, Cornell, Stanford, MIT, Uni. Maryland, KIT, UCL

(Screenshots: Enrycher, IST-World, SearchPoint, OntoGen, AnswerArt, Contextify e-mails)

AI Lab Technologies

Graph/Social Network Analysis (GraphGarden/SNAP, IST-World, FPIntelligence)

Complex Data Visualization (DocAtlas, NewsExplorer, SearchPoint)

Computational Linguistics (Enrycher, AnswerArt)

Social Computing/Web 2.0 (LiveNetLife)

Light-Weight Semantic Technologies (OntoGen, Contextify)

Deep Semantics & Reasoning (Cyc)

Statistical Machine Learning

Data/Web/Text/Stream-Mining (TextGarden Suite of tools)

Outline

Motivation

Machine Learning and Ontologies

OntoGen

OntoPlus

Semantics for search and browsing

SearchPoint

AnswerArt

Enrycher

Sensor Search

Real-time data processing

NYTMiner, BBMiner, Personalized News Search

…to conclude


Motivation

Semantic Web

integrates many existing ideas and technologies, focusing on upgrading the existing nature of web-based information systems to a more “semantic” oriented nature

typical approach is top-down modeling of knowledge, proceeding down towards the data

Machine Learning and Knowledge Discovery in Databases

aims at data modeling and extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information from large datasets

data-driven bottom-up approach trying to discover the structure in the data and express it in more abstract ways and rich knowledge formalisms

ML & KDD role within Semantic Web

Ontology construction

SW applications involve deep structured knowledge composed into ontologies

ML/KDD discovering structure in the data - structuring knowledge

semi-automatically extract knowledge from data into ontological structure

Integrating domain knowledge

ML/KDD approaches, e.g., “Active Learning” and “Semi-supervised Learning”, make use of small pieces of human knowledge for better guidance towards the desired model (e.g., ontology)

reduce human effort by an order of magnitude while preserving the quality of results

Handling data over time - dynamic ontologies

data and the corresponding semantic structures change over time

KDD technologies for stream mining - deal with the stream of incoming data fast enough to keep the corresponding models (ontologies) up to date

Supporting different data modalities

ML/KDD technologies are not limited to a specific data representation - handling different data modalities (databases, text, multimedia, graphs)

ML/KDD for Language Technologies

SW mainly deals with textual data; LT are thus important for SW, including lexical, syntactic and semantic levels of natural language processing

ML/KDD for modeling natural language by automatic learning from rare/costly data

Scalability

KDD approaches consider scalability

SW is ultimately concerned with real-life data on the web, which is growing exponentially

Ontology - SW commonly uses ontologies to structure knowledge

An ontology can be seen as a graph/network structure consisting of:

a set of concepts (vertices in a graph),

a set of relationships connecting concepts (directed edges in a graph),

a set of instances assigned to particular concepts (data records assigned to vertices in a graph)
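The graph view above can be sketched as a small Python data structure. This is a minimal illustration, not the representation used by any of the tools described here; the class and method names (`Ontology`, `add_relation`, `outgoing`) are hypothetical, and relations are stored as (source, label, target) triples so that edges are directed.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Ontology as a graph: concepts are vertices, relations are directed
    edges and instances are data records attached to concepts."""
    concepts: set = field(default_factory=set)
    relations: set = field(default_factory=set)    # (source, label, target) triples
    instances: dict = field(default_factory=dict)  # concept -> list of records

    def add_concept(self, name):
        self.concepts.add(name)
        self.instances.setdefault(name, [])

    def add_relation(self, source, label, target):
        # Both endpoints of a directed edge must already be concepts.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both endpoints must be existing concepts")
        self.relations.add((source, label, target))

    def add_instance(self, concept, record):
        if concept not in self.concepts:
            raise KeyError(concept)
        self.instances[concept].append(record)

    def outgoing(self, concept):
        """(label, target) pairs of the directed edges leaving a concept."""
        return sorted((label, t) for s, label, t in self.relations if s == concept)


onto = Ontology()
for name in ("Vehicle", "Car", "Engine"):
    onto.add_concept(name)
onto.add_relation("Car", "isa", "Vehicle")
onto.add_relation("Car", "hasPart", "Engine")
onto.add_instance("Car", {"name": "hybrid hatchback"})
print(onto.outgoing("Car"))  # [('hasPart', 'Engine'), ('isa', 'Vehicle')]
```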

Ontology construction

One of the methodologies defined for ontology construction is a methodology for semi-automatic ontology construction which, analogous to the CRISP-DM methodology, can be defined as consisting of the following interrelated phases:

1. domain understanding (what is the area we are dealing with?),

2. data understanding (what is the available data and its relation to semi-automatic ontology construction?),

3. task definition (based on the available data and its properties, define task(s) to be addressed),

4. ontology learning (semi-automated process addressing the task(s)),

5. ontology evaluation (estimate quality of the solutions to the addressed task(s)),

6. refinement with human in the loop (perform any transformation needed to improve the ontology and return to any of the previous steps, as desired)

[Grobelnik & Mladenić 2006]


ML/KDD for ontology learning

Define the ontology learning tasks in terms of mappings between ontology components, where some of the components are given and some are missing and we want to induce the missing ones.

Some typical scenarios in ontology learning are the following:

Inducing concepts/clustering of instances (given instances)

Inducing relations (given concepts and the associated instances)

Ontology population (given an ontology and relevant, but not associated instances)

Ontology generation (given instances and any other background information)

Ontology updating/extending (given an ontology and background information, such as new instances or the ontology usage patterns)

Ontology Population via document classification into topic ontology

Goal: given a collection of documents organized into a topic ontology, classify a new document into the ontology

Different classification algorithms were applied to different data representations (e.g., word-vectors, word n-gram vectors, flexible phrase vectors) and on different datasets (e.g., Yahoo! directory of Web pages, US patent database, Directory of Slovenian/Croatian Web pages, News directory)
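The slides do not name the individual classifiers, so the following is only a hedged sketch of the general idea, swapping in a simple nearest-centroid (Rocchio-style) classifier over word-vectors: each topic is represented by the centroid of its documents, and a new document is ranked against every topic by cosine similarity. The topic names and documents are hypothetical toy data.

```python
import math
from collections import Counter


def word_vector(text):
    """Term-frequency word-vector of a document."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def classify(doc, topic_docs):
    """Rank topics by cosine similarity between the new document and the
    centroid of the documents already filed under each topic."""
    x = word_vector(doc)
    scores = {topic: cosine(x, centroid([word_vector(d) for d in docs]))
              for topic, docs in topic_docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical topic ontology, flattened to leaf paths with example documents.
topics = {
    "Science/Biology": ["cell gene protein", "gene expression in the cell"],
    "Business/Stocks": ["stock market shares", "shares fell on the stock exchange"],
}
print(classify("protein and gene regulation", topics)[0])  # Science/Biology
```

For a large topic ontology one would precompute the centroids once rather than per query; the sketch recomputes them only to stay short.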

OntoClassify

System for scalable classification of text into large topic ontologies [Grobelnik & Mladenić, 2005]

Available as a Web service:

for the DMoz directory of Web pages

for the Inspec ontology for annotating papers

for the MeSH medical ontology

Constructing ontology from data stream

Goal: given a stream of documents (e.g., news arriving over time), construct an ontology

Solution: Framework that incorporates the stream mining process into a formal definition of ontology [Grobelnik et al., 2006]

Extract named entities and use them as instances of the ontology

Entities and co-occurring entity pairs are represented by feature vectors based on the content of the documents they occur in

Concepts and relations can be formed either by clustering or by classification into an existing topic hierarchy
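A minimal sketch of the representation step, assuming named entities have already been extracted from each document (the extractor itself is out of scope here): every entity, and every pair of entities co-occurring in a document, accumulates the bag-of-words vector of the documents it appears in. The function name and the toy news items are hypothetical; the resulting vectors could then be clustered or classified as described above.

```python
from collections import Counter
from itertools import combinations


def entity_vectors(docs):
    """docs: iterable of (entities_in_doc, text) pairs, e.g. arriving from a stream.
    Returns bag-of-words vectors for entities and for co-occurring entity pairs,
    summed over the documents in which they occur."""
    entity_vec, pair_vec = {}, {}
    for entities, text in docs:
        words = Counter(text.lower().split())
        unique = sorted(set(entities))
        for e in unique:
            entity_vec.setdefault(e, Counter()).update(words)
        for pair in combinations(unique, 2):  # pairs are stored in sorted order
            pair_vec.setdefault(pair, Counter()).update(words)
    return entity_vec, pair_vec


stream = [
    (["France", "UK"], "government regional talks"),
    (["France", "UK", "Germany"], "stocks bonds investing"),
]
ents, pairs = entity_vectors(stream)
print(pairs[("France", "UK")]["stocks"])  # 1
```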

Illustrative results on Reuters news

Observe change in relations between entities over time, e.g., the France – UK relation first focused on Society (Society, Government, Regional, ...) and later moved to Business (Investing, Business, Stocks, Bonds, …)

Ontology Learning from text

Extending the existing ontology

commonly used is the English lexical ontology WordNet, which is extended using some text, e.g., Web documents [Agirre et al., 2000]

Learning relations for an existing ontology (from docs)

learn relations between the concepts (e.g., “isa” [Cimiano et al., 2004], “hasPart” [Maedche, Staab, 2001]), extract semantic relations from text based on collocations [Heyer et al., 2001]

Ontology construction based on clustering (from docs)

split each document into sentences, parse the text and apply clustering for semi-automatic construction of an ontology [Bisson et al., 2000; Reinberger et al., 2004]

cluster sentences and map them onto the concepts of a general ontology (e.g., WordNet [Hotho et al., 2003])

use whole documents and guide the user through a semi-automatic process of ontology construction [Fortuna et al., 2005]

Ontology Learning from text (cont)

Ontology construction based on semantic graphs

parse the documents and construct semantic graphs, use them for learning document summaries [Leskovec et al., 2004]

Ontology construction from a collection of news stories

represent news as graphs of named entities with relationships based on collocations, used for visualization/browsing [Grobelnik, Mladenić, 2004]

More information in the edited book [Buitelaar et al., 2005]

SEMI-AUTOMATIC DATA-DRIVEN ONTOLOGY CONSTRUCTION

Blaz Fortuna, Dunja Mladenić, Marko Grobelnik

http://ontogen.ijs.si

Ontology Learning with OntoGen

Semi-Automatic

provide suggestions and insights into the domain

the user interacts with parameters of methods

final decisions taken by the user

Data-Driven

most of the aid provided by the system is based on some underlying data

instances are described by features extracted from the data (e.g., word-vectors)

Installation package available at ontogen.ijs.si

Main Features

Interactive user interface

The user can interact in real time with the integrated machine learning and text mining methods

Concept discovery methods:

Unsupervised: k-means clustering, Latent Semantic Indexing (LSI)

Supervised: Active learning

Concept visualization

Methods that help in understanding the discovered concepts:

Keyword extraction: TFIDF and SVM-normal based keyword extraction

Concept visualization: LSI and multi-dimensional scaling based visualization

Also available as a separate tool named Document Atlas: http://docatlas.ijs.si
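As a rough illustration of unsupervised concept discovery with keyword-based naming, the sketch below runs k-means on TF-IDF vectors and suggests a name for each cluster from its highest-weighted centroid terms. It is plain Python and deliberately simplified: it uses dot-product similarity and farthest-first seeding (an assumption, not necessarily OntoGen's choice), and it covers neither LSI nor SVM-normal keyword extraction. The documents are toy data.

```python
import math
from collections import Counter


def tfidf(docs):
    """L2-normalised TF-IDF vectors (term -> weight dicts) for tokenised documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vecs = []
    for d in docs:
        v = {t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({t: x / norm for t, x in v.items()})
    return vecs


def sim(a, b):
    """Dot product of two sparse vectors (cosine for unit-length vectors)."""
    return sum(x * b.get(t, 0.0) for t, x in a.items())


def mean(vecs):
    total = Counter()
    for v in vecs:
        total.update(v)
    return {t: x / len(vecs) for t, x in total.items()}


def kmeans(vecs, k, iters=20):
    # Farthest-first seeding: start from the first document, then repeatedly add
    # the document least similar to all centroids chosen so far.
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(min(vecs, key=lambda v: max(sim(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assign each document to its most similar centroid ...
        labels = [max(range(k), key=lambda j: sim(v, centroids[j])) for v in vecs]
        # ... then recompute centroids, keeping the old one if a cluster empties.
        centroids = [mean([v for v, l in zip(vecs, labels) if l == j]) or centroids[j]
                     for j in range(k)]
    return labels, centroids


def keywords(centroid, top=3):
    """Suggest a concept name: the highest-weighted terms of its centroid."""
    return sorted(centroid, key=centroid.get, reverse=True)[:top]


docs = [s.split() for s in ["stock market shares fell", "shares stock exchange investors",
                            "gene protein cell biology", "protein gene expression cell"]]
labels, centroids = kmeans(tfidf(docs), k=2)
print(labels)  # [0, 0, 1, 1]
for j, c in enumerate(centroids):
    print(j, keywords(c))
```

In an interactive tool the user would then accept, rename or split each suggested sub-concept rather than take the clusters as final.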

Ontology management

(Screenshot: concept hierarchy, list of suggested sub-concepts, ontology visualization, selected concept)

Concept management

(Screenshot: concept’s details, concept’s instance management, selected concept, keywords, selected instance)

Active Learning for concept learning

SVM hyperplane distance based active learning algorithm

First few labelled documents are bootstrapped from a query search

Instances for final concept are selected using the final SVM model

(Screenshot: query and the resulting new concept)
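A hedged sketch of hyperplane-distance active learning: a linear SVM is trained on the few seed labels (e.g., bootstrapped from a query), and the unlabelled instance with the smallest |w·x + b| is proposed to the user next. The trainer is a tiny Pegasos-style stochastic subgradient solver written only for illustration, not the SVM implementation used in OntoGen; all names and data are hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Tiny Pegasos-style linear SVM (hinge loss, L2 regularisation).
    A constant 1.0 feature is appended to every x, so w[-1] acts as the bias
    (and, for simplicity, is regularised like the other weights)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * dot(w, Xb[i]) < 1  # is the hinge loss active?
            w = [(1 - eta * lam) * wi for wi in w]
            if violates:
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, Xb[i])]
    return w


def decision(w, x):
    """Signed score w.x + b; its magnitude is proportional to the distance from the hyperplane."""
    return dot(w, list(x) + [1.0])


def query_next(w, pool):
    """Pick the unlabelled instance closest to the hyperplane (smallest |w.x + b|),
    i.e. the one the current model is least certain about."""
    return min(range(len(pool)), key=lambda i: abs(decision(w, pool[i])))


# Seed labels, e.g. from a keyword query (+1 = belongs to the new concept).
X = [[2.0, 2.0], [3.0, 2.5], [-2.0, -1.5], [-3.0, -2.0]]
y = [1, 1, -1, -1]
pool = [[2.5, 3.0], [0.1, -0.1], [-2.5, -2.0]]

w = train_linear_svm(X, y)
i = query_next(w, pool)
print("ask the user to label", pool[i])
# The loop would then add the user's answer to X and y, retrain and repeat.
```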

Multiple views of the same data

Reuters news articles used in the example above, with two different sets of categories: topics, or the list of countries that appear in the news articles.

Each set of categories offers a different view on the data.

The SVM-based method detects the importance of keywords for each view.

(Figure: Topics view and Countries view of the same articles, e.g., “UK takeovers and mergers” and “Lloyd’s CEO questioned in recovery suit in U.S.”)

Concept’s instances visualization

Instances are visualized as points on a 2D map. The distance between two instances on the map corresponds to their similarity.

Characteristic keywords are shown for all parts of the map.

The user can select groups of instances on the map to create sub-concepts.

Ontology population

(Screenshot: new documents, selected document and its classification)

The system uses one-vs-all linear SVMs trained on the created ontology to classify new instances into concepts.

Users can finalize the classifications using an interactive user interface.
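One-vs-all classification can be sketched as follows, under simplifying assumptions: documents are word-count vectors over a small hypothetical vocabulary, each concept gets its own linear SVM (a minimal Pegasos-style trainer without a bias term), and a new instance goes to the concept with the highest score. This illustrates the scheme only; it is not OntoGen's code.

```python
import random
from collections import Counter


def pegasos(X, y, lam=0.1, epochs=100, seed=0):
    """Minimal Pegasos linear SVM (hinge loss, L2 regularisation, no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * sum(a * b for a, b in zip(w, X[i])) < 1
            w = [(1 - eta * lam) * a for a in w]
            if violates:
                w = [a + eta * y[i] * b for a, b in zip(w, X[i])]
    return w


def train_one_vs_all(X, labels):
    """One binary SVM per concept: that concept's instances are +1, all others -1."""
    return {c: pegasos(X, [1 if l == c else -1 for l in labels]) for c in sorted(set(labels))}


def populate(models, x):
    """Assign a new instance to the concept whose SVM gives the highest score."""
    return max(models, key=lambda c: sum(a * b for a, b in zip(models[c], x)))


VOCAB = ["stock", "shares", "gene", "protein", "goal", "match"]  # hypothetical


def vectorise(text):
    counts = Counter(text.split())
    return [float(counts[w]) for w in VOCAB]


docs = ["stock shares", "stock stock shares", "gene protein", "gene protein protein",
        "goal match", "goal goal match"]
labels = ["Business", "Business", "Science", "Science", "Sport", "Sport"]
models = train_one_vs_all([vectorise(d) for d in docs], labels)
print(populate(models, vectorise("shares shares stock")))  # Business
```

The interactive step described above would let the user confirm or correct each suggested assignment before it is added to the ontology.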

ONTOGEN ON IMAGES

Nenad Tomašev, Blaz Fortuna, Dunja Mladenić, Marko Grobelnik

Image representation

(Diagram: SIFT features, color info and text → extract features → data mining → application)

Image representation - features

SIFT features

Rotation-, scale- and translation-invariant orientation gradients located at “interesting” points on an image

Usually, the SIFT feature space is quantized to get “representative” vectors (“codebook” histogram)

Color histogram

Simply divide the color spectrum into “buckets” and calculate the distribution of colors into these buckets (color histogram)

Distance - weighted sum of SIFT codebook and color data distances
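The distance described above can be sketched as follows. The bucket count, the L1 histogram distance and the weight `alpha` are illustrative assumptions, and the two-dimensional codebook stands in for real 128-dimensional SIFT descriptors (descriptor extraction and codebook learning are not shown). The image dictionaries are hypothetical toy data.

```python
import math


def color_histogram(pixels, buckets=4):
    """Normalised colour histogram: each RGB channel (0..255) is split into
    `buckets` equal ranges, giving buckets**3 colour bins."""
    hist = [0.0] * buckets ** 3
    width = 256 / buckets
    for r, g, b in pixels:
        hist[int(r // width) * buckets ** 2 + int(g // width) * buckets + int(b // width)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def codebook_histogram(descriptors, codebook):
    """Assign each local descriptor (e.g. SIFT) to its nearest codeword and
    return the normalised codebook histogram ("bag of visual words")."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def image_distance(a, b, alpha=0.5):
    """Weighted sum of the SIFT-codebook and colour histogram distances."""
    return alpha * l1(a["sift"], b["sift"]) + (1 - alpha) * l1(a["color"], b["color"])


codebook = [[0.0, 0.0], [1.0, 1.0]]  # toy 2-D codewords
flower = {"sift": codebook_histogram([[0.1, 0.0], [0.9, 1.0]], codebook),
          "color": color_histogram([(250, 10, 10)] * 10)}
fire = {"sift": codebook_histogram([[0.0, 0.2], [1.1, 0.9]], codebook),
        "color": color_histogram([(240, 60, 0)] * 10)}
building = {"sift": codebook_histogram([[0.0, 0.1], [0.1, 0.0]], codebook),
            "color": color_histogram([(120, 120, 130)] * 10)}
print(image_distance(flower, fire), image_distance(flower, building))  # 0.0 1.5
```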

OntoGen on ImageNet subset (flowers, fire, buildings)

Document list for quick overview

Collection visualization (without displaying images)

Collection visualization (displaying images)

Creating ontology on images

Grouping similar images - concepts

Displaying relevant features as concept names

Sub-concept visualization (sub-concepts: flower, buildings, fire)

Adding sub-concepts

TEXT-DRIVEN ONTOLOGY EXTENSION

Inna Novalija, Dunja Mladenić

OntoPlus Architecture

The OntoPlus methodology allows for the effective extension of very large ontologies.

The OntoPlus methodology provides the user with the required concepts and relationships in the form of a ranked list.

The OntoPlus methodology combines textual ontology content, ontology structure and co-occurrence information.

(Architecture diagram. Domain Information Module (DIM): domain keywords; domain glossary (term names, term descriptions); domain information identification. Domain Subset Extraction Module (DSEM): Upper-Level Domain Extractor; extraction of relevant domains; extraction of the domain-relevant ontology subset from the multi-domain ontology (relevant ontology subset). Ontology Extension Module (OEM): Domain Knowledge Extractor (extraction of ontology concepts defined in relevant domains; extraction of ontology concepts with denotation similar to glossary term names); related concepts extraction; Ontology Extender producing candidate entries, i.e., for each glossary term a ranked list of related ontology concepts and corresponding relations. User validation; validated entries (glossary term, ontology concept, relation); domain KB; ontology reuse.)

OntoPlus

Text-Driven Ontology Extension Using Ontology Content, Structure and Co-occurrence

Ranking existing ontology concepts as corresponding to a new domain concept suggested for the ontology extension

Experiments using the Cyc ontology and textual material from two domains: Finances, and Fisheries & Aquaculture

Best results by combining content, structure and co-occurrence information

Financial domain: ontology content and structure

Fisheries & Aquaculture domain: ontology content and co-occurrence
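A hedged sketch of the combined ranking: each candidate concept gets a weighted sum of a content score (cosine similarity), a structure score and a co-occurrence score (Jaccard similarity), and hit rate at k is the share of terms whose correct concept appears in the top k. The default weights are those listed for the content + structure + co-occurrence combination in the results that follow. How OntoPlus actually computes the structure score is not described here, so approximating it by comparing the term with words from the concept's neighbouring concepts is purely an assumption, as are the field names and toy data.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two token lists (term-frequency vectors)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_concepts(term, concepts, w_content=0.5, w_structure=0.4, w_cooc=0.1):
    """Rank candidate concepts for a new glossary term by a weighted sum of
    content, structure and co-occurrence similarity (highest first)."""
    def score(c):
        content = cosine(term["text"], c["text"])
        structure = cosine(term["text"], c["neighbours"])  # assumed proxy
        cooc = jaccard(term["context"], c["context"])
        return w_content * content + w_structure * structure + w_cooc * cooc
    return sorted(concepts, key=score, reverse=True)


def hit_rate(ranked_lists, gold, k):
    """HR@k in percent: share of terms whose correct concept is in the top k."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return 100.0 * hits / len(gold)


# Hypothetical toy data: token lists for the term and two candidate concepts.
concepts = [
    {"name": "Bond", "text": "debt security issued by a government".split(),
     "neighbours": "financial instrument security".split(),
     "context": "interest coupon maturity".split()},
    {"name": "River", "text": "natural flowing watercourse".split(),
     "neighbours": "body of water".split(),
     "context": "fish bank flow".split()},
]
term = {"text": "government debt security paying interest".split(),
        "context": "coupon interest".split()}
ranked = [c["name"] for c in rank_concepts(term, concepts)]
print(ranked)                             # ['Bond', 'River']
print(hit_rate([ranked], ["Bond"], k=1))  # 100.0
```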

Results – Concept Ranking (100 random terms)

Hit rate (HR) of the top 1, top 5 and top 10 suggested concepts; each cell gives “Eqv or Hier Rels / Any Rels”. The baseline uses string edit distance of the concept name.

Evaluation of the top suggested candidate concepts for ontology extension (Financial glossary)

Weighting Measure [weight]: HR (Top 1), HR (Top 5), HR (Top 10)
Baseline - Name [1.0]: 18 / 28, 24 / 36, 25 / 40
Content (cos. similarity) [1.0]: 32 / 65, 60 / 92, 68 / 95
Co-occur (Jaccard similarity) [1.0]: 30 / 48, 48 / 62, 52 / 73
Content [0.5] + Structure [0.4] + Co-occur [0.1]: 38 / 68, 66 / 95, 76 / 98

Evaluation of the top suggested candidate concepts for ontology extension (ASFA thesaurus)

Weighting Measure [weight]: HR (Top 1), HR (Top 5), HR (Top 10)
Baseline - Name [1.0]: 24 / 37, 25 / 38, 27 / 40
Content (cos. similarity) [1.0]: 32 / 72, 52 / 88, 56 / 91
Co-occur (Jaccard similarity) [1.0]: 33 / 71, 49 / 89, 51 / 90
Content [0.5] + Structure [0.0] + Co-occur [0.5]: 42 / 84, 63 / 96, 66 / 96

Demo

CONTEXT SENSITIVE SEARCH

Boštjan Pajntar, Marko Grobelnik, Dunja Mladenić

http://SearchPoint.ijs.si

SearchPoint

Search engines generally work very well

There are cases where it is difficult to specify a query

Idea: help the user by clustering all the hits and visualising the results space

Some related work: mindset.research.yahoo.com – research vs. shopping aspect

www.ujiko.com – clustering & user interface

vivisimo.com – hierarchical clustering

Approach Description

Search results are clustered and shown in 2D space

Each point in this cluster space corresponds to a ranking

Hits are ordered according to the position of the focus - the selected point

Initial focus position corresponds to the Google ranking

Clusters are positioned according to centroid-to-centroid similarity

The ranking of a document is calculated using its similarity to each centroid

Classifying documents into a web directory (DMoz), visualising relevant parts of the directory
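The exact ranking formula is not given here, so the following is only a plausible sketch under stated assumptions: each cluster is weighted by its inverse distance from the focus point on the 2D map, and a hit's score is the weighted sum of its similarities to the cluster centroids. Moving the focus then reorders the hits. The function names and the toy “jaguar” data (cars vs. animals) are hypothetical.

```python
import math


def cluster_weights(focus, positions, eps=1e-6):
    """Assumed weighting: clusters nearer the focus point count more
    (inverse distance, normalised to sum to 1)."""
    raw = [1.0 / (math.dist(focus, p) + eps) for p in positions]
    total = sum(raw)
    return [r / total for r in raw]


def rank_hits(focus, positions, doc_sims):
    """doc_sims[d][j] = similarity of hit d to the centroid of cluster j.
    Returns hit indices ordered by focus-weighted similarity."""
    w = cluster_weights(focus, positions)
    scores = [sum(wj * s for wj, s in zip(w, sims)) for sims in doc_sims]
    return sorted(range(len(doc_sims)), key=lambda d: scores[d], reverse=True)


positions = [(0.0, 0.0), (1.0, 0.0)]  # cluster 0 = cars, cluster 1 = animals
doc_sims = [[0.9, 0.1],  # hit 0: car review
            [0.2, 0.8],  # hit 1: big-cat article
            [0.5, 0.5]]  # hit 2: mixed page
print(rank_hits((0.1, 0.0), positions, doc_sims))  # [0, 2, 1]
print(rank_hits((0.9, 0.0), positions, doc_sims))  # [1, 2, 0]
```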

Search

“Internet search” – one of the most common tasks involving text manipulation in everyday life

…but – how smart is search technology today?

…not too smart!

It is sophisticated, but not smart


Example – Searching for “jaguar”

Query “jaguar” has many meanings…

…but the first page of search engines doesn’t provide us with many answers

…there are 84M more results

Context sensitive search

(Diagram: query → conceptual map → search point; dynamic contextual ranking based on the search point)

SearchPoint (screenshots)

Main advantages

Generated clusters (in contrast to predefined ones)

The user can search the whole cluster space and is not forced to select a single cluster (computer-generated clusters are not necessarily what the user has in mind)

SearchPoint integrated in Accenture’s intranet search

ANSWER ART

Luka Bradeško, Lorand Dali, Blaž Fortuna, Marko Grobelnik, Dunja Mladenić, Inna Novalija, Boštjan Pajntar

http://AnswerArt.net

AnswerArt – System Architecture

(Diagram components: AnswerArt preprocessing; extraction; triplets; semantic enhancement of triplets; domain ontology (ASFA, WordNet); Cyc; extended ontology; AnswerArt index; question; answer)

AnswerArt using Medline (screenshots: show document, show document overview)

AnswerArt using ASFA (screenshots: show document, show document overview)

NATURAL LANGUAGE TEXT ENRICHMENT

Tadej Štajner, Delia Rusu, Lorand Dali, Blaž Fortuna, Dunja Mladenić, Marko Grobelnik

http://enrycher.ijs.si

Enrycher Service

Annotation Features:

Entity extraction: people, locations, organizations, dates, percentages and money amounts

Entity resolution: co-reference, anaphora

Entity linkage to Linked Open Data (LOD)

Word Sense Disambiguation to LOD (WordNet 3.0 VUA)

Assertion extraction: subject – predicate – object sentence elements together with their modifiers

Categories – from the Open Directory and the Wikipedia category schema

Entity resolution in text

Enrycher Service Dependencies

Dashed lines mark optional dependencies between components, whereas solid lines mark required dependencies

A comparative view on five systems: Enrycher, Text Runner, Open Calais, GATE and Read the Web (NELL)

Features compared: Named Entity Extraction; Co-reference and Anaphora Resolution; Entity resolution; Disambiguation; Assertion Extraction (relationship extraction; events and facts); Categories; Visualization; RDF Output; Multi-Language Support (English; English, French, Spanish); Web Service API; Can work on a single document

Enrycher - demo (screenshots: entities, semantic graph, entity details, entity in OpenCyc, category)

OPINION MINING

Andreea Bizău, Delia Rusu, Dunja Mladenić

Opinion Mining

Use case: Twitter comments on movies

(Diagram: IMDb movie reviews* (training data; sample reviews with words such as “amazing, awesome” and “weird, odd”) → domain-specific opinion vocabulary in 2 clusters (“amazing, awesome, perfect, fantastic” and “weird, odd, bad”) → applied to Twitter comments analysis of movie tweets (test data))

* http://www.cs.cornell.edu/people/pabo/movie-review-data/

Twitter comments analysis

• Sentiment words distribution for a movie

• Sentiment orientation evolution per week, day, hour

• Movie comparison
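A minimal lexicon-based sketch of sentiment orientation over time, assuming the two vocabulary clusters shown above act as positive and negative word lists: each tweet is scored +1, -1 or 0, and scores are averaged per time bucket (day by default; an hourly bucket is one format string away). The tokenisation and the toy tweets are simplifying assumptions, not the system's method.

```python
from collections import defaultdict
from datetime import datetime

# Toy vocabulary clusters taken from the example words on the slide.
POSITIVE = {"amazing", "awesome", "perfect", "fantastic"}
NEGATIVE = {"weird", "odd", "bad"}


def orientation(text):
    """+1 if positive words outnumber negative ones, -1 if the reverse, else 0."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def orientation_by(tweets, fmt="%Y-%m-%d"):
    """Average orientation per time bucket; use fmt="%Y-%m-%d %H" for hours."""
    buckets = defaultdict(list)
    for ts, text in tweets:
        buckets[ts.strftime(fmt)].append(orientation(text))
    return {key: sum(v) / len(v) for key, v in sorted(buckets.items())}


tweets = [
    (datetime(2011, 5, 1, 20), "Amazing movie, awesome cast!"),
    (datetime(2011, 5, 1, 21), "weird plot, odd ending"),
    (datetime(2011, 5, 2, 10), "perfect"),
]
print(orientation_by(tweets))  # {'2011-05-01': 0.0, '2011-05-02': 1.0}
```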

SENSOR SEARCH

Lorand Dali, Alexandra Moraru, Dunja Mladenić

Sensor Search - Architecture

(Diagram: search engine built from sensor descriptions (text) in an inverted index, a ranking model (Personalized PageRank) and geo filtering)

Query:

• keywords

• center of area of interest

• radius of area of interest
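The pipeline can be sketched roughly as follows: look up the query keywords in an inverted index over sensor descriptions, rank sensors with Personalized PageRank whose random jumps go only to the keyword matches, and keep only sensors within the given radius of the center. The sensor link graph, the toy coordinates and all function names are hypothetical assumptions for illustration.

```python
import math
from collections import defaultdict


def build_index(descriptions):
    """Inverted index over the sensors' text descriptions: word -> sensor ids."""
    index = defaultdict(set)
    for sid, text in descriptions.items():
        for word in text.lower().split():
            index[word].add(sid)
    return index


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def personalized_pagerank(links, seeds, nodes, d=0.85, iters=50):
    """Power iteration whose random jumps go only to the seed nodes."""
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    r = dict(p)
    for _ in range(iters):
        nxt = {n: (1 - d) * p[n] for n in nodes}
        for n in nodes:
            out = links.get(n, [])
            if out:
                for m in out:
                    nxt[m] += d * r[n] / len(out)
            else:  # dangling node: return its mass to the seeds
                for m in nodes:
                    nxt[m] += d * r[n] * p[m]
        r = nxt
    return r


def search(keywords, center, radius_km, descriptions, locations, links):
    index = build_index(descriptions)
    seeds = set().union(*(index.get(w.lower(), set()) for w in keywords))
    if not seeds:
        return []
    scores = personalized_pagerank(links, seeds, list(descriptions))
    # Geo filtering: keep only scored sensors inside the circular area of interest.
    nearby = [s for s in descriptions
              if scores[s] > 0 and haversine_km(locations[s], center) <= radius_km]
    return sorted(nearby, key=scores.get, reverse=True)


descriptions = {"s1": "temperature sensor city centre",
                "s2": "humidity and temperature station",
                "s3": "humidity probe"}
locations = {"s1": (46.05, 14.51), "s2": (46.06, 14.50), "s3": (45.81, 15.98)}
links = {"s1": ["s2"], "s2": ["s1"]}  # hypothetical sensor-to-sensor links
print(search(["humidity"], (46.05, 14.50), 10.0, descriptions, locations, links))  # ['s2', 's1']
```

Here s1 is returned although its description does not mention humidity, because it is linked to a matching sensor and lies inside the area; s3 matches the keyword but is filtered out by distance.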

REAL-TIME INFORMATION PROCESSING

Blaz Fortuna, Dunja Mladenić, Marko Grobelnik

QMiner – generic software platform for Real-Time information processing & Complex Event Detection & Anomaly Detection

Generic platform running on clouds for intensive data stream analytics… processes thousands of events per second

…includes state-of-the-art data/text/web/stream-mining algorithms

Deployed in British Telecom, NYTimes, Bloomberg, Microsoft, TheStreet.com, … ongoing work with Google News, Telefonica, Wikipedia

(Diagram components: Capture Reality (events from sensors, alarms, user logs); Transform & Enrich; Model; Anomaly detection; Complex events detection; Analytics: Prediction, Segment, Visualization)

Network Monitoring for British Telecom

(Diagram: British Telecom network (~25 000 devices) → alarms (~10-100/sec) → Alarms Server → Alarms Explorer Server → live feed of data → operator big board display)

The Alarms Explorer Server implements three real-time scenarios on the alarms stream:

1. Root-Cause-Analysis – finding which device is responsible for an occasional “flood” of alarms

2. Short-Term Fault Prediction – predicting which device will fail in the next 15 minutes

3. Long-Term Anomaly Detection – detecting unusual trends in the network

…the system is used in British Telecom

Visualizing root-cause and prediction (screenshot panels: root-cause, prediction)

How Well Are We Predicting

(Chart: percentage realisation of predictions (0%–90%) vs. minutes; labelled values 86%, 80% and 60%)

User Modeling for NYTimes & Bloomberg

(Diagram components: log files (~100M page clicks per day); NYT articles; user profiles; stream of profiles; segments; stream of clicks; trend detection system; trends and updated segments; campaign to sell segments; advertisers; $ sales)

Segment: Keywords
Stock Market: Stock Market, mortgage, banking, investors, Wall Street, turmoil, New York Stock Exchange
Health: diabetes, heart disease, disease, heart, illness
Green Energy: Hybrid cars, energy, power, model, carbonated, fuel, bulbs
Hybrid cars: Hybrid cars, vehicles, model, engines, diesel
Travel: travel, wine, opening, tickets, hotel, sites, cars, search, restaurant
… …

Generalizing from registered users

(Charts: “BEP for Age (20% = random)” and “BEP for Gender on users with at least 10 visits (50% = random)”; feature sets compared: Context, Text Features, Named Entities, All Meta Data, All Content, All Features; series: Male, Female; visit thresholds: ≥2, ≥10, ≥50)
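The BEP (precision/recall break-even point) used in these charts can be computed for one binary target as sketched below: rank users by classifier score and predict the top P as positive, where P is the number of actual positives; precision and recall are then both hits / P. The scores and labels are hypothetical, and treating a multi-class target such as age one class at a time is an assumption.

```python
def break_even_point(scores, labels):
    """Precision/recall break-even point for one binary target (e.g. 'female').
    If the top P scored instances are predicted positive, where P is the number
    of actual positives, then precision = recall = hits / P."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:positives])
    return hits / positives


# Hypothetical classifier scores for six users (1 = female, 0 = male).
scores = [0.92, 0.85, 0.70, 0.40, 0.35, 0.10]
labels = [1, 0, 1, 1, 0, 0]
print(break_even_point(scores, labels))  # 2 of the top 3 are positive
```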

Using User Modeling for News Recommendations

Good recommendations can make a big difference when keeping a user on a web site

…the key is how rich a context model the system uses to select information for a user

Bad recommendations: <1% of users click; good ones: >5% of users click

Contextual personalized recommendations generated in ~20ms

Recommendation Features:

History (user profile)

Geo (based on IP)

Requested page (where we serve the recommendation)

Referring URL

Time

(Example context: time now; US; Finance; Oil)

Regular users (visits > 50)

Measure: All / History / Context / Geo / Requested / Referring / Time
Top1 Recall: 66 / 65 / 65 / 65 / 66 / 60 / 60
Top2 Recall: 81 / 78 / 78 / 75 / 78 / 67 / 67
Top3 Recall: 86 / 83 / 83 / 79 / 81 / 72 / 72
Top Precision: 52 / 48 / 49 / 43 / 41 / 36 / 36

New users (first visit)

Measure: Context / Geo / Requested / Referring / Time
Top1 Recall: 60 / 58 / 46 / 60 / 60
Top2 Recall: 77 / 70 / 61 / 71 / 71
Top3 Recall: 85 / 77 / 72 / 78 / 78
Top Precision: 45 / 36 / 35 / 37 / 37

Real-time Architecture

(Diagram components: logging, collaborative filter, SVM, archive, web, Amazon, crawl)

Results

(Chart: News Personalization Test – Page-Story Page Transition Probabilities, 0%–7%, 17 Apr – 15 May; series: Control, JSI SVM, Random, JSI CF, DailyMe Personalized, Most Popular, Contextual Competitor)

PERSONALIZED NEWS SEARCH

Lorand Dali, Blaž Fortuna

Personalized News Search – System Architecture

(Diagram components: query keywords; user attributes: age, country, gender, income, industry, job; search logs; learning to rank; ranking model)

User: Young female computer programmer

Query: Religion

User: Middle-aged male clergy

Query: Religion

VideoLectures.NET: 562 events, 8169 authors, 10539 lectures, 12859 videos

Montreal @ Video Lectures (screenshot with recommended lectures)