6
Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage & STIH [email protected] Milan Stankovic Sépage & STIH [email protected] Philippe Laublet STIH, Université Paris-Sorbonne philippe.laublet@paris- sorbonne.fr ABSTRACT The e-tourism is today an important field of the e-commerce. One specificity of this field is that consumers spend much time comparing many options on multiple websites before purchasing. It’s easy for consumers to forget the viewed offers or websites. The Behavioral Retargeting (BR) is a widely used technique for online advertising. It leverages consumers’ actions on advertisers’ websites and displays relevant ads on publishers’ websites. In this paper, we’re interested in the relevance of the displayed ads in the e-tourism field. We present MERLOT 1, a Semantic-based travel destination recommender system that can be deployed to improve the relevance of BR in the e-tourism field. We conducted a preliminary experiment with the real data of a French travel agency. The results of 33 participants showed very promising results with regards to the baseline according to all used metrics. By this paper, we wish to provide a novel viewpoint to address the BR relevance problem, different from the dominating machine learning approaches. Categories and Subject Descriptors H.4 [Information Systems Applications]: Miscellaneous General Terms Algorithms, Experimentation Keywords e-tourism; Behavioral Retargeting; travel destination; recommender systems; Semantic Web 1. INTRODUCTION The e-tourism is today an important field of the e-commerce. According to Google 2013 Traveler [11], more than 80% people do travel planning online. One specificity of this field is that consumers (more than 60%) spend much time comparing many options on multiple websites before purchasing because finding value is important. In average, 45 days are spent and 38 visits to travel sites are conducted before booking [5]. So when people leave a travel website, it doesn’t necessarily mean that they aren’t interested or don’t like the offers of the website. It might just mean that they want to compare with other options. In this stage of travel shopping, it’s easy for people to forget the offers or the name of the travel website. These people wouldn’t return and they are thus lost. Behavioral Retargeting (BR) is a widely used technique to address this problem. BR is a form of online targeted advertising. It leverages consumers’ actions on advertisers’ websites and displays relevant ads on publishers’ websites. We’re interested in the relevance of the displayed ads in the e- tourism field. 68% of people begin planning travel online without having a clear travel destination in mind [11]. Our research hypotheses are that the travel destinations have a big impact on the relevance of the displayed ads and by improving the relevance of the travel destinations we can improve the relevance of displayed ads. The main contribution from this paper is a Semantic-based travel destination recommender system that can be deployed to improve the relevance of BR in the e-tourism field. The remainder of the paper is organized as follows. In Section 2, we present the background of our work; in Section 3, we present the MERLOT 1 system; in Section 4, we present the conducted experiments; in Section 5, we conclude the paper. 2. Background BR systems often consist of two main components. The first component is a bidding system that decides whether to display, where to display and for how much. The second component is a recommender system that decides which ads to display. Much work has been done in the scope of the first component. In [6], the authors considered the problem of estimating user’s propensity to click on an ad or make a purchase. They predicted whether a user in a particular session is a clicker or just a browser. In [3], a semantic approach is combined with a syntactic one to improve the relevance of ads for the Contextual advertising. The authors proposed a novel way of matching advertisements to web pages that rely on a semantic topical match as a major component of the relevance score. The semantic match relies on the classification of pages and ads into a 6000 nodes commercial advertising taxonomy to determine their topical distance. As the classification relies on the full content of the page, it is more robust than individual page phrases. The evaluation demonstrated a significant effect of the semantic score component. The relevance considered in this paper is the relevance of an ad with regards to a web page. The relevance in our work is the perceived relevance of an ad itself. The second component is less discussed in BR-related papers and is more developed in papers related to recommender systems. The internal functions for recommender systems are characterized by the filtering algorithm. The most widely used classification divides the filtering algorithms into: (a) collaborative filtering, (b) demo-graphic filtering, (c) content-based filtering and (d) hybrid filtering [2]. Criteo 1 is a popular performance advertising technology company whose global reach is placed as second only to Google’s Display Network [8]. We didn’t find published papers that explain in detail their approach. By consulting their official website and several journalistic articles [1,4,10,12], we believe 1 http://www.criteo.com/ Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW’15 Companion, May 18–22, 2015, Florence, Italy. ACM 978-1-4503-3473-0/15/05. http://dx.doi.org/10.1145/2740908.2742001 1287

Leveraging Semantic Web Technologies for More Relevant E … · 2015-05-15 · Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Leveraging Semantic Web Technologies for More Relevant E … · 2015-05-15 · Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage

Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting

Chun LU Sépage & STIH

[email protected]

Milan Stankovic Sépage & STIH

[email protected]

Philippe Laublet STIH, Université Paris-Sorbonne

[email protected]

ABSTRACT The e-tourism is today an important field of the e-commerce. One specificity of this field is that consumers spend much time comparing many options on multiple websites before purchasing. It’s easy for consumers to forget the viewed offers or websites. The Behavioral Retargeting (BR) is a widely used technique for online advertising. It leverages consumers’ actions on advertisers’ websites and displays relevant ads on publishers’ websites. In this paper, we’re interested in the relevance of the displayed ads in the e-tourism field. We present MERLOT 1, a Semantic-based travel destination recommender system that can be deployed to improve the relevance of BR in the e-tourism field. We conducted a preliminary experiment with the real data of a French travel agency. The results of 33 participants showed very promising results with regards to the baseline according to all used metrics. By this paper, we wish to provide a novel viewpoint to address the BR relevance problem, different from the dominating machine learning approaches.

Categories and Subject Descriptors H.4 [Information Systems Applications]: Miscellaneous

General Terms Algorithms, Experimentation

Keywords e-tourism; Behavioral Retargeting; travel destination; recommender systems; Semantic Web

1. INTRODUCTION The e-tourism is today an important field of the e-commerce. According to Google 2013 Traveler [11], more than 80% people do travel planning online. One specificity of this field is that consumers (more than 60%) spend much time comparing many options on multiple websites before purchasing because finding value is important. In average, 45 days are spent and 38 visits to travel sites are conducted before booking [5].

So when people leave a travel website, it doesn’t necessarily mean that they aren’t interested or don’t like the offers of the website. It might just mean that they want to compare with other options. In this stage of travel shopping, it’s easy for people to forget the offers or the name of the travel website. These people wouldn’t return and they are thus lost. Behavioral Retargeting (BR) is a widely used technique to address this problem. BR is a form of online targeted advertising. It leverages consumers’ actions on advertisers’ websites and displays relevant ads on publishers’ websites.

We’re interested in the relevance of the displayed ads in the e-tourism field. 68% of people begin planning travel online without having a clear travel destination in mind [11]. Our research hypotheses are that the travel destinations have a big impact on the relevance of the displayed ads and by improving the relevance of the travel destinations we can improve the relevance of displayed ads.

The main contribution from this paper is a Semantic-based travel destination recommender system that can be deployed to improve the relevance of BR in the e-tourism field.

The remainder of the paper is organized as follows. In Section 2, we present the background of our work; in Section 3, we present the MERLOT 1 system; in Section 4, we present the conducted experiments; in Section 5, we conclude the paper.

2. Background BR systems often consist of two main components. The first component is a bidding system that decides whether to display, where to display and for how much. The second component is a recommender system that decides which ads to display.

Much work has been done in the scope of the first component. In [6], the authors considered the problem of estimating user’s propensity to click on an ad or make a purchase. They predicted whether a user in a particular session is a clicker or just a browser.

In [3], a semantic approach is combined with a syntactic one to improve the relevance of ads for the Contextual advertising. The authors proposed a novel way of matching advertisements to web pages that rely on a semantic topical match as a major component of the relevance score. The semantic match relies on the classification of pages and ads into a 6000 nodes commercial advertising taxonomy to determine their topical distance. As the classification relies on the full content of the page, it is more robust than individual page phrases. The evaluation demonstrated a significant effect of the semantic score component. The relevance considered in this paper is the relevance of an ad with regards to a web page. The relevance in our work is the perceived relevance of an ad itself.

The second component is less discussed in BR-related papers and is more developed in papers related to recommender systems. The internal functions for recommender systems are characterized by the filtering algorithm. The most widely used classification divides the filtering algorithms into: (a) collaborative filtering, (b) demo-graphic filtering, (c) content-based filtering and (d) hybrid filtering [2]. Criteo 1 is a popular performance advertising technology company whose global reach is placed as second only to Google’s Display Network [8]. We didn’t find published papers that explain in detail their approach. By consulting their official website and several journalistic articles [1,4,10,12], we believe

1 http://www.criteo.com/

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW’15 Companion, May 18–22, 2015, Florence, Italy. ACM 978-1-4503-3473-0/15/05. http://dx.doi.org/10.1145/2740908.2742001

1287

Page 2: Leveraging Semantic Web Technologies for More Relevant E … · 2015-05-15 · Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage

tha

[cinnardsssaisthccimd

Tokapragwrpv

F

3In

3Tage

- duthwpsr

- r

hat they use maiapproach.

7] evaluates wconsumers with nterest. The resu

narrowly construand detailed prodespond positivel

days’ ads click-search engine strategies and vashows that Behaadvertising and us more effectivehis paper, we w

consumers freqcomparing, a ricmplicitly. Based

display ads havin

The semantic apof two informatkeywords) by exa semantic graphpublishing of Linesources of diffe

are interconnectegiven on the Figwith the resourcesource is also

properties (e.g. pvalue for the sam

Figure 1. Examp

3. MERLOn this section, w

3.1 MotivatThe realization oand by the devegreat promises fengines. MERLO

Provide plausidestination choicuser’s destinationhe purchase of r

which often causproper choice thseveral, diverse elevant choice to

Leverage pubelevant to trave

inly some machi

whether indeedinformation thaults showed thatucted preferenceduct informationly to ads display-through log dto compare

alidate the effecavioral Targetingusing short terme than using lon

work in this direquently navigatch short-term pd on the construng relevant trave

pproach that we tional resources

xploiting the pathh. Semantic grapnked Data are daerent types, eached with links ofgure 1, we can ce France with o described withpopulation, latitu

me property may

ple part of a sem

OT 1 system we present the de

tions and Mof MERLOT 1 elopments in thefor the conceptiOT 1 should repl

ible alternative ce. This is especn choice, in the greasonably priceses frustration what fits their bud

enough recomo the user.

licly available, el, to augment th

ine learning coll

d firms benefitat is highly spect consumers whs have a greatern and therefore aying specific proata coming frodifferent Beha

ctiveness. The eg can do a grea

m user behaviors ng term user behction. In the e-tting, clicking,

preferences profucted profile, ouel destinations.

propose calculas (e.g., personshs between thosephs, such as thoata structures whh having a uniquf different typessee the resourcethe link of typeh literal valuesude). Resources be considered im

mantic graph

sign of the MER

Main assumpis motivated bo

e technological ion of next genly to the followin

destinations toially useful if fogiven time interved tickets and hoith users having dget. The system

mmendations in

Linked Data khe user behavio

laborative filterin

t from targetincific to their priho have developer focus on specifare more likely

oducts. [13] usesom a commerciavioral Targetinexperiment resulat help for onlinto represent use

haviors for BR.tourism case, wi

consulting anfile is constructeur system tries

ates the relevans, books, moviee two resources ose resulting frohere informationue identifier – URs. In the exampe Paris connectee “country”. Th for some of isharing the sam

mplicitly linked.

RLOT 1 system.

ptions oth by user neecontext that off

neration relevanng user needs:

a user’s curreor some reason thval does not allootel arrangementtrouble to make

m should providorder to offer

knowledge baseor data, and glea

ng

ng ior ed fic to

s 7 ial ng lts ne ers In ith nd ed to

ce es, in

om nal RI, ple ed his its

me .

ds fer

nce

ent he

ow ts, e a de

a

es, an

deeper more re

In the csourcesthe helpis nowconsidetravel dsemant

One ofdata grassumegraphs destinaprefererecommsystem

order to

3.2 TSugg

Figure

MERLOas a trauser's cpages ttravel. semantappliedThe objof usersuch aclasses that thetransfortravel sourcesthe syswhen cdefinitiwebsiteclients availab

2 http://

understanding oelevant ads with

context of the exs, under the initp of various gov

w rich with freeerable number odata, usable for tic proximity.

f the objectives raphs to generate that the infor

can help us conation recommeences and to the mendations may. It works in th

o improve its cap

The relevangestion Proc

e 2. MERLOT

OT 1 replies to avel website). Thcontext (a destinthat the user vThis input is fir

tic form. Two trd: extraction of bjective of date srs desired travel,as "weekend", "

will help adape user is interesrmation of placewebsite to sta

s. In addition totem stores the rcalculating the sion more diverse as it contains

of MERLOT ble semantic dat

/linkeddata.org/

of user actions, h higher likelihoo

xponential growtiative called Linvernmental Opeely exploitable of those sourcecalculating trave

of MERLOT 1 te useful destinarmational richnenstruct a flexiblendation, easilspecifics of diff

y take place. MEhe background

pacity to better s

nce calculaticess

1 system workf

a query submitthe query may conation he is conviewed and the rst treated by MEransformations, date semantics

semantics modu, as well as past "bank holiday",

pt the recommensted in. Place die names, as pro

andardized namo semantically preceived queries semantic proximse than the histqueries collecte1. Using histota sources, the

thus being ableod of click and c

wth of public strunking Open Dat

en Data initiativsemantic data

es concern geogel destination re

is to leverage tation recommendess of those semle and versatile ly adaptable ferent scenarios iERLOT 1 is notof an e-tourism

serve the user.

ion & Destin

flow

ted by another sontain informatiosidering to travehistory of use

ERLOT 1 to tradetailed in furthand place disa

ule is to transfortravels, to sema

, "summer holindations to the isambiguation cvided by the uses used in sem

processing the ras “history” for

mities. Such a htory data of aned from various

orical data, as wsystem then ca

e to provide conversion.

uctured data ta2 and with es, the Web sources. A

graphical or elevance and

the semantic dations. We mantic data approach to to users’

in which the t a front-end

m website in

nation

ystem (such on about the el to), travel r's previous

ansform it to her text, are

ambiguation. rm the dates antic classes iday". Such type of trip

concerns the er or by the mantic data request data, r further use history is by ny particular s websites – well as the

alculates the

1288

Page 3: Leveraging Semantic Web Technologies for More Relevant E … · 2015-05-15 · Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage

sth1twwthathg

Inr

3Tsbca

Aisthayca

Infscpb

3Ttasd

D(fraCwptrin

EZpthth

3Ta

3 4 5

similarity scores he user query. W

1 the geo-semantwo destinations

with regards to this measure in

alternative measuhe system ranks

geo-semantic pro

n the following ecommendation

3.3 Input The query that iseveral trips: a buying a ticket consists of two pas of two dates –

A query does nos possible to imhe case of a user

a query only withyet specify his dchoice is using anticipation of hi

n the context offound in pages systems already tcorrespondence products in orderbrowsing the pag

3.4 Place DThe place disamaxonomic entitie

set of concepts udata graphs.

Disambiguate is from the set of

from the taxonoma corresponding C. For instance, fwould return provided that Dransformation aln all calculation

Existing methodZemanta3 , Openperform this tranhe place disambhose services or

3.5 Date SeThe phase of datany significant

http://developer

http://www.ope

http://dbpedia.o

of candidate deWe call the simitic proximity as

with regards totheir distance in

n more details ures that can be s the alternative oximity scores.

subsections we n process in more

is provided as adesired trip, foand generally a

places – place of– the date of ongo

ot always have amagine a query w

r new to the sysh the history of desired destinati

MERLOT 1 tis search.

f BR, the input previously vis

track and rely onwith a partic

r to determine pges from the hist

Disambiguatmbiguation concees) referring to p

used to describe

disambiguate

a function thaf all keywords Kmy of places T uconcept definedfor a given keywthe concept

DBPedia.org is llows to use a uns and avoid amb

ds, such as conCalais4 , DBPensformation, andbiguation task istheir small adap

emantics te semantics prodate classes a

r.zemanta.com/

encalais.com/

org/spotlight/use

estinations with tlarity measure uit calculates botho their geographn the semantic gin further textused in its placedestinations wit

describe the dife details.

an input to the sor which the usa list of previouf origin and a deoing trip and of r

all those elementwithout the histotem. It is also popast trips in cas

ion, and the travto generate rec

consists of a lisited by the usn such a list of p

cular e-commerroducts that the

tory.

tion erns a mappingplaces in the usethose same plac

e : K ∪ T →C (

t provides, for K) or a given taused by a particud in a semantic word “Paris, Fran

http://dbpedia.a chosen sema

nified set of destbiguity and ident

oncept extractoedia Spotlight5 )d for this reasons feasible (eitherptations).

cessing consistsappear within t

ersmanual

the destinations used by MERLOh the proximity hical distance angraph. We prese, as well as the. In the final steth regards to the

fferent parts of th

system consists ser is considerinus trips. Each trestination, as wereturn trip.

ts. For instance,ory of past trips ossible to imaginse the user did nvel website of hommendations

st of travel offeser. Existing Bpages that is put rce catalogue user consulted b

g of keywords (er query to a finies in the semant

1)

a given keywoaxonomical enti

ular travel websitgraph of concepnce”, this functioorg/resource/Parantic graph. Thtination identifietity problems.

rs for text (e., can be used n, we assume thr by using one

s of determining the travel perio

in OT of nd ent he ep eir

he

of ng rip ell

it in ne

not his in

ers BR

in of by

(or ite tic

ord ity te, pts on ris his ers

.g. to

hat of

if od

cornereweek-ewhich iholidaytypes oof datethe recfollowifuture:

If severonly ondominaover baless fretravelin

ThanksdetermiDBPedcontainAdditiomay edetermiprove v

In the cparticulperform

3.6 DFollowProcess(desired

A trip T

Po - a U

Pd – agraph

Do – th

Dd – th

DC – th

The dawhich travel q

The mcalculaand ranthe useprojectconside

The simplace dset PD his queof usersearchithe sim

ed by a trip. Foend it is indicatiis often differentys. Users often of trips and the ms is to better un

commendations ing list of date cweekend, bank

ral classes may ne – the dominaant over weekenank holiday. Thequent date clang on the given d

s to the availaine the date cla

dia.org it is possns a weekend, onally, longer heasily be labeleination is thus very useful later

case the travel olar dates, then

med. This step is

Destination wing the Place

sing phases, thed and past) in th

T =

T consists of:

URI identifying

a URI identifyin

he date of outwa

he date of return

he dominant dat

ata consist of twoconsists of past

query, consisting

main task of theate the similaritynk them with reger either as alte, or as inspiringer undertaking a

milarity functiodestination, Pc, t

may contain onery, or may contar trips. All of ing for the most milarity scores c

or instance if a tive that it is a wt than the trips tachoose differen

motivation behinnderstand the like

to it. For the classes, that we holiday and sum

be attributed to ant one. We cond, and that su

he intuition behinass is more likdates than the m

able open data ass automaticallsible to decide ia bank holidayolidays on placeed as summer a feasible and ron.

offers that the un the data sems thus optional.

Similarity Ce Disambiguatie data is transfo

he following form

= <Po,Pd,Do,Dd

a place of origin

ng a destination

ard trip (optional

trip (optional)

e class (optional

o main elements trips in the sem

g of one trip in th

e Destination Sy scores of availagards to the likelernative destinag destinations tnew trip.

on calculates theto a set of givenne destination Pain several destithem may be appropriate alte

can be precalcul

trip is short andweekend trip, thaken in summer nt destinations fnd the discoveryely user intentiomoment we armay extend or r

mmer holiday.

one time intervonsider that bankummer holiday ind this approachkely to be the

more frequent one

sources it is ly. By looking uif a particular ti

y in the countryes with a beachholidays. The

rather simple ta

user consulted cmantics analysis

Calculationion and Date ormed in the fom:

d,DC> (2)

n in a given sema

n place in a give

)

l)

s: the semantic umantic form andhe semantic form

Similarity Calcuable destination lihood that they ations to his cuthat might motiv

e similarity of

n place destinatioPd that the user inations found in

taken into accernative destinatlated to a certa

d contains a he nature of for summer

for different y of the class on and adapt re using the refine in the

val we chose k holiday is is dominant h is that the

reason for e.

possible to up dates on ime interval y of origin. h in summer

date class ask that will

contained no cannot be

n Semantics

orm of trips

antic graph

en semantic

user profile, d the current m.

ulation is to alternatives will interest

urrent travel vate him to

a candidate ons PD. The specified in

n the history count when tion. Ideally, ain extent to

1289

Page 4: Leveraging Semantic Web Technologies for More Relevant E … · 2015-05-15 · Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage

accelerate the operation of the destination recommendation engine.

3.6.1 Hybrid Geo-semantic Proximity Calculation Hybrid Geo-semantic Proximity (HGSP) is the core destination measure used by MERLOT 1. This measure combines two proximity scores: the score of proximity in the semantic graph of concepts (semprox) and the score of proximity by geographical distance (dist).

simHGSP (Pd,Pc) = semprox(Pd,Pc) • dist(Pd,Pc) (4)

The geographical distance (dist) assures that the suggested destination alternatives are not in disproportion to the distance that the user is willing to travel. Suggesting a traveler from Paris interested in going to Cannes to travel to Hawaii instead would be inappropriate, despite the level of semantic similarity between Cannes and Hawaii destinations. The role of dist function is thus primarily in filtering. It augments the score of semantic proximity for places on the similar distance to the initial distance that the user was willing to pass, tolerates places found on a much shorter distance and penalizes the places found on much greater distance. The motivation for such a function is the intuition that users might more likely prefer to travel within similar or shorter distance to those initially planned.

As shown in equation 5, the dist function associates the value 2 to the places Pc found on a similar distance from the place of origin Pc as the place where users initially wanted to travel Pd. This geographical distance is expressed by the function geod. A thresholdδis used to define how great the difference in distance can be tolerated for a candidate place to be considered close enough. In our experiments we useδ=0.2, so the places found on a distance 20% shorter or greater than the initial distance (geod(Pd,Po)) that the user was prepared to pass, can be considered as close enough. For the places found in a distance much shorter than those specified by the threshold, the dist function asserts the value 1, and considers them as the second best choice. For the places found in a distance greater than the threshold, the dist function attributes an exponentially decreasing value in function of the actual distance. Such places will obtain values of dist lower then 1.

The semprox fuction quantifies the strength of connections between the initial destination place Pd and the place candidate Pc in a semantic graph. The semantic graph G is composed of a set of informational resources, a subset of which (D) designate places – destinations that are linked with typed links, where T is the set of link types. We define a rather simple graph proximity function, that only takes into account the number and the length of paths that two resources in the graph. To improve the performance of the proximity function in the future it is possible to include more advanced weighting functions. Our function, formalized by the formula 6, calculates the graph proximity of places Pd and Pc in a semantic graph G. We used DBpedia as the semantic graph, but any other graph (or their combinations) may be used in the future. According to our formula the semprox proximity of two places is calculated by taking into account all the paths in the graph that exist between these two places. For each path a score is calculated based on the length of the path and the importance of the graph pattern that the path fits into.

The importance function allows fine-tuning the approach and giving more priority to places connected over paths that are more indicative of their semantic similarity. It is this function that allows leveraging the semantic nature of links and taking into account the different types of links and paths.

Different importance functions may be used. In the actual version of the system, we used a simple importance function defined by the equation 7. In the future, it would be possible to even further refine this function to focus only on particular link types that prove to be the most significant over time.

Our importance function asserts the value 1 to a path p if p confirms to a pattern t from the predefined set of preferred patterns T. Out set T consists of the patterns represented graphically on the figure below. This particular selection is the result of observations of data structures in the DBpedia graph. The chosen graph patterns show best performance and link the most similar destinations to one another.

Figure 3. Example of graph patterns

Our patterns cover the paths established between a place Pd and Pc so that, both Pd and Pc are the targets of links of the same type (represent the values of the same property typeOfLink1) reaching from either the same concept (image on the left) or connected concepts (image on the right). In the former case, the concepts can be connected with the paths of variable types and it is the length(p) function that will allow us to factor in this length when calculating the score of proximity established over a particular path. An example of an eligible path could be established over a concept that represents an event, for instance Olympic Games, that is connected with the concepts representing Paris and London with the link type “heldIn”. While the fact that the same event took place in a particular city some time ago would rarely motivate someone to visit that city, when calculating place similarity this fact may be useful. Similar events are often organized in similar places, and we indeed observed in our data that this pattern is an interesting one to follow.

In practice, we calculate the semprox measure by running SPARQL queries on DBpedia to retrieve the candidate places Pc that are findable in the graph proximity of the initial destination Pd, on paths that confirm to our path patterns. Once the place candidates have been collected and the paths lengths calculated it is easy to calculate the semprox value.

The calculation of the geographical distance is also possible by relying on DBpedia data about the geographic coordinates of places Pd and Pc. In practice, in order to co-accelerate the calculation process, we perform a single SPARQL query containing both the conditions for semprox and dist functions.

1290

Page 5: Leveraging Semantic Web Technologies for More Relevant E … · 2015-05-15 · Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage

4Prcruqtha

4TAaCoaMabuc

4Wto

T

1

2

3

4

5

T

T

1

2

3

4

5

4Wna

TR 6

4. PreliminPerforming a thequire us to im

calculate the clesearch we are n

user study wherquestion a panelhe likelihood th

and by the baseli

4.1 BaselineThe baseline thaAmong its clientalso uses MERLCriteo and MERLon this website, aads of this adverMERLOT 1 and allows us to combelieve is a maused by the basecommerce offer c

4.2 ExperimWe established ao perform the fo

The main steps th

1. Participantsoptions for occasion for

2. They went tasked to conleast 3 page

3. They were baseline syevaluated eRelevance S

4. The evaluatwhere on thMERLOT 1evaluated ea

5. They gave a

The system rando

The Five-Level R

1. This offer d

2. This offer m

3. This offer se

4. This offer is

5. This offer is

4.3 MetricsWe used four mnumber of offersand F1 score.

Total number oRelevance Scale

http://www.sofo

nary Evaluathoroughly compmplement a biick-through ratenot yet able to dre we isolate thl of Internet usehat they click onine.

e at we compare wts, Criteo is usedOT 1 to recommLOT 1 are instalaveraging 60k unrtiser are display

Criteo system umpare our proposachine learning eline on the samcatalogue.

ment proceda questionnaire

ollowing steps:

hat participants f

put themselvestheir next travelr travel (for insta

to the mentionednsider at least 3 s with travel off

redirected to tstems showed i

each of the 3 oScale.

tion form led thhe same placem1 were placed inach of the 3 offe

an overall impres

omly changed th

Relevance Scale

doesn’t interest m

might be interesti

eems interesting

s interesting and

s very interesting

s metrics to gauge

s corresponding

of offers corres, from level 1 to

foot.com/

tion plete evaluationidding system e, which at thi

do. We thus perfhe behavior of rs (mixed and a

n the ads propos

with is the popud by a French tr

mend travel on thlled to track usernique visitors a myed on the websuse two differensed semantic appcollaborative fi

me input data an

dure online and aske

followed are:

s into the scenarl. They had to imance next holida

d French travel woffers available

fers)

the So Foot weits ads on a rigoffers according

hem to a page siment on the rignstead of the bars according to t

ssion and a free

he order of steps

that we used is:

me at all.

ing but I don’t cl

g but I hesitate to

I click to know

g and I click to k

the relevance peg to each level,

sponding to eao 5, the relevanc

n would actualof our own anis early stage

form a preliminaf our interest anaged 23-35) abosed by our syste

ular Criteo enginravel website, thheir website. Bor browsing histomonth. The Critesite of “So Foot6

nt approaches, thproach to what wfiltering approacnd on the same

ed the participan

rio of looking fmagine a concre

ays).

website, and wee there (i.e. visit

ebsite, where thght sidebar. Theg to a Five-Lev

imilar to So Fooght, 3 offers froaseline ones. Thethe same scale.

comment.

3 and 4.

lick.

o click.

more.

know more.

erformance: TotPrecision, Reca

ach level: In thce is in ascendin

lly nd of

ary nd

out em

ne. hat oth ory eo 6”. his we ch, e-

nts

for ete

ere at

he ey

vel

ot, om ey

tal all

he ng

order. T5, the b

Precisinumberand 2 aand therelevanintentio

Recall evaluatapprecithe numrelevan

F1 scor

4.4 R33 partthe met

Figure

Figure

Figure

010203040

00.51

00.51

The more there better relevance p

ion is the numbr of offers. We cas not relevant, oey count 0.5, o

nt and they couon to click.

is modified to ted only the iation about the mber of relevan

nt offers of the tw re is the harmon

Results & Dticipants answeretrics mentioned

e 4. Total numb

e 5. Results of pr

e 6. Results of re

Level1 Level2

0 5 10 1

0 5 10 1

are offers correperformance the

ber of relevant consider that offoffers corresponoffers corresponunt 1 as they

adapt to our sdisplayed ads

non-displayed nt offers dividewo systems. nic mean of preci2 ∗ Discussion

ed our questionnabove.

er of offers corr

recision metric

ecall metric

Level3 Level4 L

15 20 25

15 20 25

sponding to levee system has.

offers divided bfers correspondinding to level 3 h

nding to level 4correspond to

situation. Since . We don’t kads. The modif

ed by the total

2ision and recall.∗

naire. Here are th

responding to e

Level5MEBa

30MEBa

30MEBa

els closer to

by the total ng to level 1 half-relevant 4 and 5 are

an explicit

participants know their

fied recall is number of

he results of

each level

ERLOT1seline

ERLOT1seline

ERLOT1seline

1291

Page 6: Leveraging Semantic Web Technologies for More Relevant E … · 2015-05-15 · Leveraging Semantic Web Technologies for More Relevant E-tourism Behavioral Retargeting Chun LU Sépage

F

InoaFsabo(vomoothswtopthpOeurgpomthcbthdsp

5Indthpoticoprne

Figure 7. Result

n Figure 4, we cof total number and 5) – it bringFigure 5, 6, 7 arscore. Again, Maverage F1 scorbaseline had 0.2over 3 offers shvertically), it is

offer. When loomore relevant wion the second ofof our system (0he fact that the

seen by the userwhile our systemo our main m

probability (for bhe ratings cor

probability was Obviously these estimate for a prusers to be moreegular browsing

generosity was performed the sorder) so it is remay indicate a phrough rates if

comment part, sby MERLOT 1 ahe baseline are

diversity. The dishow, may explaperformance.

5. Conclusin this paper, we

destination recomhe relevance of

people participaoutperforms the ime and sampl

convergence of rof Semantic Wepromising approetargeting system

nature of a Semexperiments in a

00.510 5

ts of F1 score m

can see a clear adof relevant offegs higher chancre respectively rERLOT 1 outpee of MERLOT

27. Since these howed to each ualso interesting king at the firstith 0.359 (vs. 0.3ffer where most .5 vs. 0.234). Thbaseline system

r (likely to be hm doesn’t show th

etrics, we wereboth systems) torresponding to 0.271 for our syvalues are too hrobability of cli

e generous in clig behavior, but equally high fostudy, and systeasonable to belpotential of MERf used in a realome participant

are diverse and stoo similar to thiversity and cur

ain a part of the d

ions e presented MERmmender systemBehavioral Reta

ated in the evbaseline accordile size represenresults on multipeb data to augmoach to improvems in the future

mantic Web appa more quantita

10 15 20metric

dvantage for MEers (which correces to generate results of precisierforms the bas1 offers was 0

results correspouser at a time to observe the dt offer, MERLO328 for baseline)of the difference

his difference cam often proposesighly relevant) he offers alreadye interested to o generate a rati

the intention ystem vs. 0.177high to be consiick, as it is normicks in a declarawe can safely

or both systemtems were preslieve that the obRLOT 1 to genel browsing scents found that thesurprising, that thhe clicked offersriosity that the Mdifferences creat

RLOT 1, a semm that can be depargeting in the evaluation. MERing to all used mnt limitations tple metrics, indiment behaviorae the performane. After confirmproach, we wil

ative setting foc

25 30ERLOT 1 in termespond to Levelclick-worthy adion, recall and Feline system. Th

0.431 whereas thond to an averagin one ad bann

difference offer bOT 1 was slight), but it is actuale is made in fav

an be explained bs an offer alreadon the first placy seen. In additio

see what is thing 4 or 5 (one

to click). Suc7 for the baselinidered an accuramal to expect th

ative study then assume that thes (as same use

sented in randobserved differenerate higher clicnario. In the fre offers generatehose generated bs and they lack MERLOT 1’s ated in the system

mantic-based travployed to improve-tourism field. 3RLOT 1 systemetrics. While tho our study, thicates that the u

al data may be nce of behavior

ming the promisinll conduct furthcusing only on a

MERLOT1Baseline

ms 4

ds. F1 he he ge

ner by tly lly

vor by dy ce, on he of ch

ne. ate he in

eir ers om ce

ck-ree ed by of ds

ms’

vel ve 33 em he he

use a

ral ng

her ad

click dathe paphad to b

6. RE[1] “A

fromo

[2] Bo(2Sy

[3] BrJuPrcoret

[4] “C(2<hma

[5] “CPu<h

[6] DjN.DiIn

[7] Lwoof

[8] “LViReevrevwo

[9] Litarintma

[10] “Thtt

[11] “T<htra

[12] “Tbo<hav

[13] Ya(2onco

ata as opposed tper – which has be performed on

EFERENCAu cœur du Moteom <http://wwwoteur-criteo/>

obadilla, J., Orte2013). Recommeystems, 46, 109-

roder, A., Fontouuly). A semantic roceedings of theonference on Restrieval (pp. 559-

Criteo : les Big D2015, January). Rhttp://pro.clubic.arketing-paris-cr

Custom Researcurchase”. (2014, http://info.advert

juric, N., Radosa. (2014). Hiddenistributed User E

nternational Conf

Lambrecht, A., &ork? informationf Marketing Rese

Latest comScore irtually One Billetrieved from <hvents/press-releaveals-that-criteoorldwide/>

iu, K., & Tang, Lrgeting with a soternational confeanagement (pp.

The Criteo engintp://www.criteo

The 2013 Travelhttp://www.thinkaveler.html>

Tout ce qu’il fauourse le 30 octobhttp://frenchwebvant-son-entree-e

an, J., Liu, N., W2009, April). Hownline advertisingonference on Wo

to explicit user inthe merit of pro

n smaller sample

ES eur Criteo”. (20

w.viuz.com/2014

ega, F., Hernandender systems su-132.

ura, M., Josifovsapproach to con

e 30th annual intsearch and devel-566). ACM.

Data au cœur du Retrieve from .com/webmarketriteo.html>

ch: Exploring the august 28). Rettising.expedia.co

avljevic, V., Grbn Conditional RaEmbeddings for ference on Data

& Tucker, C. (20n specificity in oearch, 50(5), 561

Report Revealslion Users Worldhttp://www.criteases/2014/10/lateo-ads-reach-virtu

L. (2011, Octobeocial twist. In Prference on Inform1815-1824). AC

ne”. (2015, Janua.com/what-we-d

ler”. (2014, Augkwithgoogle.com

ut savoir sur Critebre”. (2015, Janub.fr/tout-ce-quil-fen-bourse/11450

Wang, G., Zhangw much can beh

g?. In Proceedingorld wide web (p

nterrogation thaoviding detailed e volumes.

15, January). Re4/04/07/au-coeur

do, A., & Gutiérrurvey. Knowledg

ski, V., & Riedentextual advertisternational ACMlopment in infor

reciblage public

ting/actualite-53

e Traveler's Pathtrieved from om/path-to-purch

bovic, M., & Bhaandom Fields wiAd Targeting. InMining (ICDM)

13). When does online advertisin1-576.

s that Criteo Adsdwide”. (2015, Jo.com/news-and

est-comscore-repually-one-billion

er). Large-scale roceedings of themation and knowCM.

ary). Retrieved fdo/technology/

gust 28). retrievem/research-studi

eo… avant son euary). Retrieve ffaut-savoir-sur-c00>

g, W., Jiang, Y., havioral targetinggs of the 18th intpp. 261-270). AC

t we used in insight, but

etrieved r-du-

rez, A. ge-Based

el, L. (2007, ing. In

M SIGIR rmation

citaire”.

38096-

h to

has>

amidipati, ith n IEEE ).

retargeting ng. Journal

s Reach January). d-port-n-users-

behavioral e 20th ACM wledge

from

ed from es/2013-

entrée en from criteo-

& Chen, Z. g help ternational CM.

1292