OUTLINE OF THE TALK
Introduction
Online Advertising
A Modern Contextual Advertising System
Syntactic Textual Analysis
Semantic Textual Analysis
Matching
An Example: ConCA
Experimental Results
Conclusions
References
Eloisa Vargiu (evargiu@bdigital.org) – Cagliari, 6 September 2012
INTRODUCTION
OUTERNET & INTERNET
In Atkinson’s view something is missing…
ONLINE ADVERTISING
ONLINE ADVERTISING
Sponsored Search
ONLINE ADVERTISING
Banner Advertising
ONLINE ADVERTISING
Contextual Advertising
CONTEXTUAL ADVERTISING
Webpage Ad
ONLINE ADVERTISING
Is it always a good thing?
A MODERN CONTEXTUAL ADVERTISING SYSTEM
SYNTACTIC TEXTUAL ANALYSIS
Text Summarization
Bag of Words Representation
SYNTACTIC TEXTUAL ANALYSIS
Text summarization
State-of-the-art techniques
First and Last Paragraph (FLP)
Title, First and Last Paragraph (TFLP)
Snippet (S)
Title and Snippet (TS)
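The four techniques listed above can be sketched as simple extraction rules. This is a minimal illustration, not the implementation used in the talk: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the snippet is assumed to be supplied from outside (for example, the page excerpt returned by a search engine).

```python
def split_paragraphs(text):
    """Split plain text into non-empty paragraphs (assumes blank-line separators)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def summarize(technique, title, body, snippet):
    """Build an extraction-based summary of a webpage.

    technique: one of "FLP", "TFLP", "S", "TS".
    snippet:   assumed to be provided externally; it is not computed here.
    """
    paragraphs = split_paragraphs(body)
    first_last = []
    if paragraphs:
        first_last.append(paragraphs[0])
        if len(paragraphs) > 1:
            first_last.append(paragraphs[-1])
    parts = {
        "FLP": first_last,             # first and last paragraph
        "TFLP": [title] + first_last,  # title, first and last paragraph
        "S": [snippet],                # snippet only
        "TS": [title, snippet],        # title and snippet
    }[technique]
    return " ".join(parts)
```

For a three-paragraph page, `summarize("FLP", ...)` keeps only the first and last paragraphs, and `summarize("TFLP", ...)` prepends the title to them.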
SYNTACTIC TEXTUAL ANALYSIS
First and Last Paragraph (FLP)
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
First paragraph:
You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There's all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
Last paragraph:
Indulge in a family feast
Volunteer chefs at 24 Sure Start Centres across the UK are preparing to dish up free delights throughout the period. Details, along with all the other events that make up the Cultural Olympiad, are available on the site.
SYNTACTIC TEXTUAL ANALYSIS
Title, First and Last Paragraph (TFLP)
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
Title:
London 2012 – Ten ways to celebrate the Olympics for free
First paragraph:
You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There's all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
Last paragraph:
Indulge in a family feast
Volunteer chefs at 24 Sure Start Centres across the UK are preparing to dish up free delights throughout the period. Details, along with all the other events that make up the Cultural Olympiad, are available on the site.
SYNTACTIC TEXTUAL ANALYSIS
Snippet (S)
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
SYNTACTIC TEXTUAL ANALYSIS
Title and Snippet (TS)
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
SYNTACTIC TEXTUAL ANALYSIS
Bag of Words (BoW) representation
Dimensionality reduction
Stop-words removal
Stemming
Vector representation
Set of pairs <word, occurrences>
SYNTACTIC TEXTUAL ANALYSIS
Stop-words removal
Before:
You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There's all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
After:
Shell thousands, survive various ballots, swap family member ticket enjoy 2012 Summer Olympic Games year. Manner free events associated shenanigans taking place London across UK mark occasion. ten ways join spending money.
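Stop-word removal can be sketched as a filter over lower-cased tokens. The stop list below is a small hypothetical one chosen for this example only; the list actually used in the talk is not given in the slides.

```python
import re

# Hypothetical, deliberately tiny stop list (an assumption for illustration).
STOP_WORDS = {
    "a", "all", "and", "any", "are", "for", "here", "in",
    "of", "or", "the", "this", "to", "without",
}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lower-case the text, tokenize it, and drop every token in the stop list."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stop_words]
```

Applied to the last sentence of the example paragraph, `remove_stop_words("Here are ten ways to join in without spending any money.")` keeps `ten`, `ways`, `join`, `spending`, and `money`.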
SYNTACTIC TEXTUAL ANALYSIS
Stemming
Before:
Shell thousands, survive various ballots, swap family member ticket enjoy 2012 Summer Olympic Games year. Manner free events associated shenanigans taking place London across UK mark occasion. ten ways join spending money.
After:
Shell thousand, surviv various ballot, swap famil member ticket enjoy 2012 Summer Olymp Game year. Manner free event associat shenanigan tak place London across UK mark occasion. ten way join spend money.
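The idea of suffix stripping can be illustrated with a toy stemmer. This is not the Porter algorithm cited in the references and it will not reproduce every stem in the example above; the three suffix rules and the minimum stem length are assumptions made for the sketch.

```python
# Toy suffix rules, tried in order (assumption: far simpler than Porter's rules).
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    # Leave words such as "various" or "class" untouched.
    if word.endswith(("ss", "us")):
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With these rules, `events`, `taking`, and `associated` reduce to `event`, `tak`, and `associat`, while `various` is left unchanged.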
SYNTACTIC TEXTUAL ANALYSIS
Vector representation
TF-IDF
<free, 0.0116>
<olymp, 0.0235>
<event, 0.0012>
<way, 0.0125>
<london, 0.0421>
<celebrat, 0.0005>
<chef, 0.0127>
…
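A weighted vector of this kind can be computed as follows. The slides do not say which TF-IDF variant produced the values above, so this sketch assumes a common one: term frequency relative to document length, multiplied by log(N / document frequency). The function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights.

    documents: list of token lists (already stop-word filtered and stemmed).
    Returns one {term: weight} dictionary per document.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Under this variant a term that occurs in every document gets weight 0, which is why very common terms contribute little to the page representation.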
IS THE SYNTACTIC APPROACH ALONE ENOUGH?
Polysemy: one word with several meanings
“BASS”
Synonymy: several words with the same meaning
Vehicle
Car
Automobile
Auto
Machine
SEMANTIC TEXTUAL ANALYSIS
Taxonomy-based Classification
Word Disambiguation
SEMANTIC TEXTUAL ANALYSIS
Taxonomy-based Classification
Classification Features (CF) representation
Adopted classifiers
Rocchio
SVM
SEMANTIC TEXTUAL ANALYSIS
Rocchio
Each class centroid is the sum of the TF-IDF vectors of the webpages in the class, divided by the number of webpages in the class
Classification is based on the cosine of the angle between the webpage vector and the centroid of each class
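The centroid construction and the cosine-based decision described above can be sketched as below. Vectors are assumed to be sparse `{term: weight}` dictionaries, and all function names are hypothetical; this is an illustration of the idea, not the classifier used in the talk.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average the TF-IDF vectors of the webpages belonging to one class."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(page, centroids):
    """Return the class whose centroid is closest in angle to the page vector."""
    return max(centroids, key=lambda label: cosine(page, centroids[label]))
```

In a taxonomy-based setting, the cosine scores against all centroids (rather than only the winning class) could serve as the classification features of the page.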
SEMANTIC TEXTUAL ANALYSIS
SVM
The score is related to the
distance of the webpage from a
separation hyperplane
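For a linear SVM, one way to read "distance from the separation hyperplane" is the signed geometric distance of the page vector x from the hyperplane w·x + b = 0. The sketch below assumes the weights `w` and bias `b` have already been learned; training is omitted, and the slides do not state exactly how the score is derived from the distance.

```python
import math


def signed_distance(x, w, b):
    """Signed distance of point x from the hyperplane w.x + b = 0.

    Positive values lie on the side the classifier labels as positive;
    larger magnitudes mean the point is further from the boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
```

A page lying exactly on the hyperplane gets a score of 0, so the magnitude can be read as a rough confidence in the class assignment.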
SEMANTIC TEXTUAL ANALYSIS
Word Disambiguation
Bag of Concepts (BoC) representation
Adopted lexical supports
WordNet
YAGO
ConceptNet
SEMANTIC TEXTUAL ANALYSIS
WordNet
A large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
SEMANTIC TEXTUAL ANALYSIS
YAGO
A semantic knowledge base, derived from Wikipedia,
WordNet and GeoNames
SEMANTIC TEXTUAL ANALYSIS
ConceptNet
A network of concepts connected by several semantic
relations (e.g., “IsA”, “PartOf”)
MATCHING
Similarity calculation
Ranking
MATCHING
Similarity calculation
Adopted approaches
Cosine similarity
Jaccard index
MATCHING
o Cosine similarity
MATCHING
o Jaccard index
The Jaccard coefficient measures similarity between sample sets,
and is defined as the size of the intersection divided by the size of
the union of the sample sets
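Both similarity measures named above fit in a few lines. In this sketch a webpage and an ad are assumed to be represented either as sparse `{term: weight}` dictionaries (cosine) or as sets of terms or concepts (Jaccard); returning 0.0 for empty inputs is a convention chosen here, not something stated in the slides.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse weighted vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, a page described by `{"olymp", "london", "free"}` and an ad described by `{"london", "free", "chef"}` share two of four distinct terms, giving a Jaccard index of 0.5.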
MATCHING
Ranking
Adopted approaches
Simple ranking according to the calculated scores
Learning to rank model
MATCHING
o Learning to rank model
Pointwise approach
o Each query-document pair in the training data has a numerical
or ordinal score
o Regression problem approach: given a single query-document
pair, predict its score
Pairwise approach
o Classification problem approach: learning a binary classifier
which can tell which document is better in a given pair of
documents
Listwise approach
o Optimization problem approach: try to directly optimize the
value of an evaluation measure, averaged over
all queries in the training data
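The pairwise approach can be illustrated by turning scored documents into difference vectors and training a binary classifier on them. The slides do not name a classifier, so a plain perceptron is used here as a stand-in; the function names, the feature layout, and the number of epochs are all assumptions.

```python
from itertools import combinations


def pairwise_examples(docs):
    """docs: list of (feature_vector, relevance) for one query.

    Returns (difference_vector, label) pairs, with label +1 when the first
    document of the pair is the more relevant one and -1 otherwise.
    """
    examples = []
    for (fa, ra), (fb, rb) in combinations(docs, 2):
        if ra == rb:
            continue  # ties carry no preference information
        diff = [x - y for x, y in zip(fa, fb)]
        examples.append((diff, 1 if ra > rb else -1))
    return examples


def train_perceptron(examples, epochs=20):
    """Learn weights w such that sign(w . diff) matches the pair label."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for diff, label in examples:
            score = sum(wi * di for wi, di in zip(w, diff))
            if label * score <= 0:  # misclassified pair: update the weights
                w = [wi + label * di for wi, di in zip(w, diff)]
    return w


def rank(features, w):
    """Return document indices sorted by decreasing learned score."""
    scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])
```

Once trained, the same weight vector scores single documents, so ranking a new list only requires sorting by `w . x`.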
AN EXAMPLE: CONCA
CONCEPTS IN CONTEXTUAL ADVERTISING
CONCA
RESULTS
SYNTAX VS SEMANTICS
SYNTACTICAL ANALYSIS
Text summarization techniques comparison
FLP vs TFLP vs S vs TS
Comparison metrics
Taxonomy
BankSearch Dataset
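\[ \pi = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} = \frac{TP}{TP + FP} \]
\[ \rho = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} = \frac{TP}{TP + FN} \]
\[ F_1 = \frac{2 \pi \rho}{\pi + \rho} \]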
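The three comparison metrics (precision π, recall ρ, and F1) can be computed from true-positive, false-positive, and false-negative counts as in the sketch below. The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision(tp, fp):
    """Fraction of retrieved documents that are relevant: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of relevant documents that are retrieved: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(pi, rho):
    """Harmonic mean of precision and recall."""
    return 2 * pi * rho / (pi + rho) if pi + rho else 0.0
```

As a check against the reported results, the FLP precision and recall (0.745 and 0.719) give `round(f1(0.745, 0.719), 3) == 0.732`, matching the F1 value listed for FLP.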
SYNTACTICAL ANALYSIS
Results
Adding the title to the summary improves performance (TFLP vs FLP, TS vs S)
TFLP performs best (F1 = 0.816)
      FLP     TFLP    S       TS
π     0.745   0.832   0.734   0.806
ρ     0.719   0.801   0.730   0.804
F1    0.732   0.816   0.732   0.805
#t    24      26      12      14
SEMANTIC ANALYSIS
Semantic approaches comparison
Anagnostopoulos et al. (2007) system vs Armano et al.
(2011-TIR) vs ConCA
Matching function
Comparison metric
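\[ \mathrm{score}(p, a) = \alpha \cdot \mathrm{sim}_{BoC} + (1 - \alpha) \cdot \mathrm{sim}_{CF} \]
\[ \pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})} \]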
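The matching function and the comparison metric can be sketched as follows. The matching function is taken to be an α-weighted combination of the BoC and CF similarities between a page p and an ad a, and π@k is the fraction of relevant ads among the top k suggestions over all N pages. Function names and the boolean relevance encoding are assumptions made for this illustration.

```python
def combined_score(sim_boc, sim_cf, alpha):
    """Alpha-weighted matching score: alpha * sim_BoC + (1 - alpha) * sim_CF."""
    return alpha * sim_boc + (1 - alpha) * sim_cf


def precision_at_k(relevance_lists, k):
    """Precision at k over all pages.

    relevance_lists[i][j] is True when the j-th ranked ad suggested for
    page i is relevant (a true positive) and False otherwise (a false positive).
    """
    true_positives = sum(
        sum(1 for relevant in ranked[:k] if relevant)
        for ranked in relevance_lists
    )
    suggested = sum(len(ranked[:k]) for ranked in relevance_lists)
    return true_positives / suggested if suggested else 0.0
```

Setting `alpha` to 0 reduces the score to the CF similarity alone, and setting it to 1 reduces it to the other component alone.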
SEMANTIC ANALYSIS
Ad repository
Built by hand by a domain expert
Taxonomy
BankSearch Dataset
SEMANTIC ANALYSIS
Results
k   Anagnostopoulos et al.   Armano et al.    ConCA
    π       α                π       α        π       α
1   0.674   0                0.768   0.2      0.773   0.1
2   0.653   0                0.750   0.2      0.752   0.1
3   0.617   0.2              0.729   0.3      0.728   0.1
4   0.582   0.2              0.701   0.3      0.701   0.1
5   0.546   0.1              0.663   0.0      0.668   0.1
SEMANTIC ANALYSIS
Results
Slight improvement by using concepts
Low values of α → CF has more impact than BoC
SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
Contextual Advertising System
Armano et al. (2011-TIR)
Matching function
Comparisons varying α
α = 1 → pure syntax
α = 0 → pure semantics
Comparison metric
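\[ \mathrm{score}(p, a) = \alpha \cdot \mathrm{sim}_{BoW} + (1 - \alpha) \cdot \mathrm{sim}_{CF} \]
\[ \pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})} \]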
SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
Ad repository
Built by hand by a domain expert
Taxonomy
BankSearch Dataset
SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
Results
α     π@1     π@2     π@3     π@4     π@5
0     0.765   0.746   0.719   0.696   0.663
0.1   0.767   0.749   0.724   0.698   0.663
0.2   0.768   0.750   0.729   0.699   0.662
0.3   0.766   0.749   0.729   0.701   0.661
0.4   0.756   0.747   0.729   0.698   0.658
0.5   0.744   0.735   0.721   0.693   0.651
0.6   0.722   0.717   0.703   0.681   0.640
0.7   0.685   0.687   0.680   0.658   0.625
0.8   0.632   0.637   0.635   0.614   0.586
0.9   0.557   0.552   0.548   0.534   0.512
1     0.408   0.421   0.372   0.388   0.640
CONCLUSIONS
Online advertising
represents one of the major sources of income for a large
number of websites
is aimed at suggesting products and services to the
population of Internet users
Modern contextual advertising systems
place ads within the content of a generic third-party
webpage
adopt both syntactical and semantic textual analyses to
select the most relevant ads for a given webpage
an example is ConCA
CONCLUSIONS
Results show that
the impact of semantics is stronger than that of syntax
adopting more advanced semantic techniques, such as
concepts, improves performance
the more ads are suggested, the lower the
precision
REFERENCES
Syntactical Textual Analysis
Armano G., Giuliani A. & Vargiu E. Experimenting text summarization techniques for contextual advertising. 2nd Italian Information Retrieval Workshop (IIR’11), 2011.
Armano G., Giuliani A. & Vargiu E. Using snippets in text summarization: a comparative study and an application. 3rd Italian Information Retrieval Workshop (IIR’12), 2012.
Kolcz A., Prabakarmurthi V. & Kalita J. Summarization as feature selection for text categorization. 10th International Conference on Information and Knowledge Management (CIKM’01). ACM, New York, NY, USA, pp. 365–370, 2001.
Porter M. An algorithm for suffix stripping. Program, 14, 3, pp. 130–137, 1980.
Salton G., Wong A. & Yang C.S. A vector space model for automatic indexing. Communications of the ACM, 18, 11, pp. 613–620, 1975.
Semantic Textual Analysis
Cortes C. & Vapnik V.N. Support-Vector Networks. Machine Learning, 20, 1995.
Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
Liu H. & Singh P. ConceptNet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22, pp. 211–226, 2004.
Miller G.A. WordNet: A Lexical Database for English. Communications of the ACM, 38, 11, pp. 39–41, 1995.
Rocchio J. Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall, pp. 313–323, 1971.
Suchanek F.M., Kasneci G. & Weikum G. Yago - A Core of Semantic Knowledge. 16th International World Wide Web conference (WWW 2007), 2007.
Matching
Liu T.Y. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3, 3, pp. 225–331, 2009.
Radomski P.J. & Goeman T.J. The homogenizing of Minnesota lake fish assemblages. Fisheries, 20, pp. 20–23, 1995.
Comparison Systems
Anagnostopoulos A., Broder A.Z., Gabrilovich E., Josifovski V. & Riedel L. Just-in-time contextual advertising. 16th ACM Conference on Information and Knowledge Management (CIKM’07). ACM, New York, NY, USA, pp. 331–340, 2007.
Armano G., Giuliani A. & Vargiu E. Studying the impact of text summarization on contextual advertising. 8th International Workshop on Text-based Information Retrieval (TIR’11), 2011.
Armano G., Giuliani A. & Vargiu E. Semantic enrichment of contextual advertising by using concepts. International Conference on Knowledge Discovery and Information Retrieval, 2011.
Contact: Eloisa Vargiu – evargiu@bdigital.org