16
Advanced information retrieval Chapter. 02: Modeling (Set Theoretic Models)   Fuzzy model

Chap02cFuzzy

Embed Size (px)

Citation preview

Page 1: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 1/16

Advanced informationretrieval

Chapter. 02: Modeling (SetTheoretic Models)  – 

Fuzzy model

Page 2: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 2/16

Set Theoretic Models

•The Boolean model imposes a binarycriterion for deciding relevance

• The question of how to extend theBoolean model to accomodate partialmatching and a ranking has attractedconsiderable attention in the past

• We discuss now two set theoretic modelsfor this:

 – Fuzzy Set Model

 – Extended Boolean Model

Page 3: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 3/16

Fuzzy Set Model

Queries and docs represented by sets of index

terms: matching is approximate from the start This vagueness can be modeled using a fuzzy

framework, as follows:

with each term is associated a fuzzy set

each doc has a degree of membership in this fuzzy

set

This interpretation provides the foundation for 

many models for IR based on fuzzy theory In here, we discuss the model proposed by

Ogawa, Morita, and Kobayashi (1991)

Page 4: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 4/16

Fuzzy Set Theory

Framework for representing classes whoseboundaries are not well defined

Key idea is to introduce the notion of a degree of 

membership associated with the elements of a set

This degree of membership varies from 0 to 1 and

allows modeling the notion of marginal membership

Thus, membership is now a gradual notion, contrary

to the crispy notion enforced by classic Boolean logic

Page 5: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 5/16

 Alternative Set Theoretic Models

-Fuzzy Set Model• Model

 – a query term: a fuzzy set

 –a document: degree of membership in this set

 – membership function• Associate membership function with the elements

of the class

•0: no membership in the set

• 1: full membership

• 0~1: marginal elements of the setdocuments

Page 6: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 6/16

Fuzzy Set Theory

• A fuzzy subset A of a universe of discourseU is characterized by a membership

function µ A : U[0,1] which associateswith each element u of U a number µ A (u)in the interval [0,1]

 –complement:

 – union:

 – intersection:

)(1)( uu  A A   

))(),(max()( uuu  B A B A    

))(),(min()( uuu  B A B A    

a class

a document

document collectionfor query term

Page 7: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 7/16

Examples•  Assume U={d1, d2, d3, d4, d5, d6}

• Let A and B be {d1, d2, d3} and {d2, d3, d4},respectively.

•  Assume  A ={d1:0.8, d2:0.7, d3:0.6, d4:0, d5:0, d6:0}and

B={d1:0, d2:0.6, d3:0.8, d4:0.9, d5:0, d6:0}

•   ={d1:0.2, d2:0.3, d3:0.4, d4:1, d5:1, d6:1}

• =

{d1:0.8, d2:0.7, d3:0.8, d4:0.9, d5:0, d6:0}• =

{d1:0, d2:0.6, d3:0.6, d4:0, d5:0, d6:0}

)(1)( uu  A A   

))(),(max()( uuu  B A B A    

))(),(min()( uuu  B A B A    

Page 8: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 8/16

Fuzzy Information Retrieval

• basic idea

 – Expand the set of index terms in the query

with related terms (from the thesaurus) suchthat additional relevant documents can beretrieved

 – A thesaurus can be constructed by defining aterm-term correlation matrix c whose rowsand columns are associated to the indexterms in the document collection

keyword connection matrix 

Page 9: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 9/16

Fuzzy Information Retrieval(Continued)

• normalized correlation factor ci,l betweentwo terms k i and k l (0~1) 

• In the fuzzy set associated to each index

term k i, a document d j has a degree of membership µi,j

lili

lili

nnn

nc

,

,,

)1(1 ,,

 jd lk 

li ji c 

whereni is # of documents containing term k inl is # of documents containing term k lni,l is # of documents containing k i and k l

Page 10: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 10/16

Fuzzy Information Retrieval(Continued)

• physical meaning

 – A document d j belongs to the fuzzy set associated tothe term k 

i

if its own terms are related to k i

, i.e.,i,j=1.

 – If there is at least one index term k l of d j which isstrongly related to the index k i, then i,j1.

k i is a good fuzzy index

 – When all index terms of d j are only loosely related tok i, i,j0.

k i is not a good fuzzy index

Page 11: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 11/16

Example

• q=(k a  (k b  k c))=(k a  k b k c) (k a  k b   k c) (k a  k b  k c)=cc1+cc2+cc3

Da

Db

Dc

cc3cc2

cc1

Da: the fuzzy set of documentsassociated to the index k a

d jDa has a degree of membership

a,j > a predefined threshold K 

Da: the fuzzy set of documents

associated to the index k a(the negation of index term k a) 

Page 12: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 12/16

Example

))1)(1(1())1(1()1(1

)1(1

,,,,,,,,,

3

1,

,321,

 jc jb ja jc jb ja jc jb ja

i jicc

 jcccccc jq

         

 

  

Query q=k a  (k b   k c)

disjunctive normal form qdnf =(1,1,1) (1,1,0) (1,0,0)

(1) the degree of membership in a disjunctive fuzzy set is computed

using an algebraic sum (instead of max function) more smoothly 

(2) the degree of membership in a conjunctive fuzzy set is computed

using an algebraic product (instead of min function)

Recall  )(1)( uu  A A   

Page 13: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 13/16

Fuzzy Set Model

 –Q:  “gold silver truck ”  D1:  “Shipment of gold damaged in a fire”  D2:  “Delivery of silver arrived in a silver truck ”  D3:  “Shipment of gold arrived in a truck ”  

 –IDF (Select Keywords)

• a = in = of = 0 = log 3/3 arrived = gold = shipment = truck = 0.176 = log 3/2 damaged = delivery = fire = silver = 0.477 = log 3/1 

 –8 Keywords (Dimensions) are selected

• arrived(1), damaged(2), delivery(3), fire(4), gold(5),silver(6), shipment(7), truck(8)

Page 14: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 14/16

Fuzzy Set Model

Page 15: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 15/16

Fuzzy Set Model

Page 16: Chap02cFuzzy

7/31/2019 Chap02cFuzzy

http://slidepdf.com/reader/full/chap02cfuzzy 16/16

Fuzzy Set Model• Sim(q,d): Alternative 1

Sim(q,d3) > Sim(q,d2) > Sim(q,d1)

• Sim(q,d): Alternative 2

Sim(q,d3) > Sim(q,d2) > Sim(q,d1)