Applying the KISS Principle with Prior-Art Patent Search
Walid Magdy Gareth Jones
Dublin City University
CLEF-IP, 22 Sep 2010
DCU participation in CLEF-IP 2009
The more text, the better the results
Structured search does not help
Filtering helps
Combination of terms and phrases does better
Word matching for search is not the best
Blind relevance feedback is ineffective
Part of the answer is within the question
KISS
Keep It Simple and Straightforward
Three simple runs were submitted:
1. IR run (simple search)
2. Cit run (straightforward citation extraction)
3. IR+Cit run (combination of the IR and Cit runs)
Evaluation results (25 submitted runs):
1. IR run (3rd in recall)
2. Cit run (1st in precision)
3. IR+Cit run (2nd in MAP, recall, and PRES)
IR run
Different document versions of a patent are merged
Only English parts are indexed (title, abstract, description, and claims)
Query is constructed from the same fields as follows:
- unigrams with freq>2 from the “description” field
- bigrams with freq>3 from all fields
French and German topics are translated into English using Google Translate
The first three levels of the patent classification are used to filter results
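The query construction and classification filter described above can be sketched roughly as follows. This is a minimal illustration, not the DCU implementation: the tokenizer, the function names, the choice to sum bigram counts across fields, and the reading of "first three levels" as the first four characters of an IPC-style code (section, class, subclass, e.g. "A61K") are all assumptions.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")  # assumption: simple lowercase alphabetic tokens
FIELDS = ("title", "abstract", "description", "claims")


def tokenize(text):
    return TOKEN.findall(text.lower())


def build_query(fields, min_unigram_freq=2, min_bigram_freq=3):
    """Return (unigrams, bigrams) for a topic given as {field name: text}."""
    # Unigrams come from the description field only (freq > 2).
    desc_counts = Counter(tokenize(fields.get("description", "")))
    unigrams = sorted(t for t, c in desc_counts.items() if c > min_unigram_freq)

    # Bigrams are counted over all fields (freq > 3).
    # Assumption: counts are summed across fields.
    bigram_counts = Counter()
    for name in FIELDS:
        tokens = tokenize(fields.get(name, ""))
        bigram_counts.update(zip(tokens, tokens[1:]))
    bigrams = sorted(
        " ".join(pair) for pair, c in bigram_counts.items() if c > min_bigram_freq
    )
    return unigrams, bigrams


def class_prefix(code, length=4):
    # Assumption: the first three levels of an IPC-style code are its
    # first four characters, e.g. "A61K 31/00" -> "A61K".
    return code.replace(" ", "")[:length]


def filter_by_classification(ranked_ids, topic_codes, doc_codes):
    """Keep only ranked documents sharing a class prefix with the topic."""
    topic_prefixes = {class_prefix(c) for c in topic_codes}
    return [
        doc_id
        for doc_id in ranked_ids
        if topic_prefixes & {class_prefix(c) for c in doc_codes.get(doc_id, [])}
    ]
```

The thresholds are strict inequalities, matching "freq>2" and "freq>3" on the slide. Documents without any classification code are dropped by this filter, which is one possible policy among several.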
Cit and IR+Cit runs
All patent IDs are extracted from the description section of each patent topic
IDs that do not exist in the collection are filtered out
Remaining IDs are treated as relevant documents
Citations could be extracted from the text of only 771 of the 2,005 topics (2,307 citations in total)
IR run is appended to Cit run after removing duplicates to create IR+Cit run
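The Cit and IR+Cit runs above might be sketched as follows. The ID pattern (a two-letter country code followed by six to eight digits), the normalized "CC-number" form, and the function names are hypothetical; the actual citation formats in the topics and the extraction rules used by DCU are not given in the slides.

```python
import re

# Assumption: a hypothetical patent ID pattern such as "EP 1234567",
# "EP1234567" or "US-7654321".
PATENT_ID = re.compile(r"\b([A-Z]{2})[\s-]?(\d{6,8})\b")


def extract_citations(description, collection_ids):
    """Cit run: cited IDs found in the description that exist in the collection."""
    cited = []
    for country, number in PATENT_ID.findall(description):
        pid = f"{country}-{number}"  # normalized form, also an assumption
        if pid in collection_ids and pid not in cited:
            cited.append(pid)
    return cited


def combine_runs(cit_run, ir_run):
    """IR+Cit run: Cit results first, then IR results not already present."""
    combined = list(cit_run)
    present = set(cit_run)
    for doc_id in ir_run:
        if doc_id not in present:
            combined.append(doc_id)
            present.add(doc_id)
    return combined
```

Placing the extracted citations ahead of the IR results reflects the slide's wording that the IR run is appended to the Cit run.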
Results
Run     MAP    R      R@100  PRES   PRES@100
IR      0.122  0.570  0.304  0.461  0.228
Cit     0.112  0.119  0.119  0.119  0.118
IR+Cit  0.203  0.618  0.385  0.523  0.316
[Figure: DCU runs among submitted runs (large topics set); y-axis: PRES@100, from 0 to 0.4]
Conclusion & Future Work
When simpler approaches achieve better results than sophisticated ones: much research is still needed in this area
Extracted citations can be useful for relevance feedback
Better translations can be used for FR/DE topics
Faster translation techniques can be used to translate FR/DE documents
Simply, this was the KISS principle with patent search
Thank you