Patent Search: An important new test bed for IR
Presented at the 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009), Enschede, The Netherlands (http://dir2009.cs.utwente.nl/)
J. Tait, M. Lupu¹
H. Berger, G. Roda, M. Dittenbach, A. Pesenhofer²
E. Graf, K. van Rijsbergen³
¹Information Retrieval Facility, Vienna, Austria
²Matrixware, Vienna, Austria
³University of Glasgow, Dept. of Computing Science, Glasgow, UK
DIR 2009 / Feb. 2-3, 2009
Patent Search.
Patent search is a highly specialized form of information search. It is characterized by its
target data
types of information needs
legal and economic implications
Target data
Data for patent retrieval comes mainly from:
patent databases from patent authorities (EPO, USPTO, JPO, SIPO, WIPO, etc.)
scientific publications
prior art databases (IP.com)
A new acronym
SIPO: State Intellectual Property Office of the People’s Republic of China
Target data
Characteristics of patent documents
multilingual and written in ‘legalese’
non-uniform formats
some are OCR’d (digitized with optical character recognition)
figures, images, chemical formulas, DNA sequences
include references to patent and non-patent literature
A new acronym
NPL: Non-Patent Literature
Information Needs.
K.H. Atkinson, Towards a more rational patent search paradigm:
depending on what group is doing the asking, the types of patent search requested may include simple patentability, clearance to market a product, validity, opposition to a patent being sought by another, infringement watch, creating IP landscapes for business development or R&D, infringement defense, litigation, prosecution support, and creation of portfolios for assignments, investments, mergers and acquisitions [...]
Legal and economic implications.
patents are legal documents
patent portfolios are assets for enterprises
a single patent search can be worth several days of work
High recall searches
Missing even a single relevant document can have a severe financial and economic impact, for example when a granted patent is invalidated because of a prior art document missed at application time.
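Recall is the fraction of all relevant documents that a search actually retrieves, which is why patent searches are judged first on recall. The following minimal sketch uses invented document identifiers purely for illustration; it is not a measure defined by any of the initiatives discussed here.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear among the retrieved ones."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(relevant & set(retrieved)) / len(relevant)

# Invented identifiers: a prior art search with four relevant documents,
# one of which is non-patent literature (NPL).
relevant_docs = ["EP-A", "EP-B", "US-C", "NPL-D"]
retrieved_docs = ["EP-A", "EP-B", "US-C", "EP-X", "EP-Y"]

print(recall(retrieved_docs, relevant_docs))  # 0.75
```

Missing only the NPL document drops recall to 0.75; if that is the document that anticipates the claim, the search has failed no matter how many other relevant documents it found.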
Introduction: Patent Search | A modern IR test bed | Promoting take up of research | Conclusion
We have characterized the patent search problem by describing its target data, types of information needs, and legal and economic implications.
Next:
evaluating IR techniques in the patent domain
previous initiatives in the area of patent retrieval
the CLEF-IP and TREC-Chem initiatives
promoting take-up of research
Tait et al. Patent Search: An important new test bed for IR
Test collections
Test collections in Information Retrieval play a pivotal role in the evaluation of retrieval models.
Domain-specific test collections already exist for:
Web pages
news stories
legal documents
blogs
genomics
patents
Pioneering work in patent retrieval.
Patent retrieval task at the NTCIR Workshop¹ since 2001.
produced test collections primarily targeting Japanese patents
retrieval tasks
ad-hoc (goal: find patents on a given topic)
invalidity search (goal: find patents invalidating a given claim)
patent classification according to the F-term system
Two new acronyms
F-term (abbreviation of File-forming term) is the classification system used in Japan as a complement to IPC (International Patent Classification)
¹ http://research.nii.ac.jp/ntcir
Evaluation tracks.
The IRF has engaged in two pilot evaluation tracks on patent retrieval:
CLEF-IP: www.ir-facility.org/the_irf/clef-ip09-track
TREC-Chem: www.ir-facility.org/the_irf/trec_chem.htm
CLEF-Intellectual Property Initiative.
CLEF-IP
coordinated by the IRF
part of the Cross-Language Evaluation Forum²
will focus on the task of prior art search
European patents as target data
automatic extraction of relevance assessments
Prior art search
Prior art search consists of identifying all information (including NPL) that might be relevant to a patent’s claim of novelty.
² http://www.clef-campaign.org
Prior art search.
The most common type of patent search. It is performed at various stages of the patent life cycle and with different intentions:
before filing an application (novelty search or patentability search), to determine whether the invention fulfills the requirements of
novelty
inventive step
before grant: the results go into a search report attached to the patent
invalidity search: a post-grant search used to uncover prior art that invalidates a patent’s claims of originality
Target data.
The CLEF-IP evaluation track will restrict target data to patents.
Target data:
comprising 16 years (filing dates between 1985 and 2000) of EPO patents
1.9 million patent documents corresponding to 1 million patents
75 GB, in XML format
documents are in English, German, and French
Automatic extraction of relevance assessments.
The data resulting from prior art searches is saved in the EPO or USPTO databases as:
citations in patent applications
citations in search reports
citations in the legal files of opposition procedures
The CLEF-IP track will extract as much of this information as possible automatically, in order to form a large set of topics.
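The idea of turning stored citations into topics and relevance assessments can be sketched as follows: each citing patent becomes a topic, and the documents it cites are judged relevant to it. This is a minimal sketch under assumed inputs; the record layout, the identifiers and the helper names `build_qrels` and `to_trec_qrels` are hypothetical, and the actual CLEF-IP extraction procedure is not described here. The output uses the standard TREC qrels layout (topic, iteration, document id, relevance).

```python
from collections import defaultdict

# Hypothetical citation records: (citing patent, cited document, where the citation was found).
# All identifiers are invented for illustration.
CITATIONS = [
    ("TOPIC-1", "EP-DOC-A", "search_report"),
    ("TOPIC-1", "US-DOC-B", "application"),
    ("TOPIC-1", "EP-DOC-A", "opposition"),  # same document cited again; counted once
    ("TOPIC-2", "EP-DOC-C", "search_report"),
]

def build_qrels(citations):
    """Group cited documents by citing patent, so that each citing patent becomes a topic."""
    topics = defaultdict(set)
    for topic, cited, _source in citations:  # the source is kept but unused in this sketch
        topics[topic].add(cited)
    return topics

def to_trec_qrels(topics):
    """Render binary judgments as TREC qrels lines: topic, iteration, docno, relevance."""
    lines = []
    for topic in sorted(topics):
        for doc in sorted(topics[topic]):
            lines.append(f"{topic} 0 {doc} 1")
    return lines

for line in to_trec_qrels(build_qrels(CITATIONS)):
    print(line)
```

All judgments here are binary; grading them by citation source (for instance, weighting citations from opposition procedures more heavily) would be a further assumption.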
Prior art from opposition procedures.
Under European patent law, a granted patent may be opposed.
The opponent often provides new prior art that invalidates the invention’s claim of originality.
Patents cited in opposition procedures are therefore highly relevant prior art documents.
They are the results of a very thorough invalidity search.
Crowdsourcing extraction of relevance assessments.
Need to extract citations from documents arising from opposition procedures
These documents are only available as scanned images³
Crowdsourcing will be used to extract these citations.
A new word from business jargon
Crowdsourcing.
³ Available at http://www.epoline.org
Relevance and evaluation measures.
Labels used in search reports:
Label: the cited document is
X: relevant when taken alone
Y: relevant in combination with other documents
A: relevant but not prejudicial to novelty or inventive step
How can these labels be used to define new evaluation measures?
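One possible answer to this question is to map the search report labels to graded gains and plug them into a graded measure such as nDCG. The following minimal sketch does exactly that; the gain values X=3, Y=2, A=1 are arbitrary assumptions (choosing them is precisely the open question), and the document identifiers are invented.

```python
import math

# Hypothetical gains for the search report labels; unlabelled documents get 0.
GAIN = {"X": 3, "Y": 2, "A": 1}

def dcg(gains):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(ranking, labels, k=10):
    """nDCG@k for a ranked list of document ids, given a dict: document id -> label."""
    gains = [GAIN.get(labels.get(doc), 0) for doc in ranking[:k]]
    ideal = sorted((GAIN[label] for label in labels.values()), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

# Invented example: three cited documents and one system ranking.
labels = {"d1": "X", "d2": "Y", "d3": "A"}
ranking = ["d2", "d9", "d1", "d3"]
print(round(ndcg(ranking, labels), 3))
```

Ranking the Y document above the X document is penalized only mildly here; a measure designed for novelty searches might instead treat a missed X citation far more severely.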
Challenges.
As a result of the CLEF-IP track, we expect to gain new insights on:
how to represent the information need expressed by a patent
query reformulation
evaluation metrics for patent retrieval
using machine translation for improving retrieval effectiveness
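A naive baseline for representing the information need of a patent is to turn its claims into a bag-of-words query of the most frequent content terms, filtering out patent boilerplate words along with ordinary stopwords. The sketch below assumes exactly that simple term-frequency approach; the stopword list, the helper `patent_to_query` and the sample claims are hypothetical, and this is not the method of any CLEF-IP participant.

```python
import re
from collections import Counter

# Small illustrative stopword list, including legalese such as "wherein" and "said";
# a real system would need a much larger, language-specific list.
STOPWORDS = {
    "a", "an", "the", "of", "and", "or", "to", "in", "for", "on", "with", "by",
    "is", "are", "at", "least", "one",
    "claim", "claims", "wherein", "said", "comprising", "comprises",
    "according", "which", "being",
}

def patent_to_query(text, k=5):
    """Return the k most frequent non-stopword terms (longer than 2 characters) as a query."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]

# Invented claims text.
claims = (
    "1. A heat exchanger comprising a tube bundle, wherein said tube bundle "
    "is coated with a polymer coating. "
    "2. The heat exchanger according to claim 1, wherein the polymer coating "
    "comprises fluoropolymer."
)
print(patent_to_query(claims, k=4))
```

Because patents are long and verbose, such frequency-based queries tend to be noisy; this is one reason query formulation and reformulation are listed as open questions.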
TREC Chemistry track.
Ad-hoc search
Target data:
academic papers (Royal Society of Chemistry)
chemical patent documents (section C of the IPC)
Automatic extraction of citations will be used for relevance assessments
Challenges:
chemical names and structures
chemical interactions, relations, transformations, properties
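Selecting chemical patent documents means looking at their IPC symbols, which encode a hierarchy: section letter (A to H), two-digit class, subclass letter, then main group and subgroup, as in "C07D 401/04". Section C covers Chemistry; Metallurgy. The following minimal sketch parses such symbols and keeps documents with at least one section C symbol; the document ids and the helper names are hypothetical, and TREC-Chem's actual selection criteria may differ.

```python
import re
from typing import NamedTuple, Optional

# IPC symbol such as "C07D 401/04": section, class, subclass, main group / subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")

class IPCSymbol(NamedTuple):
    section: str
    ipc_class: str
    subclass: str
    main_group: str
    subgroup: str

def parse_ipc(symbol: str) -> Optional[IPCSymbol]:
    """Split an IPC symbol into its hierarchy levels; return None if it does not match."""
    m = IPC_PATTERN.match(symbol.strip())
    if not m:
        return None
    section, cls, sub, group, subgroup = m.groups()
    return IPCSymbol(section, section + cls, section + cls + sub, group, subgroup)

def is_chemistry(symbols) -> bool:
    """True if any of a document's IPC symbols falls in section C (Chemistry; Metallurgy)."""
    for s in symbols:
        parsed = parse_ipc(s)
        if parsed is not None and parsed.section == "C":
            return True
    return False

# Invented documents with their IPC symbols.
docs = {
    "doc1": ["C07D 401/04", "A61K 31/44"],
    "doc2": ["H04L 29/06"],
}
print([d for d, syms in docs.items() if is_chemistry(syms)])
```

Here only `doc1` is selected. Keeping the full hierarchy in `IPCSymbol` makes it easy to narrow the filter later, for example to a single class such as C07.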
The IRF is contributing to the creation of new patent test collections by organizing two tracks within the CLEF and TREC evaluation campaigns.
In addition to the TREC and CLEF contributions, the IRF, together with Matrixware, is promoting several initiatives aimed at facilitating and improving the patent retrieval process.
Promoting take up of research
Next:
presentation of the IRF and Matrixware
promoting take up of research
the IRF symposium
the PaIR workshop
providing the tools
funding research in the area of patent retrieval
IRF: the Information Retrieval Facility.
A new international not-for-profit foundation based in Vienna
Its mission:
to bridge the gap between the needs of industry and academic know-how
to promote and facilitate research in large-scale information retrieval
to maintain a facility that enables large-scale information retrieval and in-depth data processing
Matrixware.
Founded 2005 in Vienna
80 Employees
> 15 Academic Partners Worldwide
Implements solutions for access to patent information
Promoting research.
Matrixware and the IRF have engaged in several initiatives aimed at promoting research and raising awareness in the area of patent retrieval:
the Information Retrieval Facility Symposium: an annual symposium held in Vienna to foster knowledge exchange between IR experts and IP professionals
the PaIR workshop: a workshop on Patent Information Retrieval hosted by the CIKM conference
Providing the tools.
Successful IR research conventionally depends on three elements:
1 the availability of test collections
2 access to suitable software systems on which to run experiments
3 access to sufficiently powerful hardware
The IRF, supported by Matrixware, is providing all three of these.
Current University Projects.
Accessibility of Information (Glasgow)
Large Scale Logical Retrieval (Glasgow)
Semantic Analysis of Patent Data (Sheffield and Nijmegen)
Language Modeling for Patent Retrieval (UMass Amherst)
OCR for patents (UMass Amherst)
Concluding remarks
Patent retrieval is an interesting and important open challenge for IR researchers.
The IRF and Matrixware have engaged in several projects aimed at promoting research in this area.
Invitation.
You are invited to:
join one of the evaluation tracks
CLEF-IP
TREC-Chem
participate in the PaIR workshop
participate in the Information Retrieval Facility Symposium
Thank you for your attention.