Automatic indexing and retrieval of crime-scene photographs
Katerina Pastra, Horacio Saggion, Yorick Wilks
NLP group, University of Sheffield
Scene of Crime Information System (SOCIS)
Cambridge 2002
Outline
Application Scenario
Project Overview
SOCIS features
Text-based approaches
Using NLP:
  The Indexing mechanism
  The Retrieval mechanism
Preliminary system evaluation
Links
Crime Scene Documentation: Current Practices
Scene of Crime Officers:
  attend crime scene
  photograph the scene
  collect evidence (package and label items)
  write reports and create indexed photo-album(s)
Case-files piled in storage rooms
Examples
Ref: 6007898
Scenes of Crime Department
Photographic Index
Subject: SUD - Mary Smyth
Date: 24-1-00
Photographer: Jim Davis
1 Shows Ford Escort motor car A97BAK in woods off Winney Hill, Harthill
2-3 Show close-up views of same vehicle
4 Shows the exhaust pipe
5 Shows the offside
6 Shows the front interior
7 Shows the front passenger seat (bag)
8 Shows a knife on the rear nearside seat
9-10 Show the position of the deceased rear offside
11 Shows the face of deceased at Rotherham District
IT support for CSI
Crime Investigation requires:
  Fast and accurate retrieval of case-related info (and therefore efficient classification of this info)
  Identification of “patterns” among cases
IT support for Crime Investigation:
  Governmental agencies’ systems (HOLMES)
  Commercial systems (LOCARD, SOCRATES)
  (Crime Management and Administration Systems)
Needed: “Intelligent” support for Crime Investigation
Project Overview
Domain: Scene of Crime Investigation (SOC)
Scenario: Use of digital photography and speech to populate a central police database with case-related information
Objective: Creation of a prototype system that allows for intelligent indexing and retrieval of crime photographs
2000 - 2003
SOCIS features
Access through the web (JSP application)
Storage of case documentation & meta-information in central database
Automatic indexing of photographs
Automatic retrieval of photographs
Automatic population of official forms
Text-based image indexing & retrieval: approaches
• Manual assignment of keywords
• Automatic extraction of keywords (statistics +/ semantic expansion) [Smeaton’96, Sable’99, Rose’00]
• Extraction of logical form representations (syntactic relations and concept classification) [Rowe’99]
Precision and recall increase as indexing terms go beyond keywords, capturing relational info
Text-based image indexing & retrieval: problems
Consider:
“view to the loft” vs. “view into loft”: keyword barrier
“position of baby with no bedding” vs. “position of baby with bedding removed”: syntactic relations need to be complemented with semantic information
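The keyword barrier in the caption pairs on this slide can be illustrated with a minimal Python sketch. The stop-word list and the `keywords` helper are assumptions made for illustration and are not part of SOCIS; the point is only that a bag of keywords loses the relation carried by the preposition, and treats paraphrases as different.

```python
# Hypothetical stop-word list; real lists vary, but most drop prepositions.
STOP_WORDS = {"the", "to", "into", "of", "with", "no"}

def keywords(caption: str) -> frozenset:
    """Reduce a caption to its bag of content words (illustrative only)."""
    return frozenset(w for w in caption.lower().split() if w not in STOP_WORDS)

# Different meanings, identical keyword sets: the relation is lost.
same_keywords = keywords("view to the loft") == keywords("view into loft")

# Same meaning, different keyword sets: the paraphrase is not recognised.
different_keywords = (
    keywords("position of baby with no bedding")
    != keywords("position of baby with bedding removed")
)

print(same_keywords, different_keywords)  # True True
```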
Indexing-Retrieval Mechanism
Pipeline of processing resources: tokeniser, sentence splitter, POS tagger, lemmatizer, NE recognizer, parser, discourse interpreter (+ triple extraction layer)
[Diagram: captions → indexing terms (ARG1 REL ARG2); free text query → query triples (ARG1 REL ARG2); OntoCrime + KB; matching]
Corpus and Domain Model
1200 captions from 350 different crime cases dealt with by South Yorkshire Police (text files)
65 captions (transcribed speech experiment)
Different lengths but same characteristics: phrasal constructions, named entities, meta-info, what and where references
Domain model = OntoCrime and knowledge base
Role = selection restrictions for triples’ arguments and semantic expansion for retrieval
Triple Extraction
17 Relations: AND, AROUND, MADE-OF, OF, ON, WITHOUT etc.
Form of triples: ARG1 REL ARG2
Restrictions and filters for arguments
Rules for captions with multiple relations
Inferences restricted to certain cases
Triple Extraction examples
“body on floor surrounded by blood”
“shot of footprint on top of bar”
“photograph from behind bar of body on floor”
“bottle, gun and ashtray on table”
“footprint with zigzag and target on chair”
blood AROUND floor
blood AROUND body
Body ON floor
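As a rough illustration of the ARG1 REL ARG2 form, the Python sketch below pulls a single triple out of very simple captions. It is a toy under stated assumptions: the `CUES` table and `extract_triple` function are hypothetical, and the surface-word lookup stands in for the parser, discourse interpreter, argument restrictions and inference rules that SOCIS actually relies on.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    arg1: str
    rel: str
    arg2: str

# Hypothetical surface cues for a few of the relations named in the slides.
CUES = {"on": "ON", "around": "AROUND", "without": "WITHOUT", "of": "OF"}

def extract_triple(caption: str) -> Optional[Triple]:
    """Return a triple for captions shaped like '<word> <cue> <word>'.

    Only the first cue word is used, with its immediate neighbours as
    arguments; there is no parsing, filtering or inference here.
    """
    words = caption.lower().split()
    for i, word in enumerate(words):
        if word in CUES and 0 < i < len(words) - 1:
            return Triple(words[i - 1], CUES[word], words[i + 1])
    return None

print(extract_triple("body on floor"))
# Triple(arg1='body', rel='ON', arg2='floor')
```

Longer captions such as “shot of footprint on top of bar” show why this shortcut is not enough: the first cue would pair “shot” with “footprint”, which is meta-information rather than scene content.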
Retrieval Mechanism
Allow for free text query
Extract relational facts from the query
Match the query triples with the indexing triples of each captioned photograph
Allow for exact match of arguments or class info: ARG1 (Class), RELATION, ARG2 (Class)
If no triples can be extracted, keyword matching takes place, with semantic expansion if needed
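The matching steps above can be sketched in Python as follows. Everything named here is an assumption for illustration: `TERM_CLASS` is a tiny stand-in for OntoCrime, the photo identifiers and indexed triples are invented, and the query triples are passed in ready-made rather than extracted from the free text.

```python
from typing import Dict, List, NamedTuple, Tuple

class Triple(NamedTuple):
    arg1: str
    rel: str
    arg2: str

# Hypothetical stand-in for OntoCrime: term -> class.
TERM_CLASS = {"knife": "weapon", "gun": "weapon",
              "seat": "furniture", "table": "furniture"}

def arg_matches(query_arg: str, index_arg: str) -> bool:
    """Exact match, or the query names the class of the indexed argument."""
    if query_arg == index_arg:
        return True
    index_class = TERM_CLASS.get(index_arg)
    return index_class is not None and query_arg == index_class

def triple_matches(query: Triple, indexed: Triple) -> bool:
    return (query.rel == indexed.rel
            and arg_matches(query.arg1, indexed.arg1)
            and arg_matches(query.arg2, indexed.arg2))

Index = Dict[str, Tuple[str, List[Triple]]]  # photo id -> (caption, triples)

def retrieve(query_triples: List[Triple], query_text: str, index: Index) -> List[str]:
    if query_triples:
        return sorted(
            pid for pid, (_, triples) in index.items()
            if any(triple_matches(q, t) for q in query_triples for t in triples)
        )
    # Fallback: keyword matching, expanded with terms of any class named in the query.
    words = set(query_text.lower().split())
    expanded = words | {term for term, cls in TERM_CLASS.items() if cls in words}
    return sorted(
        pid for pid, (caption, _) in index.items()
        if expanded & set(caption.lower().split())
    )

# Invented index entries, loosely modelled on the photographic index example.
INDEX: Index = {
    "photo8": ("knife on the rear nearside seat", [Triple("knife", "ON", "seat")]),
    "photo4": ("the exhaust pipe", []),
}

print(retrieve([Triple("weapon", "ON", "seat")], "weapon on seat", INDEX))  # ['photo8']
print(retrieve([], "exhaust", INDEX))                                       # ['photo4']
```

Class matching is one-directional in this sketch: a query for “weapon” finds a photograph indexed with “knife”, but a query for “knife” does not find one indexed with “gun”.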
Preliminary Evaluation
Indexing mechanism evaluation run on 600 captions indicated refinements on the rules (80% accuracy in extracting and inferring triples)
Preliminary usability evaluation with real users: relational information considered to be an intuitive way of forming queries for image retrieval
Future work: overall evaluation of free text query for image retrieval
Conclusions
Could the SOCIS approach be ported to other domains?
Thorough testing and experimentation needed
However, it is a corpus-driven approach: not just an alternative image indexing/retrieval approach, but the one dictated by a real application
For more information on SOCIS:
http://www.dcs.shef.ac.uk/nlp/socis