When people use retrieval systems, they are often not searching for documents or text passages as such, but for information contained in them about entities such as persons, organizations, locations, events, and times. The goal is to find valuable semantic information about real-world entities embedded in different web pages and databases. With present search engines, however, it is difficult to find specific or exact information about entities. We therefore need search engines that interpret queries across different domains and extract structured information about entities.
ENTITY SEARCH ENGINE: A NEW SEARCH TOOL
Speaker: Tanmay Mondal, MSLIS 2013-2015
Indian Statistical Institute, Bangalore
Documentation Research and Training Centre
Seminar (1) - 2014
Overview
● Present Approach
● Entity Search
● Benefit of Entity Search
● Entity & Its Facets
● Main Work of ESE
● Popular Entity Search
● OKKAM: Enabling a Web of Entities
● Workflow of Okkam
● My Library
● References
Present Approach
● Information is everywhere, and it is growing exponentially
● A traditional information extraction approach is to scan every document in a collection
● On the Web, the document collection is the set of all web pages indexed by a search engine
● Getting pinpointed information this way is time-consuming for users
Entity types for specific information: Person, Location, Organization, Nationality, Religion, Product, Phone Number, Email Address/URL, Distance, Date, Time, Money, Generic Number
Problem: identifying and linking or grouping different manifestations of the same real-world object
Web of Documents → Web of Entities
Cluster the records that correspond to the same entity
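The clustering step can be sketched as follows. This is only a minimal illustration, assuming that records are plain dictionaries with a "name" field and that two records refer to the same entity when their names contain the same tokens. The helper names `normalize` and `cluster_records` and the sample records are hypothetical; real entity resolution needs far more robust matching.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase a name, replace punctuation with spaces, and sort its tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def cluster_records(records):
    """Group records whose normalized names are identical (toy matching rule)."""
    clusters = defaultdict(list)
    for record in records:
        clusters[normalize(record["name"])].append(record)
    return list(clusters.values())


# Hypothetical records: two manifestations of one person, plus one organization.
records = [
    {"name": "S. R. Ranganathan", "source": "page-1"},
    {"name": "Ranganathan, S.R.", "source": "page-2"},
    {"name": "IBM", "source": "page-3"},
]

for cluster in cluster_records(records):
    print([record["source"] for record in cluster])
```

Exact matching on normalized tokens is the simplest possible rule; it misses spelling variants and abbreviations, which is why name disambiguation remains hard in practice.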
Entity Search
● An entity is any object or thing that can be uniquely identified in the world
● Search queries are better matched against a database containing hundreds of millions of "entities"
● Each entity is related to many other entities
● The answer entities carry specific information, and the right relationships among the entities must be identified
● Semantic or faceted search on entities
Why ?
● When people use retrieval systems, they are often not searching for documents or text passages
● Summarization of entities and concepts
● Named entities (persons, organizations, locations, products, ...) play a central role in answering such information needs
● At least 20-30% of the queries submitted to Web search engines are simply entities
● ~71% of Web search queries contain named entities
Source: "Building Taxonomy of Web Search Intents for Name Entity Queries" by Xiaoxin Yin & Sarthak Shah
Benefit of Entity Search
● Entities are often categorized into a taxonomy
● The user's primary task is often to make a decision
● More structured than document-based search
● An entity is associated with the same URI across different repositories
● Entity information integration
● More understandable by humans
● Increases precision and takes less time
Entity & Its Facets
● An entity must be distinguishable from other entities; it can be anything, including an abstract thing such as a disease or imaginary art
● The type of an entity is the generic class into which the entity is classified
● An attribute is a property (predicate) associated with an entity
● A value is the value of an attribute (for a given entity)
● A relation provides more information by linking an entity with many other entities
● Examples: Prof. S.R. Ranganathan is a person; IBM is an organization
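The facets above (type, attribute, value, relation) can be pictured as a small data structure. This is a sketch under assumptions: the `Entity` class, the `ex:` identifiers, and the attribute names are hypothetical and are not taken from any particular entity search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    identifier: str                                  # unique ID (hypothetical "ex:" scheme)
    name: str
    entity_type: str                                 # generic class, e.g. "person"
    attributes: dict = field(default_factory=dict)   # attribute name -> value
    relations: list = field(default_factory=list)    # (relation name, target identifier)

    def relate(self, relation: str, other: "Entity") -> None:
        """Record a named relation from this entity to another entity."""
        self.relations.append((relation, other.identifier))


# Illustrative values only.
ranganathan = Entity("ex:ranganathan", "Prof. S.R. Ranganathan", "person",
                     {"occupation": "librarian"})
ibm = Entity("ex:ibm", "IBM", "organization",
             {"industry": "information technology"})
drtc = Entity("ex:drtc", "Documentation Research and Training Centre", "organization")

ranganathan.relate("founded", drtc)
print(ranganathan.entity_type, ranganathan.relations)
```

Storing relations as identifier pairs rather than nested objects keeps each entity profile independent, which matches the idea of one identifier per entity across repositories.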
Main Work of ESE
● Entity Retrieval: entity search engines can return a ranked list of the entities most relevant to a user query
● Entity Relationship / Fact Mining and Navigation: discovers interesting relationships and facts about the entities associated with a query
● Prominence Ranking: detects the popularity of an entity and lets users browse entities in different categories
● Entity Description Retrieval: retrieves entity description blocks for each entity; information about an object in a web page is generally grouped together as an object block
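Entity retrieval and prominence ranking can be combined in a toy ranking function. The sketch below assumes that relevance is the number of query terms found in an entity description and that ties are broken by a prominence score. The function names, the descriptions, and the prominence values are all made up for illustration and do not describe any real engine.

```python
def score(query: str, entity: dict) -> int:
    """Count the query terms that also occur in the entity description."""
    query_terms = set(query.lower().split())
    description_terms = set(entity["description"].lower().split())
    return len(query_terms & description_terms)


def rank_entities(query: str, entities: list) -> list:
    """Return names of matching entities, best match first.

    Sort order: relevance (descending), prominence (descending), name.
    """
    scored = [(score(query, e), e["prominence"], e["name"]) for e in entities]
    matches = [item for item in scored if item[0] > 0]
    matches.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [name for _, _, name in matches]


# Hypothetical catalogue; prominence values are invented for the example.
ENTITIES = [
    {"name": "IBM", "description": "technology company organization", "prominence": 0.9},
    {"name": "Indian Statistical Institute",
     "description": "research institute organization india", "prominence": 0.6},
    {"name": "S. R. Ranganathan", "description": "librarian person india", "prominence": 0.5},
]

print(rank_entities("organization india", ENTITIES))
```

The query "organization india" matches two terms for the institute and one term for each of the others, so the institute comes first and prominence decides between the remaining two.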
Popular Entity Search
● Product search: various products such as books, electronics, clothes, etc.
● People search: experts, friends, profiles of famous persons, etc.
● Location search: travel, addresses, businesses, government offices, etc.
Idea about entity search engine
Various ESE
● Freebase: http://www.freebase.com/
● Sindice: http://sindice.com/
● Geneview: http://bc3.informatik.hu-berlin.de/
● Okkam: http://www.okkam.org/
● WolframAlpha: http://www.wolframalpha.com/
● Yatedo: http://www.yatedo.com/
● GeoNames: http://www.geonames.org/
● DBpedia: http://dbpedia.org/About
● EntityCube: http://entitycube.research.microsoft.com/
● etc.
OKKAM-Enabling a Web of Entities
● Any collection of data and information about any type of entity published on the Web can be integrated into a single virtual, decentralized, open knowledge base
● It leads to a faster, more efficient, and more precise way to deal with the flood of information available on the Web today
"Entities should not be multiplied beyond necessity"
OKKAM ENS
● OKKAM ENS (Entity Name System) is for entity search: its storage, indexing, and matching technology was built for finding an entity given its description
● Every entity (individual, instance, "thing") is assigned a global identifier, ideally unique
● An entity repository of more than 7.5 million entities in a more structured form
"Entity identifiers should not be multiplied beyond necessity"
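The principle of not multiplying identifiers can be sketched as "look up before you mint". The class below is a toy illustration, not the OKKAM ENS API: the class name, the `resolve` method, and the `http://example.org/entity/` namespace are assumptions made for the example.

```python
import itertools


class ToyEntityNameSystem:
    """Hands out one identifier per distinct entity description (toy model)."""

    def __init__(self, namespace: str = "http://example.org/entity/"):
        self._namespace = namespace          # hypothetical URI prefix
        self._counter = itertools.count(1)
        self._ids_by_key = {}

    @staticmethod
    def _key(description: dict) -> frozenset:
        """Normalize a description so trivial variations map to the same key."""
        return frozenset(
            (name.lower(), str(value).strip().lower())
            for name, value in description.items()
        )

    def resolve(self, description: dict) -> str:
        """Reuse the existing identifier if the entity is known, else mint one."""
        key = self._key(description)
        if key not in self._ids_by_key:
            self._ids_by_key[key] = f"{self._namespace}{next(self._counter)}"
        return self._ids_by_key[key]


ens = ToyEntityNameSystem()
first = ens.resolve({"name": "IBM", "type": "organization"})
again = ens.resolve({"type": "Organization", "name": "ibm "})
other = ens.resolve({"name": "GeoNames", "type": "database"})
print(first, again, other)
```

Here the second request differs only in case, key order, and whitespace, so it receives the identifier that was already issued instead of a new one.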
Project Partners
● University of Trento, Italy (Coordinator)
● L3S Research Center, Germany
● SAP Research, Germany
● Expert System, Italy
● Elsevier B.V., Netherlands
● Europe Unlimited SA, Belgium
● National Microelectronics Application Center (MAC), Ireland
● Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
● DERI Galway, Ireland
● University of Malaga, Spain
● INMARK, Spain
● Agenzia Nazionale Stampa Associata (ANSA), Italy
Sources Of Information
● Wikipedia: provides lists of countries, cities, and members of particular domains, which are very common in search queries
● GeoNames: contains over 10 million geographical names and consists of over 9 million unique features, including 2.8 million populated places and 5.5 million alternate names
● OkkamDBManager: another important information source for OKKAM can be generic databases such as extranets, online shops, or publishing houses
● OkkamManualEntry: new entities can also be inserted manually
Data can be extracted more effectively from any unstructured source
Cogito Semantic Technology
● Semantic analysis engine and complete semantic network for a complete understanding of text
● Transforms unstructured information into structured data
● Identifies the most relevant concepts
● Interprets the meaning of texts
● Precisely extracts information
● Automatically connects entities extracted from sources
Sensigrafo
● Enables the disambiguation of terms
● Allows Cogito to understand the meaning of words and context
● Extraction of data and metadata
● Used in product development, competitive intelligence, marketing, finance, media & publishing, oil & gas, life sciences & pharma, government, telecommunications, and many other activities where knowledge sharing is critical
● More than 1 million concepts and more than 4 million relationships
Workflow of Okkam
● Storage: a scalable repository of entity profiles, in which billions of entities are assigned an ID and a profile to distinguish one entity from another
● Matching: requests from client applications arrive as a bag of keywords or a collection of name-value pairs (unstructured or semi-structured queries)
● ID storage and management: stores, maintains, and makes available for reuse IDs (URIs) for anything that is named in a networked environment
● Lifecycle Management: takes care of the evolution of the repository and of all entity profiles over time
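The two request forms in the matching step (a bag of keywords, or name-value pairs) can be sketched against a tiny repository of entity profiles. This is an assumption-laden toy, not Okkam's matching algorithm: the function names, the `ex:` identifiers, the profiles, and the 0.5 threshold are all hypothetical.

```python
def match_keywords(keywords, profile) -> float:
    """Fraction of query keywords found among the words of the profile values."""
    words = " ".join(str(value) for value in profile.values()).lower().split()
    wanted = [keyword.lower() for keyword in keywords]
    if not wanted:
        return 0.0
    return sum(keyword in words for keyword in wanted) / len(wanted)


def match_pairs(pairs, profile) -> float:
    """Fraction of name-value pairs that agree exactly with the profile."""
    if not pairs:
        return 0.0
    hits = sum(
        str(profile.get(name, "")).lower() == str(value).lower()
        for name, value in pairs.items()
    )
    return hits / len(pairs)


def best_match(query, repository, threshold: float = 0.5):
    """Return the ID of the best-scoring profile, or None below the threshold."""
    scorer = match_pairs if isinstance(query, dict) else match_keywords
    best_id, best_score = None, 0.0
    for entity_id, profile in repository.items():
        current = scorer(query, profile)
        if current > best_score:
            best_id, best_score = entity_id, current
    return best_id if best_score >= threshold else None


# Hypothetical repository of entity profiles keyed by made-up identifiers.
REPOSITORY = {
    "ex:1": {"name": "Trento", "type": "location", "country": "Italy"},
    "ex:2": {"name": "University of Trento", "type": "organization", "country": "Italy"},
}

print(best_match(["university", "trento"], REPOSITORY))
print(best_match({"name": "Trento", "type": "location"}, REPOSITORY))
```

Returning None when no profile reaches the threshold leaves room for the caller to create a new entity, which is where ID management would take over.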
Entity Query & Matching in Okkam
ISI
Wolfram|Alpha
● Wolfram|Alpha is an engine for computing answers and providing knowledge
● It generates output by doing computations from its own internal knowledge base, instead of searching the web and returning links
● It is an online service that answers factual queries directly by computing the answer
● Its goal is to make all systematic knowledge immediately computable and accessible to everyone
Example query: 5 nearest stars
Example query: How many newspapers are available in the globe
Overall Difficulties
● The number of entities could be huge
● Information Redundancy
● Information Fragmentation
● Entity Information Integration
● A single algorithm for fine-grained entity matching may not exist
● Store and retrieve using IR based techniques
● Matching on very large datasets
● Natural Language Processing
Contd...
● Few knowledge bases are available
● Multi-domain entities
● Deduplication problem
● Some names and relationships could be incorrect, and the information may not be up to date
● Name disambiguation is still largely unsolved
● ESEs are at an early stage
Creating knowledge bases from text and unstructured data is the goal
My Library
● Entities are for use
● Each entity has its own attributes and relations
● Every entity has its importance
● Save the time of finding entities
● Entities are growing rapidly
References
1. Statistical Entity Extraction from Web, by Zaiqing Nie, Ji-Rong Wen, and Wei-Ying Ma, Fellow, IEEE
2. State of the art in IE, overview, comparison and analysis, by Stefan Dumitrescu, PhD Student
3. The Entity Name System: Enabling the Web of Entities, by Heiko Stoermer, Themis Palpanas, George Giannakopoulos, University of Trento
4. Hybrid entity clustering using crowds and data, by Jongwuk Lee, Hyunsouk Cho, Jin-Woo Park, Young-rok Cha, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen
5. Supporting Entity Search: A Large-Scale Prototype Search Engine, by Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang
6. OKKAM: Enabling a Web of Entities, by Paolo Bouquet, Heiko Stoermer, Daniel Giacomuzzi, University of Trento
7. Entity Data Management in OKKAM, by Themis Palpanas (University of Trento), Junaid Chaudhry (Ajou University), Periklis Andritsos (University of Trento), Yannis Velegrakis (University of Trento)
8. Space and Time Entity Repository, Human-enhanced time-aware multimedia search, funded by EU07. See: http://issuu.com/cubrikproject/docs/issuu.cubrik.d41.unitn.wp4.v1.0
9. http://api.okkam.org/search/
10. http://www.wolframalpha.com/