SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY Amit Sheth CTO, Voquette*, Inc. Large Scale Distributed Information Systems (LSDIS) Lab University Of Georgia; http://lsdis.cs.uga.edu *Now Semagix, http://www.semagix.com © Amit Sheth Keynote CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002


Page 1

SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY

Amit Sheth

CTO, Voquette*, Inc.

Large Scale Distributed Information Systems (LSDIS) Lab, University of Georgia; http://lsdis.cs.uga.edu

*Now Semagix, http://www.semagix.com

July 15, 2002 © Amit Sheth

Keynote

CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002

Page 2

New Enterprise Content Management Challenges

1. More variety and complexity
- More formats (MPEG, PDF, MS Office, WM, Real, AVI, etc.)
- More types (docs, images -> audio, video; a variety of text: structured, unstructured)
- More sources (internal, extranet, internet, feeds)

2. Scalability, information overload
- Too much data, precious little information (relevance)

3. Creating value from content
- How to distribute the right content to the right people as needed? (personalization: book of business)
- Customized delivery for different consumption options (mobile/desktop, devices)
- Insight, decision making (actionable)

Page 3

New Enterprise Content Management Technical Challenges

1. Aggregation
- Feed handlers/agents that understand content representation and media semantics
- Push and pull; Web, databases, and files; structured, semi-structured, and unstructured data of different types

2. Homogenization and enhancement: an enterprise-wide common view
- Domain model, taxonomy/classification, metadata standards
- Semantic metadata, created automatically if possible

3. Semantic applications
- Search, personalization, directory, alerts, etc., using metadata and semantics (semantic association and correlation) for improved relevance, intelligent personalization, and customization
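The aggregation and homogenization steps above can be illustrated with a minimal Python sketch. It assumes two hypothetical source layouts (a news feed and a crawled Web page) whose field names and date formats are invented for illustration. Each feed handler maps its source into one common `Asset` view, which is what downstream semantic applications would consume.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Asset:
    """Hypothetical enterprise-wide common view of a content item."""
    title: str
    source: str
    published: date
    media: str


def from_news_feed(item: dict) -> Asset:
    # Hypothetical push feed: "headline" field, ISO-8601 dates.
    return Asset(item["headline"], item["provider"],
                 date.fromisoformat(item["date"]), "text")


def from_web_page(page: dict) -> Asset:
    # Hypothetical crawled page: "title" field, US-style mm/dd/yyyy dates.
    return Asset(page["title"], page["site"],
                 datetime.strptime(page["posted"], "%m/%d/%Y").date(), "html")


# Feed handlers keyed by source kind; adding a source means adding a handler.
HANDLERS = {"feed": from_news_feed, "web": from_web_page}


def aggregate(raw_items: list[tuple[str, dict]]) -> list[Asset]:
    """Route each raw item to the handler for its source kind."""
    return [HANDLERS[kind](payload) for kind, payload in raw_items]


assets = aggregate([
    ("feed", {"headline": "Cisco reports quarter", "provider": "BusinessWire",
              "date": "2001-09-10"}),
    ("web", {"title": "Cisco news", "site": "bloomberg.com",
             "posted": "09/10/2001"}),
])
```

Both items end up with the same `published` date despite their different source formats, which is the point of an enterprise-wide common view.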

Page 4

The Semantic Web: a vision with several views
• “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what data means to process it.” [B99]
• “The semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [BHL01]
• “The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.” [W3C01]

Semantics: The Next Step in the Web’s Evolution

Page 5

Semantics for the Web

On the Semantic Web, every resource (people, enterprises, information services, application services, and devices) is augmented with machine-processable descriptions that support finding it, reasoning about it (e.g., which service is best), and using it (e.g., executing or manipulating it). The idea is that self-descriptions of data, along with other techniques, would allow context-understanding programs to selectively find what users want, or allow programs to work on behalf of humans and organizations to make them more efficient and productive.

Page 6

Central Role of Metadata

Figure: the content value chain, from the back end to applications, with semantic metadata supporting each step:
- Where is the content? Whose is it? (Produce, Aggregate)
- What is this content about? (Catalog/Index)
- What other content is it related to? (Integrate, Syndicate)
- What is the right content for this user? (Personalize)
- What is the best way to monetize this interaction? (Interactive Marketing)

Delivery: Broadcast, Wireline, Wireless, Interactive TV

"A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV

“Metadata increases content value in each step of content value chain.” - Amit Sheth

Page 7

A Metadata Classification

Data (heterogeneous types/media)
- Content Independent Metadata (creation-date, location, type-of-sensor...)
- Content Dependent Metadata (size, max colors, rows, columns...)
- Direct Content Based Metadata (inverted lists, document vectors, LSI)
- Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
- Domain Specific Metadata (area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies)

Ontologies, Classifications, Domain Models

User

More semantics for relevance to tackle information overload!
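To make the classification concrete, here is a minimal Python sketch that tags the fields of one metadata record with the slide's categories. The field names, the mapping table, and the sample satellite-image record are hypothetical. A real system would derive the domain-specific part from ontologies or domain models rather than from a hard-coded table.

```python
from enum import Enum


class MetadataKind(Enum):
    CONTENT_INDEPENDENT = "content independent"
    CONTENT_DEPENDENT = "content dependent"
    DIRECT_CONTENT_BASED = "direct content based"
    DOMAIN_INDEPENDENT = "domain independent (structural)"
    DOMAIN_SPECIFIC = "domain specific"


# Hypothetical mapping from field names to the slide's categories.
FIELD_KIND = {
    "creation_date": MetadataKind.CONTENT_INDEPENDENT,
    "location": MetadataKind.CONTENT_INDEPENDENT,
    "sensor_type": MetadataKind.CONTENT_INDEPENDENT,
    "size": MetadataKind.CONTENT_DEPENDENT,
    "max_colors": MetadataKind.CONTENT_DEPENDENT,
    "term_vector": MetadataKind.DIRECT_CONTENT_BASED,
    "dtd": MetadataKind.DOMAIN_INDEPENDENT,
    "land_cover": MetadataKind.DOMAIN_SPECIFIC,
    "relief": MetadataKind.DOMAIN_SPECIFIC,
}


def group_by_kind(record: dict) -> dict[MetadataKind, dict]:
    """Split one metadata record into the slide's categories."""
    grouped: dict[MetadataKind, dict] = {}
    for name, value in record.items():
        kind = FIELD_KIND.get(name)
        if kind is None:
            continue  # fields outside the mapping are left unclassified here
        grouped.setdefault(kind, {})[name] = value
    return grouped


satellite_image = {
    "creation_date": "2002-07-15",
    "sensor_type": "multispectral",
    "size": 4096,
    "max_colors": 256,
    "land_cover": "forest",
    "relief": "hilly",
}
groups = group_by_kind(satellite_image)
```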

Page 8

Semantic Content Organization and Retrieval Engine (SCORE) technology

• Automatically aggregates and extracts information from disparate sources and multiple formats
• Automatically tags/annotates and categorizes content
• Automatically creates relevant associations
  - Maps content topics and their relationships
• Semantic query engine relates information and knowledge, both internal and external to the organization, into a single view

Page 9

SCORE Architecture

- Distributed agents that automatically extract relevant semantic metadata from structured and unstructured content
- Fast main-memory based query engine with APIs and XML output
- CACS provides automatic classification (w.r.t. the WorldModel) from unstructured text and extracts contextually relevant metadata
- Distributed agents that automatically extract/mine knowledge from trusted sources
- Toolkit to design and maintain the Knowledgebase; the Knowledgebase represents the real-world instantiation (entities and relationships) of the WorldModel
- WorldModel specifies the enterprise’s normalized view of information (ontology)
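The relationship between the WorldModel (schema) and the Knowledgebase (its real-world instantiation) can be sketched as follows. This is a minimal Python illustration, not SCORE's implementation. Class and relationship names are borrowed from the knowledge-source slide. The sketch also simplifies by assuming each relationship has exactly one domain class and one range class, whereas the slides reuse `memberOf` across several class pairs.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Hypothetical ontology: entity classes plus typed relationships."""
    classes: set[str]
    relations: dict[str, tuple[str, str]]  # name -> (domain class, range class)


@dataclass
class Knowledgebase:
    """Real-world instantiation of a WorldModel: entities and facts."""
    model: WorldModel
    entities: dict[str, str] = field(default_factory=dict)  # entity -> class
    facts: set[tuple[str, str, str]] = field(default_factory=set)

    def add_entity(self, name: str, cls: str) -> None:
        if cls not in self.model.classes:
            raise ValueError(f"unknown class {cls!r}")
        self.entities[name] = cls

    def relate(self, subj: str, rel: str, obj: str) -> None:
        # Reject facts whose endpoints do not match the relationship's declared types.
        domain, range_cls = self.model.relations[rel]
        if self.entities.get(subj) != domain or self.entities.get(obj) != range_cls:
            raise ValueError(f"{subj} {rel} {obj} does not conform to the WorldModel")
        self.facts.add((subj, rel, obj))


wm = WorldModel(
    classes={"company", "tickerSymbol", "exchange"},
    relations={"identifiedBy": ("company", "tickerSymbol"),
               "memberOf": ("tickerSymbol", "exchange")},
)
kb = Knowledgebase(wm)
for name, cls in [("Cisco Systems", "company"), ("CSCO", "tickerSymbol"), ("NASDAQ", "exchange")]:
    kb.add_entity(name, cls)
kb.relate("Cisco Systems", "identifiedBy", "CSCO")
kb.relate("CSCO", "memberOf", "NASDAQ")
```

Keeping type checks in the schema, rather than in each agent, lets many independent knowledge agents populate the Knowledgebase while still conforming to one normalized view.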

Page 10

Voquette Enterprise Semantic Platform Product Components

Figure (component diagram):
- World Model, with the WM Toolkit
- Knowledgebase, with the KB Toolkit; fed by Knowledge Agents (KA) from Knowledge Sources (KS), managed with the KA Toolkit and the Knowledge Agent Monitor
- Metabase; fed by the Enhancement Engine (entity extraction, enhanced metadata, automatic classification with a classification committee, domain experts) from Content Agents (CA), managed with the CA Toolkit and the Content Agent Monitor
- Content sources: databases, XML/feeds, websites, email, reports, documents (structured, semi-structured, unstructured)
- Knowledgebase and Metabase held in a main-memory index
- Semantic Engine: search, alerts, portals, directory, personalize
- XML APIs and Web Services to enterprise applications (EA)

Page 11

Knowledge Sources Used
- Market Guide (MG)
- ZDNet (ZD)
- Hoover’s (H)
- Data supplied from NASA (DPL)
- Federation of American Scientists (FAS)
- Central Intelligence Agency (CIA)
- The Interdisciplinary Center (ICT)
- Federal Bureau of Investigation (FBI)
- Capital Advantage (CA)
- Office of Foreign Assets Control (OFAC)

Entity classes and relationships populated by these knowledge sources:

PERSON (OFAC, FBI, DPL)
- politician (OFAC, FBI, CIA, CA)
  - politician associated with politicalOrganization
  - politician held politicalOffice
  - politician associated with politicalOffice
- terrorist (OFAC, FBI, DPL)
  - terrorist memberOf organization
  - terrorist appears on watchList
- companyExecutive (MG)
  - companyExecutive holdsOffice companyPosition
- person has permanent address address (OFAC, FBI)
- person has dob (date of birth) (OFAC, FBI)
- person has pob (place of birth) (OFAC, FBI)

THING
- event (ICT)
  - terroristOrganization participated in terroristSponsoredEvent (ICT)
- politicalOffice (CIA, CA)
  - politicalOffice office(s) within govtOrganization
  - politicalOffice associated with organization
- watchList (OFAC, FBI, DPL)
  - terroristOrganization appears on watchList (OFAC, FBI, DPL)
- organization (OFAC, FBI, FAS, ICT, CA, CIA)
  - organization appears on watchList
  - organization memberOf suborganization
- company
  - company manufactures product (ZD)
  - company identifiedBy tickerSymbol (H)
  - companyPosition position in company (MG)
  - company memberOf industry (H)
- tickerSymbol (H)
  - tickerSymbol memberOf exchange (H)

PLACE
- organization located in place (H, OFAC)
- religiousAffiliation practiced in place (CIA)
- company headquarters in city (H)

JIVA
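Because each class and relationship above is tagged with the knowledge sources that populate it, a knowledgebase can keep provenance per fact. Below is a minimal Python sketch, assuming facts are stored as subject-predicate-object triples. The entity names are placeholders, not data from the slides.

```python
from collections import defaultdict


class ProvenanceStore:
    """Triples annotated with the knowledge sources that asserted them."""

    def __init__(self) -> None:
        # (subject, predicate, object) -> set of source codes such as "OFAC"
        self._sources: dict[tuple[str, str, str], set[str]] = defaultdict(set)

    def assert_fact(self, s: str, p: str, o: str, source: str) -> None:
        self._sources[(s, p, o)].add(source)

    def sources_for(self, s: str, p: str, o: str) -> list[str]:
        # .get() avoids creating an empty entry for unknown facts.
        return sorted(self._sources.get((s, p, o), set()))

    def objects(self, s: str, p: str) -> list[str]:
        return sorted(o for (s2, p2, o) in self._sources if s2 == s and p2 == p)


store = ProvenanceStore()
store.assert_fact("Person A", "appearsOn", "Watchlist W", "OFAC")
store.assert_fact("Person A", "appearsOn", "Watchlist W", "FBI")
store.assert_fact("Person A", "memberOf", "Organization B", "FBI")
```

A fact confirmed by several independent sources (here OFAC and FBI) can then be weighted more heavily than one asserted by a single source.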

Page 12

SCORE Capabilities

• Semantics (understanding of content and user needs)

• Extreme relevance

• Semantic associations

• Near real-time

• Multiple applications/usage patterns (not just search)

• Automation

• Scalability in all aspects

Page 13

Technologies Involved

• Ontology-driven architecture (definitional and assertional components)

• Automatic classification with a classifier committee (multiple technologies, rather than one size fits all)

• Automatic semantic metadata extraction/annotation

• Semantic associations/knowledge inferences

• Scalability throughout, with a distributed architecture and implementation (number of content and knowledge sources, indexing, etc.)

• Main-memory implementation, incremental checkpointing
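A classifier committee combines several classification techniques and lets them vote. The Python sketch below assumes three toy members (a keyword rule, a ticker pattern, and a term-overlap scorer) and hypothetical vote weights. It illustrates weighted voting with abstention, not CACS's actual classifiers.

```python
import re
from collections import Counter
from typing import Callable, Optional

# A member returns a topic label, or None to abstain.
Classifier = Callable[[str], Optional[str]]


def keyword_member(text: str) -> Optional[str]:
    """Rule-based member: fires on a few hypothetical trigger words."""
    rules = {"earnings": "Company News", "touchdown": "Sports", "ballot": "Politics"}
    lowered = text.lower()
    for word, topic in rules.items():
        if word in lowered:
            return topic
    return None


def ticker_member(text: str) -> Optional[str]:
    """Pattern-based member: an '(EXCHANGE: TICKER)' mention suggests company news."""
    if re.search(r"\((?:NASDAQ|NYSE):\s*[A-Z]{1,5}\)", text):
        return "Company News"
    return None


# Hypothetical topic profiles for a simple term-overlap member.
PROFILES = {
    "Company News": {"shares", "revenue", "quarter", "ceo"},
    "Sports": {"game", "season", "coach", "league"},
}


def overlap_member(text: str) -> Optional[str]:
    """Statistical-style member: choose the profile sharing the most terms."""
    terms = set(re.findall(r"[a-z]+", text.lower()))
    best, best_hits = None, 0
    for topic, profile in PROFILES.items():
        hits = len(terms & profile)
        if hits > best_hits:
            best, best_hits = topic, hits
    return best


def committee(text: str, members: list[tuple[Classifier, float]]) -> Optional[str]:
    """Weighted vote over the members that did not abstain."""
    votes: Counter = Counter()
    for member, weight in members:
        topic = member(text)
        if topic is not None:
            votes[topic] += weight
    return votes.most_common(1)[0][0] if votes else None


MEMBERS = [(keyword_member, 1.0), (ticker_member, 1.5), (overlap_member, 1.0)]
doc = "Cisco Systems (NASDAQ: CSCO) said quarterly revenue rose; the CEO cited demand."
print(committee(doc, MEMBERS))
```

Letting members abstain means a narrow but precise technique (the ticker pattern) can contribute where it applies without forcing a guess elsewhere.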

Page 14

Performance

- Population/update rate in a Knowledgebase with 1 million entities/relationships: > 10,000 entities/relationships per hour
- Incremental index update frequency: 1 minute (near real-time)
- Query response time (64 concurrent users): 65 ms
- Query response time (light load): 1-10 ms
- Queries per server per hour: > 1,980,000
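The hourly figures above are easier to compare with the millisecond response times when converted to per-second and per-minute rates. Here is a small worked conversion in Python using only the numbers on the slide. Since the slide states them as minimums ("> 1,980,000", "> 10,000"), the results are lower bounds.

```python
# Figures taken from the Performance slide (both stated as minimums).
queries_per_hour = 1_980_000
kb_updates_per_hour = 10_000

queries_per_second = queries_per_hour / 3600       # 550 queries per second
mean_gap_ms = 1000 / queries_per_second             # average spacing between queries
kb_updates_per_minute = kb_updates_per_hour / 60    # knowledgebase population rate

print(f"{queries_per_second:.0f} queries/s, one every {mean_gap_ms:.2f} ms on average")
print(f"{kb_updates_per_minute:.1f} entities/relationships per minute")
```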

Page 15

Information Extraction for Metadata Creation

Figure: metadata extractors draw on the WWW and enterprise repositories: feeds/documents (Nexis, UPI, AP), digital maps, digital audio, data stores, digital video, digital images, and more.

Key challenge: create/extract as much (semantic) metadata automatically as possible.

Page 16

Video with Editorialized Text on the Web

Automatic Categorization & Metadata Tagging (Web page)

Figure: auto-categorization; semantic metadata

Page 17

Content Extraction and Knowledgebase Enhancement

Figure: Web page -> extraction agent -> enhanced metadata asset

Page 18

Content Enhancement Workflow

Figure: syntax metadata and semantic metadata

Page 19

Content Asset Index Evolution

1. Extractor Agent for Bloomberg: scans text for analysis, extracts metadata automatically, and creates an asset (index) out of the extracted metadata.
   Syntax metadata: Producer: BusinessWire; Source: Bloomberg; Date: Sept. 10 2001; Location: San Jose, CA; URL: http://bloomberg.com/1.htm; Media: Text
   Semantic metadata: Company: Cisco Systems, Inc.

2. Categorization & Auto-Cataloging System (CACS): scans text for analysis, classifies the document into a pre-defined category/topic, and appends topic metadata to the asset.
   Semantic metadata: Company: Cisco Systems, Inc.; Topic: Company News

3. Knowledge Base: Cisco Systems (Company) has Ticker CSCO (Exchange: NASDAQ), Industry Telecomm., Sector Computer Hardware, Executive John Chambers (CEO of), Competition Nortel Networks (competes with), Headquarters San Jose. This knowledge is leveraged to enhance metatagging, producing the enhanced content asset:
   Syntax metadata: Producer: BusinessWire; Source: Bloomberg; Date: Sept. 10 2001; Location: San Jose, CA; URL: http://bloomberg.com/1.htm; Media: Text
   Semantic metadata: Company: Cisco Systems, Inc.; Topic: Company News; Ticker: CSCO; Exchange: NASDAQ; Industry: Telecomm.; Sector: Computer Hardware; Executive: John Chambers; Competition: Nortel Networks; Headquarters: San Jose, CA

4. The enhanced content asset is indexed for the Semantic Engine (XML Feed).
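The evolution of the Cisco asset can be sketched as a three-stage pipeline: extract, classify, enhance. This minimal Python version is an illustration under stated assumptions. The name matching and the single topic rule are hypothetical stand-ins for the extractor agent and CACS, and the one-entry knowledgebase simply mirrors the slide's example.

```python
import re

# One-entry knowledgebase mirroring the slide's Cisco example (illustrative only).
KNOWLEDGEBASE = {
    "Cisco Systems, Inc.": {
        "Ticker": "CSCO", "Exchange": "NASDAQ", "Industry": "Telecomm.",
        "Sector": "Computer Hardware", "Executive": "John Chambers",
        "Competition": "Nortel Networks", "Headquarters": "San Jose, CA",
    },
}


def extract(text: str, syntax: dict) -> dict:
    """Extractor-agent stand-in: keep syntax metadata, spot known companies."""
    asset = {"syntax": dict(syntax), "semantic": {}}
    for company in KNOWLEDGEBASE:
        short_name = company.split(",")[0]  # "Cisco Systems"
        if short_name in text:
            asset["semantic"]["Company"] = company
    return asset


def classify(text: str, asset: dict) -> dict:
    """CACS stand-in: one hypothetical rule assigns the topic."""
    if re.search(r"\b(announced|reported|said)\b", text):
        asset["semantic"]["Topic"] = "Company News"
    return asset


def enhance(asset: dict) -> dict:
    """Add knowledgebase facts about the recognized company to the asset."""
    company = asset["semantic"].get("Company")
    asset["semantic"].update(KNOWLEDGEBASE.get(company, {}))
    return asset


text = "Cisco Systems announced a new line of routers."
syntax = {"Producer": "BusinessWire", "Source": "Bloomberg",
          "Date": "Sept. 10 2001", "Media": "Text"}
asset = enhance(classify(text, extract(text, syntax)))
```

After `enhance`, the asset carries Ticker, Competition, and Headquarters values that the text itself never mentions.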

Page 20

Intelligent Content Empowers the User

- Extractor Agents: content which does contain the words the user asked for
- + Value-added Metadata: content which does not contain the words the user asked for, but is about what he asked for
- + Semantic Associations: content the user did not think to ask for, but which he needs to know
= Intelligent Content for the End-User
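The three layers of intelligent content can be illustrated with a minimal Python sketch. It assumes each item already carries extracted metadata and that a hypothetical association table links related entities. Results are sorted into what matched the words, what matched through value-added metadata, and what was reached only through a semantic association.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical semantic associations: entity -> related entities.
ASSOCIATIONS = {"Cisco Systems": {"Nortel Networks"}}


def intelligent_content(entity: str, items: list[Item]) -> dict[str, list[str]]:
    """Sort items into the three layers of 'intelligent content'."""
    layers: dict[str, list[str]] = {"asked_for": [], "about": [], "need_to_know": []}
    related = ASSOCIATIONS.get(entity, set())
    for item in items:
        company = item.metadata.get("Company")
        if entity in item.text:
            layers["asked_for"].append(item.title)      # contains the query words
        elif company == entity:
            layers["about"].append(item.title)          # matched via value-added metadata
        elif company in related:
            layers["need_to_know"].append(item.title)   # reached via an association
    return layers


items = [
    Item("A", "Cisco Systems ships new routers", {"Company": "Cisco Systems"}),
    Item("B", "CSCO shares rise on NASDAQ", {"Company": "Cisco Systems"}),
    Item("C", "Nortel cuts jobs", {"Company": "Nortel Networks"}),
    Item("D", "Weather update", {}),
]
result = intelligent_content("Cisco Systems", items)
```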

Page 21

Example 1 – Snapshots (“Jamal Anderson”)

1. Search for ‘Jamal Anderson’ in ‘Football’.
2. Click on the first result for Jamal Anderson.
3. View the metadata. Note that the team name and league name are also included in the metadata.
4. View the original source HTML page. Verify that the source page contains no mention of the team name or league name. They are value-additions to the metadata that make searching easier.

Page 22

Semantic Application Example – Research Dashboard

- Focused relevant content organized by topic (semantic categorization)
- Automatic content aggregation from multiple content providers and feeds
- Related relevant content not explicitly asked for (semantic associations)
- Competitive research inferred automatically
- Automatic 3rd party content integration

Page 23

Semantic Web – Intelligent Content

Figure: a COMPANY at the center, linked to Related Stock News, Industry News, Technology Products, Competition, and Regulations (SEC, EPA). Association labels: “COMPANIES in Same or Related INDUSTRY”; “COMPANIES in INDUSTRY with Competing PRODUCTS”; “Impacting INDUSTRY or Filed By COMPANY”; “Important to INDUSTRY or COMPANY”.

Intelligent Content = What You Asked for + What you need to know!

Page 24

Knowledge-based & Manual Associations

Figure: syntax metadata and semantic metadata linked by associations such as “led by” and “same entity”; human-assisted inference.

Page 25

Intelligence Analyst Browsing Scenario

Page 26

Innovations that affect User Experience

• BSBQ: Blended Semantic Browsing and Querying
– Ability to query and browse relevant desired content in a highly contextual manner

• Seamless access/processing of Content, Metadata and Knowledge
– Ability to retrieve relevant content, view related metadata, access relevant knowledge, and switch between all of the above, allowing the user to follow his train of thought

• dACE: dynamic Automatic Content Enhancement
– Ability to provide enhanced annotation features, allowing the user to retrieve relevant knowledge about significant pieces of content during content consumption

• Semantic Engine APIs with XML output
– Ability to create customized APIs for the Semantic Engine, involving Semantic Associations with XML output, to cater to any user application

Page 27

Figure: Visionics AcSys Security Portal
- Screening points: check-in, interrogation, boarding gate, airport, airspace
- Voquette components: Knowledgebase, Metabase, Threat Scoring
- Data sources: government watchlists, news media, Web info, LexisNexis, RiskWise, passenger records, reservation data, airline data, airport data
- Other labels: Airline and Airport Data; Future and Current Risks; Airport LEO; ARC AvSec Manager; Data Management; Data Mining; IPG

Page 28

Sources Used

Knowledge Sources:
- FBI - Most Wanted Terrorists
- Denied Persons Lists
- Terrorism Files
- ICT
- Office of Foreign Assets Control (OFAC)
- Hamas terrorists
- CNN Locations
- FAA_Airport_Codes
- About.com
- Comtex_International
- Hindustan Times
- JerusalemPost
- CNN
- Newstrove_Hamas

Content Sources:
- Africa News Service
- AFX News – Asia/UK/Europe
- AP Worldstream
- Asia Pulse
- BusinessWire
- ComputerWire (CTW)
- EFE News Services
- FWN Select
- Itar-TASS
- Knight Ridder News (Open)
- Knight-Ridder Open
- M2 - International
- M2 Airline Industry Information
- New World Publishing
- PR Newswire
- PRLine (PRL)
- Resource News International
- RosBusiness
- United Press International
- UPI Spotlights

Page 29

Interrogation Kiosk – Unique Advantages of Voquette

Voquette’s semantic technology enables flight authorities to:
- take a quick look at the passenger’s history
- check quickly if the passenger is on any official watchlist
- interpret and understand the passenger’s links to other organizations (possibly terrorist)
- verify if the passenger has boarded the flight from a “high risk” region
- verify if the passenger is originally from a “high risk” region
- check if the passenger’s name has been mentioned in any news article along with the name of a known bad guy

Screenshot: passenger Smith, John

Page 30

Passenger: Smith, John

Threat Score Components

WATCHLIST ANALYSIS
Action: Voquette’s rich knowledgebase is automatically searched for any appearance of this name on the watchlists.
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, automatically correlate it, and rank the threat factors to indicate the passenger’s threat level on the watchlist front.

METABASE SEARCH
Action: Voquette’s rich metabase is searched for this name, and associated content stories mentioning the passenger’s name are retrieved.
Ability Proven: Ability to automatically aggregate and retrieve relevant content stories, field reports, etc. about the passenger that flight officials can use to determine whether the passenger has any connections with known bad people or organizations.

appearsOn watchList: FBI

KNOWLEDGEBASE SEARCH
Action: Voquette’s rich knowledgebase is searched for this name, and associated information such as position, aliases, and relationships (past or present) of this name to other organizations, watchlists, countries, etc. is retrieved.
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge about a passenger and automatically correlate it with other data in the knowledgebase to present a visual association picture to the flight official.

LEXIS NEXIS ANNOTATION
Action: Information about or related to the passenger returned by Lexis Nexis is enhanced by linking important entities to Voquette’s rich knowledgebase.
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, and automatically correlate them with other data in the knowledgebase to present a clear picture of the passenger to the flight official.

Flight Country Check: 45, 0.15
Person Country Check: 25, 0.15
Nested Organizations Check: 75, 0.8
Aggregate Link Analysis Score: 17.7

LINK ANALYSIS
Action: Semantic analysis of the various components (watchlist, Lexis Nexis, knowledgebase search, metabase search, etc.) to come up with an aggregate threat score for the passenger.
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, automatically correlate them with other data in the knowledgebase, and search for relevant content to present an overall idea of the passenger’s threat level, allowing the flight official to take quick action.
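The Nested Organizations Check amounts to following membership links outward from the passenger until a watchlisted organization is reached. Below is a minimal Python sketch using breadth-first search. It assumes a hypothetical `memberOf` / `appearsOn` knowledgebase fragment with placeholder names and an assumed depth limit. The slide does not say how its scores and weights combine into the aggregate, so the sketch only reports the hits and the depth at which each was found.

```python
from collections import deque

# Hypothetical knowledgebase fragment; names are placeholders.
MEMBER_OF = {
    "Passenger P": ["Organization A"],
    "Organization A": ["Organization B"],
    "Organization B": ["Organization C"],
}
APPEARS_ON = {"Organization C": ["Watchlist W"]}


def nested_watchlist_hits(person: str, max_depth: int = 3) -> list[tuple[str, str, int]]:
    """Breadth-first walk over memberOf links, reporting (entity, watchlist, depth)."""
    hits = []
    seen = {person}
    queue = deque([(person, 0)])
    while queue:
        entity, depth = queue.popleft()
        for watchlist in APPEARS_ON.get(entity, []):
            hits.append((entity, watchlist, depth))
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in MEMBER_OF.get(entity, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return hits


print(nested_watchlist_hits("Passenger P"))
```

Breadth-first order reports the closest watchlisted organization first, and the returned depth could feed a scoring rule that discounts distant links.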

Page 31

What it would take an RDBMS to support the flight security application

Query Comparison: Voquette vs. RDBMS
(each step lists the number of queries, Voquette vs. RDBMS, then the time, Voquette vs. RDBMS)

Direct Watchlist Match (person name)
- look up person entity: 1 CACS request vs. 5-10 SQL queries; .05 sec vs. 5-10 sec
- retrieve person's relationships to watchlists: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec

Organization Watchlist Match (person name, organization name)
- look up person entity: 1 CACS request vs. 5-10 SQL queries; .05 sec vs. 5-10 sec
- retrieve person's relationships to organizations: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec
- retrieve the organizations' relationships to watchlists: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec
- look up organization entity: 1 CACS request vs. 5-10 SQL queries; .05 sec vs. 5-10 sec
- retrieve the organizations' relationships to watchlists: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec

Nested Organization Watchlist Match (person name, organization name)
- look up organization entity: 1 CACS request vs. 5-10 SQL queries; .05 sec vs. 5-10 sec
- retrieve the organization's relationships to organizations: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec
- retrieve the organizations' relationships to watchlists: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec

Flight Origin (country name)
- retrieve country entity: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec
- see if country is on a list containing "high-risk" countries: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec

Person Origin (person name)
- look up person entity: 1 CACS request vs. 5-10 SQL queries; .05 sec vs. 5-10 sec
- retrieve person's home country: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec
- retrieve the organization's relationships to lists containing "high-risk" countries: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec

Field Report Search (person name)
- perform SSE query for field reports that mention this person: 1 SSE request vs. 2 SQL queries; .03 sec vs. 5-30 sec
- retrieve a list of people associated with these field reports: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec
- determine which people are on watchlists, terrorists, etc.: 1 SQL query vs. 1 SQL query; .005 sec vs. .005 sec

Total: 18 requests vs. 39-64 SQL queries; .33 sec vs. 30-80 sec
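Tallying the comparison is a useful check on its totals. This Python snippet copies the per-step values from the Query Comparison table. It reproduces the 18 requests and the 39-64 SQL query range, and the per-step Voquette times sum to about 0.34 sec, close to the .33 sec total shown.

```python
# Per-step Voquette times (seconds) copied from the Query Comparison table.
voquette_seconds = {
    "Direct Watchlist Match": [0.05, 0.005],
    "Organization Watchlist Match": [0.05, 0.005, 0.005, 0.05, 0.005],
    "Nested Organization Watchlist Match": [0.05, 0.005, 0.005],
    "Flight Origin": [0.005, 0.005],
    "Person Origin": [0.05, 0.005, 0.005],
    "Field Report Search": [0.03, 0.005, 0.005],
}
# RDBMS SQL-query counts per step, as (low, high) ranges.
rdbms_sql = {
    "Direct Watchlist Match": [(5, 10), (1, 1)],
    "Organization Watchlist Match": [(5, 10), (1, 1), (1, 1), (5, 10), (1, 1)],
    "Nested Organization Watchlist Match": [(5, 10), (1, 1), (1, 1)],
    "Flight Origin": [(1, 1), (1, 1)],
    "Person Origin": [(5, 10), (1, 1), (1, 1)],
    "Field Report Search": [(2, 2), (1, 1), (1, 1)],
}

requests = sum(len(steps) for steps in voquette_seconds.values())
voquette_total = sum(sum(steps) for steps in voquette_seconds.values())
sql_low = sum(low for steps in rdbms_sql.values() for low, _ in steps)
sql_high = sum(high for steps in rdbms_sql.values() for _, high in steps)

print(f"{requests} requests, {voquette_total:.2f} s; RDBMS {sql_low}-{sql_high} SQL queries")
```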

Page 32

JIVA Semantic Console Start-up Interface

The mission of the JIVA project is to gather and analyze as much information of diverse kinds as possible about suspected individuals, terrorist and other groups, organizations, events, etc. For this Terrorism domain, the JIVA Semantic Console provides an information retrieval interface that displays some fundamental semantic attributes (based on a corresponding Terrorism domain model) to enable information retrieval in the right context.

Screenshot callouts:
- Most fundamental semantic attributes specific to the Terrorism domain (fully customizable)
- Syntactic or domain-independent attributes for general and media-specific search
- Analyst can enter search values in the appropriate attribute fields (to search in the right context)
- Analyst can choose the type of media of the desired content
- Once all other values are set, click the “Search” button to search semantically
- Search interface with more search features (explained later)

JIVA Functionality Interface

Page 33

“Complete Picture” View – Knowledgebase Results

This section of the ‘Complete Picture’ shows factually known real-world information about the entity of interest (person, organization, event, etc.), along with its contextual classification(s) and its relationships with other entities in the Knowledgebase, to provide a comprehensive overview of the entity.

Such knowledge is kept up to date by automated knowledge extractor agents that aggregate knowledge about millions of entities from various trusted knowledge sources.

Screenshot callouts:
- Entity’s canonical name
- Entity’s classifications in taxonomy
- Entity’s aliases and other names
- Entity’s real-world relationships to various other entities across multiple entity classes (as defined in the Terrorism domain model)
- Knowledgebase navigation: individual related entities are clickable to navigate to a new knowledge page for that entity, e.g. Al Qaeda
- Blended Semantic Browsing & Querying (BSBQ): while browsing through relevant knowledge, the analyst can search for content on the focal entity or any of the related entities. The analyst can also search for specific relationships between two or more entities by checking the corresponding entity boxes.
- Fraud investigation of the focal entity, placing it in one of five threat levels based on its score

JIVA

Page 34

Facilitating Knowledge Discovery

Clicking any bin Laden-related entity (e.g. Al Qaeda) displays a page showing the analyst knowledge pertaining to that entity, which can be used in BSBQ mode, as described on the previous screen.

Continuing this integrated approach of semantic browsing and querying, the analyst has the necessary ammunition to perform knowledge discovery. The analyst can follow his train of thought as he browses and queries, possibly discovering unexpected, indirect relationships and links between entities at various levels. Automatically uncovering such hidden related entities helps the analyst add new and meaningful entities and relationships to his assessment tasks.

JIVA
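Discovering indirect links between entities can be framed as enumerating short paths in a graph of knowledgebase relationships. Below is a minimal Python sketch under these assumptions: a hypothetical edge list with placeholder entities and relationship names, a hop limit, and traversal in both directions (reverse hops are recorded with an `inverse:` label).

```python
from collections import defaultdict

# Hypothetical relationship triples; all names are placeholders.
EDGES = [
    ("Person A", "memberOf", "Organization B"),
    ("Organization B", "fundedBy", "Company C"),
    ("Person D", "executiveOf", "Company C"),
    ("Person A", "mentionedWith", "Person E"),
]


def build_graph(edges):
    """Adjacency lists that can be walked in both directions."""
    graph = defaultdict(list)
    for subj, rel, obj in edges:
        graph[subj].append((rel, obj))
        graph[obj].append((f"inverse:{rel}", subj))
    return graph


def paths_between(graph, start, goal, max_hops=3):
    """Enumerate simple paths of at most max_hops links from start to goal."""
    found = []

    def walk(node, path, visited):
        if node == goal:
            found.append(path)
            return
        if len(path) == max_hops:
            return
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                walk(nxt, path + [(node, rel, nxt)], visited | {nxt})

    walk(start, [], {start})
    return found


graph = build_graph(EDGES)
for path in paths_between(graph, "Person A", "Person D"):
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

The hop limit keeps the search tractable; each returned path is an explanation an analyst can inspect, not just a yes/no answer.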

Page 35

Wireless Application of Semantic Metadata and Automatic Content Enrichment

Figure (phone screens):
- Main menu: MyStocks, News, Sports, Music, MyMedia, $
- My Stocks: CSCO, NT, IBM, Market
- CSCO: Analyst Call, Conf Call, Earnings
- CSCO Analysis: 11/08 ON24 Payne; 11/07 ON24 H&Q; 11/06 CBS Langlesis

Clicking on the link for Cisco Analyst Calls displays a listing sorted by date. Semantic filtering uses just the right metadata to meet screen and other constraints; e.g., an Analyst Call listing focuses on the source and the analyst’s name or company. Icons denote additional metadata, such as a “Strong Buy” rating by the H&Q analyst.
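Semantic filtering for a small screen can be sketched as choosing, per content type, the highest-priority metadata fields that fit the display. In this minimal Python example, the priority lists, the 24-character width, and the field names are assumptions. The sample values come from the CSCO Analysis screen, where the rating is shown as an icon rather than as text.

```python
# Hypothetical field priorities per content type for a small phone screen.
FIELD_PRIORITY = {
    "Analyst Call": ["date", "source", "analyst", "rating"],
    "Earnings": ["date", "quarter", "eps"],
}


def render_line(item: dict, max_chars: int = 24) -> str:
    """Keep the highest-priority fields, in order, while they fit the width."""
    parts: list[str] = []
    for name in FIELD_PRIORITY.get(item["type"], ["date"]):
        value = item.get(name)
        if value is None:
            continue  # missing field: try the next one
        if len(" ".join(parts + [str(value)])) > max_chars:
            break  # stop at the first field that does not fit
        parts.append(str(value))
    return " ".join(parts)


calls = [
    {"type": "Analyst Call", "date": "11/07", "source": "ON24", "analyst": "H&Q",
     "rating": "Strong Buy"},
    {"type": "Analyst Call", "date": "11/08", "source": "ON24", "analyst": "Payne"},
    {"type": "Analyst Call", "date": "11/06", "source": "CBS", "analyst": "Langlesis"},
]
# Newest first; zero-padded "mm/dd" strings sort correctly within a single year.
listing = [render_line(c) for c in sorted(calls, key=lambda c: c["date"], reverse=True)]
```

Here "Strong Buy" does not fit on the 11/07 line, so a device would surface it another way, such as an icon.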

Page 36

Metadata’s role in emerging iTV infrastructure

Figure: enhanced digital cable video passes through an MPEG encoder and an MPEG decoder (MPEG-2/4/7). The Voquette/Taalee Semantic Engine retrieves the scene description track for a node (“NSF Playoff”; node = AVO object), attaches an enhanced XML description to produce a metadata-rich, value-added node, and creates the scene description tree, for a great user experience.

Object Content Information (OCI) for “NSF Playoff”:
Produced by: Fox Sports; Creation Date: 12/05/2000; League: NFL; Teams: Seattle Seahawks, Atlanta Falcons; Players: John Kitna; Coaches: Mike Holmgren, Dan Reeves; Location: Atlanta

- Channel sales through video server vendors, video app servers, and broadcasters
- License metadata decoder and semantic applications to device makers
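The enhanced XML description attached to a node can be illustrated by serializing the slide's OCI metadata with Python's standard `xml.etree.ElementTree`. The element names are hypothetical and do not follow the MPEG-7 description schema. Repeated fields (teams, coaches) become repeated elements.

```python
import xml.etree.ElementTree as ET


def oci_to_xml(title: str, metadata: dict[str, list[str]]) -> str:
    """Serialize a node's metadata as a flat XML description (hypothetical schema)."""
    root = ET.Element("node", attrib={"title": title})
    for name, values in metadata.items():
        for value in values:  # repeated fields become repeated elements
            ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = oci_to_xml("NSF Playoff", {
    "producedBy": ["Fox Sports"],
    "creationDate": ["12/05/2000"],
    "league": ["NFL"],
    "team": ["Seattle Seahawks", "Atlanta Falcons"],
    "player": ["John Kitna"],
    "coach": ["Mike Holmgren", "Dan Reeves"],
    "location": ["Atlanta"],
})
print(xml_text)
```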

Page 37

Metadata for Automatic Content Enrichment

Interactive Television
- This segment has embedded or referenced metadata that is used by a personalization application to show only the stocks that the user is interested in.
- This screen is customizable with an interactivity feature that uses metadata, such as whether there is a new Conference Call video on CSCO.
- Part of the screen can be automatically customized to show conference-call-specific information, including the transcript, participants, etc., all of which are relevant metadata.
- The Conference Call itself can have embedded metadata to support personalization and interactivity.

Page 38

Future

• Multimodal interfaces

• Multimodal semantics

• Multivalent Semantics

Page 39

Metadata Usage: Keyword, Attribute and Content Based Access

The VisualHarness system at LSDIS/UGA