Upload
esther-jefferson
View
223
Download
0
Embed Size (px)
DESCRIPTION
Application Architecture
Citation preview
An Ontological Approach to FinancialAnalysis and Monitoring
Application Architecture
Research Areas
● Data Extraction
● Data Disambiguation
● Semantic Association and Ranking
● MathML-S (MathML with Semantics)
● Ontology schema designed to provide common terminology for a domain.
● Ontology instances represent actual data from a domain.
● Data is extracted from multiple resources and translated into instances of a single ontology.
● Original data sources include:
● Databases● XML files● HTML pages● Text documents● Etc.
Data Extraction
Data Disambiguation
Ongoing research to develop relationship and attribute based disambiguation techniques so that the ontology can be meaningfully populated.
Simple Example:
Is “Athens” the city in Georgia or the city in Greece?
Data Disambiguation
Challenges● Merging two or more databases/ontologies/xml files with multiple references of the same logical entity● Adding new entities to an ontology when a similar entity already exists● Variations in database/ontology/xml schemas● Variations in information representation● Incomplete information● Use of abbreviations, mis-spellings, various naming conventions, format changes, etc.
Data Disambiguation
Schema
Person
-- SSN
-- TelNumber
-- FirstName
-- MiddleName
-- LastName
-- Generation
-- Marital Status
-- Applicant
-- dependent of
-- spouse of
-- works for
-- affiliated with
-- foreign influence event
-- address
Tim Robins
-- 889889889
-- 7065434567
-- Tim
--
-- Robins
--
-- Single
--
--
--
-- People Soft
--
-- event7823
-- place23
Conflicting instances
Timothy Wallace Robinson
-- 889889889
-- 7062123443
-- Timothy
-- Wallace
-- Robinson
--
-- Married
--
--
-- person12332
-- Oracle
--
-- event099
-- place23
Reconciling Oracle and PeopleSoft
indicates the two person entities work
for the same organization
Recognized as a time sensitive attribute
String similarity metrics
Nature of attribute indicates its relative importance – SSN
given a high weight in disambiguating
person entities
Semantic Associations and Ranking
Semantic Associations● Semantic associations are relationships or paths between concepts in an ontology
Ranking● Ranking based on multiple factors
● Number of links, types of links, location in ontology, etc.● Ranking indicates degree of semantic “closeness”
Semantic Associations and Ranking
Characterizing document content in terms of ontology “semantic annotation”
● Correlate words/phrases from document with entities/relationships in ontology
● Entity Identification
● Meta-data added to document (from associated ontological knowledge)
● Active area of research but practically useful technology now available● Constrained to content of ontology
Semantic Associations and Ranking
Semantic Relationships between Documents and Ontology
● Semantic associations: relationships between document concepts and ontology concepts are discovered and ranked
● Ranking based on multiple factors
● no. of links, types of links, location in ontology, …
● Ranking indicates degree of semantic “closeness”
Semantic Associations and Ranking
● Highly relevant
● Closely related
● Ambiguous
● Not relevant
● Undeterminable
Documents Ranking
Semantic Associations and Ranking
Research Content
● Discovery & Ranking of semantic semantic associations● Characterizing “need to know” in terms of ontological concepts & relationships● Meta-data annotation of data and (semi-structured & unstructured) documents
● correlation of document content & concepts in ontology
Semantic Associations and Ranking
Research Challenges
In this project we are addressing:● Discovery of Semantic Associations per entity per document● Input/Visualization/Management of Context of Investigation● Scalability on number of documents & ontology size
● Performs well with thousand documents● Ranking of documents
Semantic Associations and Ranking
Ranking of Documents Relevance
“Closely related entities are more relevant than distant entities”
E = {e | e Document }
Ek = {f | distance(f, eE) = k }
nk
k
k
k
k
Eelevanceentities_R
ERelevancerelations_
Elevanceclasses_Re
0
)(
)(
)(
RelevanceDocument
Semantic Associations and Ranking
Relevance Measures for Documents
● Relevance engine input● the set of semantically annotated documents● the context of investigation for the assignment● the ontology schema represented in RDFS, and the
ontology instances represented in RDF● Relevance measure function used to verify whether the entity
annotations in the annotated document can be fit into the entity classes, entity instances, and/or keywords specified in the context of investigation.
Semantic Associations and Ranking
Ranking of Documents Relevance
Four groups of document-ranking:● Not Related Documents
● unable to determine relation to context● Ambiguously Related Documents
● some relationship exists to the context● Somehow Related Documents
● Entities are closely related to the context● Highly Related Documents
● Entities are a direct match to the context
Semantic Associations and Ranking
Ranking of Documents Relevance continued
Cut-off values determine grouping of documents w.r.t. relevance
● These are customizable cut-off values (more control and more meaningful parameters compared to say automatic classification or statistical approaches)
“Inspection” of a document is possible via (a) original document or (b) original document with highlighted entities
Semantic Associations and Ranking
Ontology-driven Thematic Association LifecycleBuilding a scalable and high
performance capability with support for:
• Task domain ontology creation and maintenance
• Ontology “Knowledge” based on trusted sources supporting Document Classification
• Ontology-driven Semantic Metadata Extraction/Annotation
• Utilizing semantic metadata and ontology to associate document theme(s) with analytical task
• Weighting process used to measure degree of relevance
Task Domain SchemaCreation
Ontology Population
Metadata ExtractionAnd Annotation Enhancement
Thematic Association/Relationships
Discovery
Semantic RelationshipRank Analysis Ontology API
MB KB
MathML-S
● An interface has been developed that allows the user to specify the set of things that need to be verified for any given individual.
● This kind of “ultimate flexibility” is possible due to the ontological approach used.
● An application of research in modeling rules for identifying financial irregularities using MathML (MathML-S)
● Traversals, formulas, rules and profiles represent the data, calculations and checks that need to be performed
● These are created using the graphical interface developed, and stored in the Component Library
MathML-S
● A traversal is a path through the ontology ending with a data-type value.
● A formula is a computation of a value using concepts found in the ontology. It may be constrained using data found in the ontology instance data.
● A rule is a verification (with boolean value) performed on:
● A computed value that come from a formula• The existence of certain types of relationships • Other types of rules may be added based on feedback.
● A profile is a collection of rules, where rules may be given different weights. A profile value can be computed for a person represented in the ontology.
MathML-S
Rule Example
Solvency Ratio Check
Traversals● Asset_T = value of Asset(n)● Liability_T = value of Liability(n)
Formulas● Total_Assets = Asset_T1 + Asset_T2_ + … + Asset_Tn● Total_Liabilities = Liability_T1 + Liability_T2 + … +
Liability_Tn● Solvency_Ratio = Total_Assets / Total_Liabilities
Rule● Solvency_Ratio_Check = Solvency_Ratio > 1.1
MathML-S
● Data integrated in the ontology is queried to verify compliance with respect to a set of customizable rules
● These rules include math calculations, verification of conditions, calculation and verification of ratios, etc.
● The ontological approach allows definition of such rules by using concepts and relationships-types of the ontology. (Thus applicable to sub-concepts)
MathML-S
● Semantic matching and the graph-like nature of ontology provide flexibility on defining the rules yet few research issues are to be addressed
● Operands for a formulas are data items in the ontology. Some data values are retrieved by traversing specific sequences of relationships
● Formulas are self-contained so that they can be re-uses in various rules (thus providing flexibility on maintenance)
● A profile is a set of rules where its verification implies querying the ontology and at the same time computing formulas and rule values
MathML-S