1
* IBM Research AI, + IBM Canada Natural Language Querying of Complex Business Intelligence Queries Jaydeep Sen * , Fatma Özcan * , Abdul Quamar * , Greg Stager + , Ashish Mittal * , Manasa Jammi * , Chuan Lei * , Diptikalyan Saha * , Karthik Sankaranarayanan * Public Company Public Metric Public Metric Data name Assignment History Person For Person For Company For Company For Metric Insider History title position name name For Company Object Property Data Property Concept Period_type Insider Person For Insider isA bio Industry sic_major Company isA To Industry Financial Service Account Is Owned By Securities Transaction Is Facilitated By Listed Security Refers To Monetary amount Has Last traded value settlement date Has Type Has Amount Security Is Provided By isA hasLegal Name Has Price Business Intelligence (BI) queries provide invaluable insights in the enterprise NL interfaces enable BI querying for business users, who are not SQL experts, beyond fixed reports Existing NLIDB systems fail to handle complex nested SQL queries needed by BI in the enterprise Motivation Detection: Does the input natural language query require nesting? Subquery formation: If nesting is needed, how to divide the query into subqueries? Subquery Joining: How to join subquery results to form the complete nested query? FIBEN Ontology Snapshot System Architecture Emulates real world data mart for a financial application Combines SEC data with transactional TPoX 6 data FIBEN: Finance Domain Benchmark Dataset Example Walkthrough References 1. Diptikalyan Saha,et , “ATHENA: an ontology-driven system for natural language querying over relational data stores”, PVLDB 9(12) 2. Shreyas Bharadwaj, et al, “Creation and Interaction with Large-scale Domain-Specific Knowledge Bases”, in PVLDB 10(12) 3. FIBO. https://spec.edmcouncil.org/fibo/. 4. FRO. http://xbrl.squarespace.com/financial-report-ontology/ 5. SEC Financial Statement Data: https://www.sec.gov/dera/data/financial-statement-data- sets.html . 6. Matthias Nicola, Irina Kogan, and Berni Schiefer, “An XML transaction processing benchmark”, in SIGMOD 2007 Preliminary Results Overall Accuracy Nested Query Accuracy SEC Data 5 Provides information about public companies, their officers and financial metrics Dataset extracted from the public SEC filings submitted as XBRL documents Data curated by running named entity extraction, and entity resolution by IBM Research TPoX Data Transaction Processing benchmark for financial applications. Data generator allows scaling Ontology SQLNest ATHENA NALIR DBPal FIBEN 92.78 65.35 28.86 41.75 Ontology SQLNest ATHENA NALIR DBPal FIBEN 79.71 0.0 10.14 21.73 Data transformed to conform to standard finance ontologies: FIBO 3 (Finance Industry Business Ontology) FRO 4 (Finance Report Ontology) Extension of our earlier system, ATHENA 1,2 : A state-of-the-art Ontology Based NLIDB system Ontology is used to capture the deep domain semantics needed to model the target domain Heuristics to detect and guide subquery formations by combining the use of intelligent lexicon analyzers together with deep domain reasoning over the ontology Generic and domain agnostic system and algorithms, capable of generating complex SQL queries involving selections, aggregations, as well as nesting Rule-based interpretation, no need for training data High accuracy in preliminary results, proving the effectiveness of using a combination of lexical analyzer and deep domain reasoning Overview Ontology to Database Mapping Translational Index Relational Database Domain Ontology Query Translator Users Evidence Annotators Nested Query detector Subquery Formulation Subquery Join Condition Query Building NLQ Engine Nested Query Handler Ranked OQLs Results High Level Steps

Natural Language Querying of Complex Business Intelligence ... · ManasaJammi*, Chuan Lei*, Diptikalyan Saha*, Karthik Sankaranarayanan* Public Company Public Metric Public Metric

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Natural Language Querying of Complex Business Intelligence ... · ManasaJammi*, Chuan Lei*, Diptikalyan Saha*, Karthik Sankaranarayanan* Public Company Public Metric Public Metric

*IBM Research AI, +IBM Canada

Natural Language Querying of Complex Business Intelligence Queries

Jaydeep Sen*, Fatma Özcan*, Abdul Quamar*, Greg Stager+, Ashish Mittal*,

Manasa Jammi*, Chuan Lei*, Diptikalyan Saha*, Karthik Sankaranarayanan*

Public Company

Public Metric

Public Metric Data

name

Assignment History

Person

For PersonFor Company

For Company For Metric

Insider History

title

position

namename

For Company

Object Property

Data Property

Concept

Period_type

Insider Person

For Insider

isA

bio

Industry

sic_majorCompany

isA

To Industry

Financial Service Account

Is Owned By

SecuritiesTransaction

Is Facilitated By

ListedSecurity

Refers To

Monetary amount

Has Last traded value

settlement date

Has Type

Has Amount

Security

Is ProvidedByisA

hasLegalName

Has Price

• Business Intelligence (BI) queries provide invaluable insights in the enterprise

• NL interfaces enable BI querying for business users, who are not SQL experts, beyond fixed reports

• Existing NLIDB systems fail to handle complex nested SQL queries needed by BI in the enterprise

Motivation

• Detection: Does the input natural language query require nesting?• Subquery formation: If nesting is needed, how to divide the query

into subqueries?• Subquery Joining: How to join subquery results to form the

complete nested query?

FIBEN Ontology Snapshot

System Architecture

• Emulates real world data mart for a financial application

• Combines SEC data with transactional TPoX6 data

FIBEN: Finance Domain Benchmark Dataset

Example Walkthrough

References1. Diptikalyan Saha,et , “ATHENA: an ontology-driven system for natural

language querying over relational data stores”, PVLDB 9(12)2. Shreyas Bharadwaj, et al, “Creation and Interaction with Large-scale Domain-Specific

Knowledge Bases”, in PVLDB 10(12)3. FIBO. https://spec.edmcouncil.org/fibo/.4. FRO. http://xbrl.squarespace.com/financial-report-ontology/5. SEC Financial Statement Data: https://www.sec.gov/dera/data/financial-statement-data-sets.html.6. Matthias Nicola, Irina Kogan, and Berni Schiefer, “An XML transaction processing benchmark”, in SIGMOD 2007

Preliminary Results

Overall Accuracy

Nested Query Accuracy

SEC Data5

• Provides information about public companies, their officers and financial metrics

• Dataset extracted from the public SEC filings submitted as XBRL documents

• Data curated by running named entity extraction, and entity resolution by IBM Research

TPoX Data• Transaction Processing benchmark for

financial applications.• Data generator allows scaling

Ontology SQLNest ATHENA NALIR DBPal

FIBEN 92.78 65.35 28.86 41.75

Ontology SQLNest ATHENA NALIR DBPalFIBEN 79.71 0.0 10.14 21.73

Data transformed to conform to standard finance ontologies:• FIBO3 (Finance Industry Business Ontology)• FRO4(Finance Report Ontology)

• Extension of our earlier system, ATHENA1,2 : A state-of-the-art Ontology Based NLIDB system

• Ontology is used to capture the deep domain semantics needed to model the target domain

• Heuristics to detect and guide subquery formations by combining the use of intelligent lexicon analyzers together with deep domain reasoning over the ontology

• Generic and domain agnostic system and algorithms, capable of generating complex SQL queries involving selections, aggregations, as well as nesting

• Rule-based interpretation, no need for training data

• High accuracy in preliminary results, proving the effectiveness of using a combination of lexical analyzer and deep domain reasoning

Overview

Ontology to Database Mapping

Translational Index

Relational Database

Domain Ontology

Query Translator

Users

Evidence Annotators

Nested Query detector

Subquery Formulation

Subquery Join Condition

Query Building

NLQ Engine

Nested Query Handler

Ranked OQLs

Results

High Level Steps