40
THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No .: US 9, 959 , 311 B2 ( 45 ) Date of Patent : May 1, 2018 ( 54 ) NATURAL LANGUAGE INTERFACE TO DATABASES ( 71 ) Applicant : International Business Machines Corporation , Armonk , NY ( US ) Ê Ê ( 56 ) References Cited U .S . PATENT DOCUMENTS 7, 424 , 701 B2 * 9/ 2008 Kendall . . . . . .. . . . . . . .. . . GOON 5 / 027 706 / 46 7 , 747 , 601 B2 * 6 / 2010 Cooper .. . . .. . . . .. . . .. GO6F 17 / 2785 707 / 708 ( Continued ) ( 72 ) Inventors : Branimir K . Boguraev , Bedford , NY ( US ); Elahe Khorasani , Yorktown Heights , NY ( US ); Vadim Sheinin , Yorktown Heights , NY ( US ) ; Siddharth A . Patwardhan , Yorktown Heights , NY ( US ); Petros Zerfos , New York , NY ( US ) FOREIGN PATENT DOCUMENTS CN CN 103646032 A 104657439 A 3 / 2014 5 / 2015 OTHER PUBLICATIONS ( 73 ) Assignee : International Business Machines Corporation , Armonk , NY ( US ) ( * ) Notice : Subject to any disclaimer , the term of this patent is extended or adjusted under 35 U .S . C . 154 ( b ) by 250 days . ( 21 ) Appl . No .: 14 / 858 , 841 ( 22 ) Filed : Sep . 18 , 2015 ( 65 ) Ê ? Î ce Prior Publication Data US 2017 / 0083569 A1 Mar . 23, 2017 Majdi Owda et al ., Information Extraction for SQL Query Genera tion in the Conversation - Based Interfaces to Relational Databases ( C - BIRD ); Springer . Com ; 2011 and Springer . ( Continued ) Primary Examiner Thanh - Ha Dang ( 74 ) Attorney , Agent , or Firm Cahn & Samuels , LLP ( 57 ) ABSTRACT An embodiment of the invention provides a method wherein a natural language query is received from a user with an interface . An ontological representation of data in a database is received with an input port , including names of concepts and names of concept properties . Template rules are received with the input port , the templates rules being language dependent and ontology independent , the template rules including widely used constructs of a language . Rules are automatically generated with a rule generation engine with the ontological representation of the data in the data base and the template rules to identify entities and relations in the natural language query . Entities and relations are identified with a processor , the entities and relations being identified in the natural language query with the rules . The structured data language query is generated with a query generation engine from the entities and relations . 10 Claims , 28 Drawing Sheets ( 51 ) Int . CI . G06F 1730 ( 2006 . 01 ) G06F 17 / 27 ( 2006 . 01 ) ( 52 ) U . S . CI . CPC . . .. GO6F 17 / 30401 ( 2013 . 01 ); G06F 17 / 2705 ( 2013 . 01 ); G06F 17 / 2765 ( 2013 . 01 ); G06F 17 / 2795 ( 2013 . 01 ); G06F 17 / 3043 ( 2013 . 01 ) ( 58 ) Field of Classification Search CPC . . . .. . . . .. . GO6F 17 / 30401 ; GO6F 17 / 2705 ; G06F 17 / 2765 ; G06F 17 / 2795 ; G06F 17 / 3043 ( Continued ) 1510 1510 Receive a natural language query from a use ! a user L 1520 Receive an ontological representation of the data in the database to be queried 1530 Receive template rutes - - 1540 Automatically generate rules with the ontological representation of the data in the database and the template rules to identify entities and relations in the natural language query -- - 1550 Identify entities and relations in the natural language query with the rules 1560 Generate the structured data language query from the entities and relations

THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

THE NORTHLUTION UNITATE US009959311B2

( 12 ) United States Patent Boguraev et al .

( 10 ) Patent No . : US 9 , 959 , 311 B2 ( 45 ) Date of Patent : May 1 , 2018

( 54 ) NATURAL LANGUAGE INTERFACE TO DATABASES

( 71 ) Applicant : International Business Machines Corporation , Armonk , NY ( US )

Ê Ê

( 56 ) References Cited U . S . PATENT DOCUMENTS

7 , 424 , 701 B2 * 9 / 2008 Kendall . . . . . . . . . . . . . . . . . GOON 5 / 027 706 / 46

7 , 747 , 601 B2 * 6 / 2010 Cooper . . . . . . . . . . . . . . . GO6F 17 / 2785 707 / 708

( Continued ) ( 72 ) Inventors : Branimir K . Boguraev , Bedford , NY

( US ) ; Elahe Khorasani , Yorktown Heights , NY ( US ) ; Vadim Sheinin , Yorktown Heights , NY ( US ) ; Siddharth A . Patwardhan , Yorktown Heights , NY ( US ) ; Petros Zerfos , New York , NY ( US )

FOREIGN PATENT DOCUMENTS CN CN

103646032 A 104657439 A

3 / 2014 5 / 2015

OTHER PUBLICATIONS ( 73 ) Assignee : International Business Machines Corporation , Armonk , NY ( US )

( * ) Notice : Subject to any disclaimer , the term of this patent is extended or adjusted under 35 U . S . C . 154 ( b ) by 250 days .

( 21 ) Appl . No . : 14 / 858 , 841

( 22 ) Filed : Sep . 18 , 2015 ( 65 ) Ê ? Î ce

Prior Publication Data US 2017 / 0083569 A1 Mar . 23 , 2017

Majdi Owda et al . , Information Extraction for SQL Query Genera tion in the Conversation - Based Interfaces to Relational Databases ( C - BIRD ) ; Springer . Com ; 2011 and Springer .

( Continued ) Primary Examiner — Thanh - Ha Dang ( 74 ) Attorney , Agent , or Firm — Cahn & Samuels , LLP ( 57 ) ABSTRACT An embodiment of the invention provides a method wherein a natural language query is received from a user with an interface . An ontological representation of data in a database is received with an input port , including names of concepts and names of concept properties . Template rules are received with the input port , the templates rules being language dependent and ontology independent , the template rules including widely used constructs of a language . Rules are automatically generated with a rule generation engine with the ontological representation of the data in the data base and the template rules to identify entities and relations in the natural language query . Entities and relations are identified with a processor , the entities and relations being identified in the natural language query with the rules . The structured data language query is generated with a query generation engine from the entities and relations .

10 Claims , 28 Drawing Sheets

( 51 ) Int . CI . G06F 1730 ( 2006 . 01 ) G06F 17 / 27 ( 2006 . 01 )

( 52 ) U . S . CI . CPC . . . . GO6F 17 / 30401 ( 2013 . 01 ) ; G06F 17 / 2705

( 2013 . 01 ) ; G06F 17 / 2765 ( 2013 . 01 ) ; G06F 17 / 2795 ( 2013 . 01 ) ; G06F 17 / 3043 ( 2013 . 01 )

( 58 ) Field of Classification Search CPC . . . . . . . . . . . GO6F 17 / 30401 ; GO6F 17 / 2705 ; G06F

17 / 2765 ; G06F 17 / 2795 ; G06F 17 / 3043 ( Continued )

1510 1510 Receive a natural language query from a use ! a user L

1520 Receive an ontological representation of the data in the database to be queried

1530 Receive template rutes

- - 1540 Automatically generate rules with the ontological representation of the data in the database and the template rules to identify entities and relations in the natural language query

- - - 1550 Identify entities and relations in the natural language query with the rules

1560 Generate the structured data language query from the entities and relations

Page 2: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2 Page 2

. . . . . . . . . . ( 58 ) Field of Classification Search

USPC . . . . . . . . . . . . . . 707 / 760 See application file for complete search history .

( 56 ) References Cited U . S . PATENT DOCUMENTS

2014 / 0039878 A1 * 2 / 2014 Wasson . . . . . . . . . . . . . . . . . . . G06F 17 / 28 704 / 9

2014 / 0074826 A1 * 3 / 2014 Cooper . . . . . . . . . . . . GO6F 17 / 30672 707 / 722

2014 / 0149446 A15 / 2014 Kuchmann - Beauger et al . 2015 / 0100568 A1 * 4 / 2015 Golden . . . . . . . . . . . . G06F 17 / 30651

707 / 722 2016 / 0004766 A1 * 1 / 2016 Danielyan . . . . . . . . . . . G06F 17 / 2785

707 / 723 2016 / 0179934 Al * 6 / 2016 Stubley . . . . . . . . . . . GO6F 17 / 30401

707 / 722 2016 / 0180437 A1 * 6 / 2016 Boston . . . . . . . . . . . . . . G06Q 30 / 0631

705 / 26 . 7

OTHER PUBLICATIONS

8 , 140 , 556 B2 * 3 / 2012 Rao . . . . . . . . . . . . . . . . . . . G06F 17 / 30389 707 / 759

9 , 471 , 666 B2 * 10 / 2016 Singh G06F 17 / 30654 9 , 652 , 451 B2 * 5 / 2017 Elder . . . . . . . . . . . . . . . . . . . . GO6F 17 / 279

2003 / 0069880 A1 * 4 / 2003 Harrison . . . . . . . . . . . G06F 17 / 30663 2004 / 0068489 A14 / 2004 Dettinger et al . 2005 / 0256889 A1 * 11 / 2005 McConnell . . . . . . . G06F 17 / 30286 2007 / 0022107 A1 * 1 / 2007 Yuan . . . . . . . . . . . . . . . . . G06F 17 / 30684 2007 / 0294233 A1 12 / 2007 Sheu et al . 2008 / 0010259 A1 * 1 / 2008 Feng . . . . . . . . . . . . . . . . . . . GO6F 17 / 3087 2008 / 0235199 AL 9 / 2008 Li et al . 2010 / 0070500 Al * 3 / 2010 Cui . . . . . . . G06F 17 / 30557

707 / 736 2010 / 0185643 Al * 7 / 2010 Rao . . . . . . . . . . . . . . . G06F 17 / 30389

707 / 759 2011 / 0196852 A1 * 8 / 2011 Srikanth . . G06F 17 / 3066

707 / 706 2012 / 0136649 A1 * 5 / 2012 Freising . . . . . . . . . . . . . GO6F 17 / 2785

704 / 9 2013 / 0262501 A1 * 10 / 2013 Kuchmann - Beauger . . . . . . . . . .

G06F 17 / 30958 707 / 769

2013 / 0304758 A1 * 11 / 2013 Gruber . . . . . . . . . . . . G06F 17 / 30976 707 / 769

2013 / 0311166 A1 11 / 2013 Yanpolsky

Hasan M Jamil ; A Natural Language Interface Plug - In for Coop erative Query Answering in Biological Databases ; NCBI . Com ; PMC3323828 ; Jun . 11 , 2012 and NCBI . Sefali Koli et al . , Natural Language Query Processing on Dynamic Databases using Semantic Grammar ; IOSR Journal of Computer Engineering and iosrjournals . org . Androutsopoulos et al . , Natural Language Interfaces to Databases An Introduction ; Research Paper No . 709 , Department of Artificial Intelligence , University of Edinburgh , 1994 . ISR and Written Opinion PCT / IB2016 / 055494 , dated Nov . 2 , 2016 , pp . 1 - 11 . English abstract of CN103646032 , Mar . 19 , 2014 . English abstract of CN104657439 , May 27 , 2015 .

* cited by examiner

Page 3: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

Database

130

Schema

FIG . 1

140

150

May 1 , 2018

Detected Entities & Relations

Creation of SQL Queries

SQL

Database Query

Conversion of DB schema to rules used by NLP pipeline for detection Schema independence +

NLP quality improvements + language ambiguity NLP Pipeline

( Reuse from Watson )

SQL Query Generation ( based on Cognos )

Sheet 1 of 28

Input Text

110 110

120

Query Result

User

US 9 , 959 , 311 B2

Page 4: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

FIG . 2

US 9 , 959 , 311 B2

?

: : : : : : : :

: : : :

: : : : : : :

?

???? ??? ??

… … .

??????? ???

… …

Sheet 2 of 28

… … … … …

… … … stage

?

? ?? ???

???? ???? , ?

? ? ? ? ?

?? ??? ? ? ? ? ? ?

May 1 , 2018

- - - - - - - - - _ - _ - _ _ - - -

- - - - - - - - - - - - - - - - - - -

therows

? ? ? ? ? ? ? ? ? ?

;

*

* * *

*

???????

: : : : : : :

????? ” ? ??????? www . 00000 . com . tw

:

U . S . Patent

: ?? ????

: : : : : :

????? ???? ??

Page 5: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2

E ' 13

OZE

OVOGO VLX

00 : 00 00110106

# 102 MON

AxW710

00 00 00 00

MOON

Sheet 3 of 28

309008G IGOS

333 18

552888SOONS 3 * 0

2300 COENOU JEUNEN ONS

May 1 , 2018

2010 N

ego

e

doua oo

ooo 08 3300 songs

2010

CONDON

U . S . Patent

00€

Page 6: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 4 of 28 US 9 , 959 , 311 B2

" When was the last time this circuit was fixed ? " FIG . 4A

Page 7: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

This is related to the time field in the schema of our database .

She wants to know the last entry related to the circuit repair in the database .

May 1 , 2018

When was the last ime this circuit was fixed799

Sheet 5 of 28

She is asking about the circuit , not the pole or the cable .

I also need to check the status field in the database .

FIG . 4B

US 9 , 959 , 311 B2

Page 8: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

" When was the last time this circuit was fixed ? "

When was the last ume this circuit was fixed ? 93

May 1 , 2018

Circuit _ Name

CIU3265FX Jan - 04 - 2014

Last Repair Date Repair Crew

Peter Smith

SELECT circuit _ name , max ( date ) FROM

ASSET _ CIRCUIT _ TABLE WHERE status = " FIXED "

Sheet 6 of 28

FIG . 4C

FIG . 4D

US 9 , 959 , 311 B2

Page 9: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

• Employee has salary ; table is Employees ; column is

salary .

Employee works in a department ; table is Employees ; column is employee _ id ; table1 is Departments ; column1 is

department _ id .

May 1 , 2018 Sheet 7 of 28

FIG . 5

US 9 , 959 , 311 B2

Page 10: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

• root = VAR1 _ has _ VAR2 # template rule

- > has [ hasPartOfSpeech ( “ verb ” ) , hasLemmaForm ( " have " ) ] { subj - > VAR1 [ hasLemmaForm ( " VAR1 " ) ] } { obj - > VAR2 [ hasLemmaForm ( “ VAR2 " ) }

May 1 , 2018

• root = VAR1 _ has _ VAR2 # non lexical rule

- > has [ hasPartOfSpeech ( " verb ” ) , hasLemmaForm ( " have ” ) ]

{ subj - > VAR1 [ ] ] { obj - > VAR2 [ ] }

Sheet 8 of 28

FIG . 6

US 9 , 959 , 311 B2

Page 11: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

root - employee has salary

- > has [ hasPartOfSpeech ( “ verb ” ) , hasLemmaForm ( " have " ) ] { subj - > employee [ hasLemmaForm ( " employee " ) ] } { obj - > salary [ hasLemmaForm ( " salary ' ) }

May 1 , 2018

root = function _ employee _ has _ salary

- > salary [ hasPartOf?peech ( " noun " ) , hasLemmaForm ( " salary " ) ] { mod _ nobj | mod _ ncomp - > of [ hasLemmaForm ( " of " )

{ objprep - > employee [ hasLemmaForm ( " employee ” ) ] } { mod _ nadj - > value1 [ hasLemmaFormFromList ( " high " , " low " ) ,

hasParseFeature ( " superl " ) ] }

Sheet 9 of 28

FIG . 7

US 9 , 959 , 311 B2

Page 12: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

Template Rules 810 za 3

Template Rules with No Lexical Constraints

Creating Rules with Concrete Lexical Constraints 840

Rules with Lexical Constraints

May 1 , 2018

820

850

Schema Annotation File 830

Sheet 10 of 28

FIG . 8

US 9 , 959 , 311 B2

Page 13: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

- -

Accessing and Analyzing Schema Model 910

-

Finding Semantic Relations Between Columns 920 -

Paraphrase Generation 930

May 1 , 2018

Database Schema

Sheet 11 of 28

Schema Annotation File

FIG . 9

US 9 , 959 , 311 B2

Page 14: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 12 of 28 US 9 , 959 , 311 B2

ap }

traveling SBBU FIG . 10B

afe

seafoldua PUN

games today

218

saako } de

FIG . 10A

What 51 } { { ake

Page 15: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 13 of 28 US 9 , 959 , 311 B2

Above 150 dollars

prices

have FIG . 11

products what

Page 16: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 14 of 28 US 9 , 959 , 311 B2

Above 150 dollars products prices

bought FIG . 12

with

who

Page 17: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 15 of 28 US 9 , 959 , 311 B2

" Was there any issue reported by neighbors at this location during last

3 days ? "

FIG . 13A

Page 18: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

Look for addresses around current address

This is asking for all prior incidence

66Was there any issue reported by neighbors at this location during last 3 days ? »

May 1 , 2018

Was there any issue reported by neighbors at this Location during last 3 days 79

gossos dos seus dos seus dos seus dos seus dos seus dos seus dos seus SELECT customer address , issue , location FROM

reported issues table WHERE distance ( location , curr _ loc ) <

100

Sheet 16 of 28

Do the query for the last 3 days .

FIG . 13B

FIG . 13C

US 9 , 959 , 311 B2

Page 19: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

" Was there any issue reported by neighbors at this location during last

3 days ? "

May 1 , 2018

Address 91 Hunter Lane , New York , NY

106 Hunter Lane , New York , NY

Date

July - 6 - 2014

July - 7 - 2014

Sheet 17 of 28

Issue

Slow connectivity No

connectivity FIG , 13D

US 9 , 959 , 311 B2

Page 20: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 18 of 28 US 9 , 959 , 311 B2

" When was the last time this circuit was fixed ? " FIG . 14A

Page 21: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

This is related to the time field in the schema of our database .

He wants to know the last entry related to the circuit repair in the database .

May 1 , 2018

When was the last ome this circunt was fixed293

He said " This " . It means this

location . Then , I need the GBS location

He is asking about the circuit , not the pole or the cable 233 I also need to check the status field in the database .

Sheet 19 of 28

FIG . 14B

US 9 , 959 , 311 B2

Page 22: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

" When was the last time this circuit was fixed ? "

$ 6 When was the last time this car cuit was fixed ) >

May 1 , 2018

Circuit _ Name

CIU3265FX Jan - 04 - 2014

Last Repair Date

SELECT circuit _ name , max ( date ) FROM

ASSET _ CIRCUIT _ TABLE WHERE status = " FIXED " AND location = { 41 . 162873 ,

- 73 . 861525 )

Repair Crew

Peter Smith

Sheet 20 of 28

FIG . 140

FIG . 14D

US 9 , 959 , 311 B2

Page 23: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

1510

Receive a natural language query from a user

1520

Receive an ontological representation of the data in the database to be queried

May 1 , 2018

1530

Receive template rules

1540

Automatically generate rules with the ontological representation of the data in the database and the template rules to identify entities and relations in the natural language query

Sheet 21 of 28

1550

Identify entities and relations in the natural language query with the rules

1560

Generate the structured data language query from the entities and relations FIG . 15

US 9 , 959 , 311 B2

Page 24: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 22 of 28 US 9 , 959 , 311 B2

Rule Generation Engine 1630 Query Generation Engine 1650

Interface 1610 1600 FIG . 16

Input Port 1620 Processor 1640

Page 25: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent

- 1710

Receive a natural language query from a user

- 1720

Generate multiple dependency parses of the natural language query

May 1 , 2018

1722

Divide the natural language query into multiple components

1724

Create a single dependency parse by connecting each component of the components with one or more other component of the components

Sheet 23 of 28

1730

Apply rules to all of the dependency parses to identify entities and relations in the natural language query FIG . 17

US 9 , 959 , 311 B2

Page 26: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 24 of 28 US 9 , 959 , 311 B2

= Processor 1830 Interface 1810 1800 FIG . 18

= Parser Device 1820

Page 27: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

NETWORK 725

U . S . Patent

1

713 0 7

CPU

CPU

RAM

ROM

1 / 0 ADAPTER COMMUNICATIONS ADAPTER 720

710

710

714

716

May 1 , 2018

. . . . . . .

718

712 715

USER INTERFACE ADAPTER 719 DISPLAY ADAPTER

Sheet 25 of 28

721

724

US 9 , 959 , 311 B2

FIG . 19

Page 28: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

FIG . 20

US 9 , 959 , 311 B2

DEVICE ( S ) VNUBIXO vivivivivivivivivivivivir 14

wwwwwwwwwwwwwwww Avvv

WWWWWWWWWWWWW

SLOVOVYONIN INTERFACE ( S )

DISPLAY

Sheet 26 of 28

, , ,

,

,

* * * * * * * * *

,

* *

OZ

* . * .

, , ,

27

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * Uwiiiiiii

40

UINO ONISS3005

- - - -

CACHE

May 1 , 2018

STORAGE www

RAM

28

12 COMPUTER SYSTEM SERVER

U . S . Patent

015

Page 29: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

U . S . Patent May 1 , 2018 Sheet 27 of 28 US 9 , 959 , 311 B2

|

04 * , * ' - ' * * * . * ? ? * . * Frees

* - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - * - *

, TEL * * * * * * * * * * * * * * * * * *

FFFFFFFFFFFFFFFFFFFFFF " * * * * AMAMAMAMAMAMAHAMAHAMAHAMAMAMAMAN

YaYaYaYaYaYaYaYaYaYaYaYaYaYaYaYaYaY .

ara ' a ' a ' a ' a ' a ' a ' a ' a ' a ' a ' a ' a ' s

?

???? ???? ?

????????????????????????????????????????? P

SSSSSSSS . * * . * * . * 1

FIG . 21

* * * *

* * * . * ? . . * * *

* * , * * - ? ? ? ? ? ? ? ?

- * - * - * - * - * - * - * - * - * - * - * - * - * - = FFFFFF

- - - - - - - - - - - - - - - - - - - -

Page 30: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2

77

=

?

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- -

-

- - - -

-

-

- - - - - -

- - - - - - -

3

4

2

2

2

????? 2

3

Sheet 28 of 28

* * * * * * * * * * * * * * * * * * * * * * * * * *

/? 6 ??? ????? / / ????? / / / * / * / * / * / * /

* '

tti * * * * * * * * * * * *

* ' * ' * ' * ' * ' * ' * ' * ' * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * ' * : ' : ' : ' : - : ' : - : ' : - : ' : - : ' : - : ' : - : ' : - : ' : -

: ' : - : ' : - : ' : - : ' : - : ' : - : ' : - : ' : ' : - : ' ' .

* * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

May 1 , 2018

t : : : : : : : : : : : : : " .

* : : : : : : : :

- -

-

- - - -

- - -

- - -

- - - - -

- - - -

' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '

' ' ' ' ' ' ' ' ' ' ' ' ' '

' ' ' ' ' ' ' ' ' ' ' * * * * * * * * * * * * *

: : . . : : . . : : : : . . : : . . : : . . : *

. . : : . . ! ? " . . : : : : :

: : - - - . . : : . . : :

- - - - - - : - - - . . . - - : : : " ' ' " : " " , " : " . . : : " : : : " " " " " " " " " ' ' " : : : " . . : * * ' *

?

?

. . : : : . . : : . . : : . . : : : " . . ,

* * * * * * : : : : : : : : : : : : ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '

U . S . Patent

Page 31: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2

FIU . -

NATURAL LANGUAGE INTERFACE TO FIG . 1 is a flow diagram illustrating a search query from DATABASES natural language processing to structured query language to

a query result according to an embodiment of the invention . BACKGROUND FIG . 2 illustrates the schema for the sales warehouse

5 database according to an embodiment of the invention . The present invention relates to systems , methods , and FIG . 3 illustrates a system interface according to an

computer program products for a natural language interface embodiment of the invention . to databases . FIG . 4A illustrates a smartphone interface according to an Databases are computerized information storage and embodiment of the invention . retrieval systems . A relational database management system 10 FIG . 4B is a diagram illustrating extraction of key con is a computer database management system ( DBMS ) that cepts from a natural language query according to an embodi uses relational techniques for storing and retrieving data . The most prevalent type of database is the relational data ment of the invention . base , a tabular database in which data is defined so that it can FIG . 4C is a diagram illustrating translation to SQL be reorganized and accessed in a number of different ways . commands according to an embodiment of the invention . A distributed database is one that can be dispersed or 15 FIG . 4D illustrates search results displayed on the smart replicated among different points in a network . An object - phone interface according to an embodiment of the inven oriented programming database is one that is congruent with tion . the data defined in object classes and subclasses . FIG . 5 illustrates rules for the detection of entities and

Regardless of the particular architecture , in a DBMS , a relations according to an embodiment of the invention . requesting entity ( e . g . , an application or the operating sys - 20 FIG . 6 illustrates template rules according to an embodi tem ) demands access to a specified database by issuing a ment of the invention . database access request . Such requests may include , for FIG . 7 illustrates a schema annotation file according to an instance , simple catalog lookup requests or transactions and embodiment of the invention . combinations of transactions that operate to read , change FIG . 8 is a flow diagram illustrating a method for gener and add specified records in the database . These requests are 25 ating rules with lexical constraints according to an embodi made using high - level query languages such as the Struc ment of the invention . tured Query Language ( SQL ) . Illustratively , SQL is used to FIG . 9 is a flow diagram illustrating a method for auto make interactive queries for getting information from and mated schema annotation file creation according to an updating a database such as International Business embodiment of the invention . Machines ' ( IBM ) DB2 , Microsoft ' s SOL Server , and data - 30 FIGS . 10A and 10B are diagrams illustrating parsing of a base products from Oracle , Sybase , and Computer Associ natural language query according to an embodiment of the ates . The term " query " denominates a set of commands for invention . retrieving data from a stored database . Queries take the form FIG . 11 is a diagram illustrating parsing of another natural of a command language that lets programmers and programs language query according to an embodiment of the inven select , insert , update , find out the location of data , and so 35 tion . forth . FIG . 12 is a diagram illustrating parsing of yet another

natural language query according to an embodiment of the SUMMARY OF THE INVENTION invention .

FIG . 13A illustrates a smartphone interface according to An embodiment of the invention provides a method for 40 an embodiment of the invention .

creating a structured data language query , wherein a natural FIG . 13B is a diagram illustrating extraction of key language query is received from a user with an interface . An concepts from a natural language query according to an ontological representation of data in a database is received embodiment of the invention . with an input port , the ontological representation of the data FIG . 13C is a diagram illustrating translation to SOL in the database including names of concepts , and names of 45 commands according to an embodiment of the invention . concept properties . Template rules are received with the FIG . 13D illustrates search results displayed on the smart input port , the templates rules being language dependent and phone interface according to an embodiment of the inven ontology independent , the template rules including widely tion . used constructs of a language . FIG . 14A illustrates a smartphone interface according to

Rules are automatically generated with a rule generation 50 an embodiment of the invention . engine connected to the interface and the input port , the rules FIG . 14B is a diagram illustrating extraction of key being generated with the ontological representation of the concepts from a natural language query according to an data in the database and the template rules to identify entities embodiment of the invention . and relations in the natural language query . Entities and FIG . 14C is a diagram illustrating translation to SQL relations are identified with a processor connected to the rule 55 commands according to an embodiment of the invention . generation engine , the entities and relations being identified FIG . 14D illustrates search results displayed on the smart in the natural language query with the rules . The structured phone interface according to an embodiment of the inven data language query is generated with a query generation tion . engine connected to the processor , the structured data lan - FIG . 15 is a flow diagram illustrating a method for guage query being generated from the entities and relations . 60 creating a structured data language query according to an

embodiment of the invention . BRIEF DESCRIPTION OF THE SEVERAL FIG . 16 is a diagram illustrating a system for creating a

VIEWS OF THE DRAWINGS structured data language query according to an embodiment of the invention .

The present invention is described with reference to the 65 FIG . 17 is a flow diagram illustrating a method for accompanying drawings . In the drawings , like reference querying a database according to an embodiment of the numbers indicate identical or functionally similar elements . invention .

Page 32: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2

FIG . 18 is a diagram illustrating a system for querying a data stored in relational data bases . In order to get to the data , database according to an embodiment of the invention . natural language queries from the various lines of business

FIG . 19 is a diagram illustrating a computer program is converted to SQL . product according to an embodiment of the invention . The following use case 2 example involves a field worker

FIG . 20 depicts a cloud computing node according to an 5 for a telephone company . Jane is the field worker for ABC embodiment of the present invention . Telco company in the United States . When she receives a

FIG . 21 depicts a cloud computing environment accord work order , she goes to the field to visit , evaluate , and ing to an embodiment of the present invention . resolve the issue . ABC Telco recently equipped all techni

cians with smartphones . When Jane arrives at the location , FIG . 22 depicts abstraction model layers according to an 10 she needs to know : 1 ) “ when was the last time the circuit was embodiment of the present invention . fixed ? ” , 2 ) “ what was the last issue on the circuit ? ” , 3 )

DETAILED DESCRIPTION “ when was the circuit originally installed ? ” , 4 ) “ what was the last report on the circuit ? what was changed in the last repair ? ” . Jane has to work outside and it is a very cold day . Exemplary , non - limiting embodiments of the present 15 It is very difficult for Jane to type her questions so she speaks invention are discussed in detail below . While specific into her smartphone and sees the information on the screen . configurations are discussed to provide a clear understand More specifically , as illustrated in FIG . 4A , Jane speaks

ing , it should be understood that the disclosed configurations her question “ When was the last time this circuit was fixed ? " are provided for illustration purposes only . A person of into her smartphone . In at least one embodiment , the system ordinary skill in the art will recognize that other configura - 20 resides in the smartphone and interprets the spoken sen tions may be used without departing from the spirit and tence , parses the spoken sentence , extracts key concepts and scope of the invention . their values , and relates the concepts to the database schema .

At least one embodiment of the invention includes a FIG . 4B is a diagram illustrating extraction of key concepts system and method that utilizes natural language processing from a natural language query according to an embodiment ( NLP ) to automatically create structured query language 25 of the invention , wherein the system extracts : that this is ( SQL ) queries from regular English sentences . FIG . 1 is a related to the “ time ” field in the schema of our database from flow diagram illustrating a search query from NLP to SQL the word “ When ” , she wants to know the last entry related to a query result according to an embodiment of the inven to the circuit repair in the database from the words " last tion , wherein input text from a user 110 is received in NLP time ” , she is asking about the circuit , not the pole or the pipeline 120 . Detected entities and relations from the NLP 30 cable from the word " circuit ” , and the status field in the pipeline 120 and data from a database schema 130 can be database should be checked for the word “ fixed ” . used in the creation of SQL queries 140 . The system can convert the spoken sentence to SQL

The data from the database schema 130 can also be used commands and send the SQL query to ABC databases . FIG . in the NLP pipeline 120 . Specifically , conversion of the 4C is a diagram illustrating translation to SQL commands database schema 130 to rules can be used by the NLP 35 according to an embodiment of the invention . The system pipeline 120 for the detection of entities . The SQL query can can return the results of the search to the application running be sent to a database 150 ; and , the query result can be sent on the smartphone . FIG . 4D illustrates search results dis from the database 150 to the user 110 . played on the smartphone interface according to an embodi

Thus , a domain independent system can be provided ment of the invention . through the automatic generation of rules used for the 40 In at least one embodiment of the invention , the detection detection of entities and relations . Template rules can be of entities and relations is rule based , wherein an engine used that are domain independent and that are derived from receives rules of detection as input and finds matches for the language itself . Paraphrase generation can also be used these rules on a dependency tree . To be ontology indepen to automatically enrich system coverage . Robustness can be dent , rules for detection can be created automatically using improved through the use of multiple parses ; and , parts of 45 template rules and a schema annotation file . FIG . 5 illus the question that should be used in the produced SQL but trates a schema annotation file according to an embodiment didn ' t make it can be automatically identified . of the invention . The schema annotation file indicates that

The following use case 1 example involves a CMO in the the database includes a table of “ Employees ” , which retail industry . John is the CMO of ABC Company ; and , he includes a column having " employee _ id ” and a column wants to find out “ What are the types of products that were 50 having " salaries ” , and a table of “ Departments ” , which sold on Oct . 11 , 2014 in shops that are located in New includes a column having " department _ id ” . York ? ” FIG . 2 illustrates an example schema for the sales FIG . 6 illustrates template rules according to an embodi warehouse database according to an embodiment of the ment of the invention . The template rules require that a invention . subject ( VAR1 ) verb ( has ) an object ( VAR2 ) . The template

FIG . 3 illustrates a system interface 300 according to an 55 rules and schema annotation file can be used to automati embodiment of the invention . John can type his question cally generate rules for the detection of entities and relations ( e . g . , “ What are the types of products that were sold on Oct . in a natural language query . Template rules can represent 11 , 2014 in shops that are located in New York " ) into a widely used constructs of a natural language . One example search box 310 and the system will convert his question to is an ownership construct : department has a manager , SQL . Even if John makes a mistake in his question ( e . g . , 60 employee has a salary , customer has an address . Such “ What are the types of the products that were sold in Oct . 11 , construct can be represented as a verb “ to have ” with a 2014 in shops that have been located in New York ” ) , the subject and an object . The subject can describe who owns proper SQL query can be produced . The results of the query something and the object can describe what is owned by the can be displayed in the search results 320 . subject . From the dependency tree perspective , this con

Various lines of business , such as for example , marketing 65 struct can be represented as a graph containing 3 nodes : the associates , merchants , brick - and - mortar store associates , main one being a verb with lemma form “ have ” and two and field workers , may want to have quick interaction with children nodes connected to it . One child node can be

Page 33: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2

connected to the main node through an edge labeled “ sub dataItem is a multiple of ( TableName , ColumnName , Filter , ject ” , and another child node can be connected to the main Aggregation Function ) . Knowing the database schema and node through an edge labeled " object ” . In addition , the receiving a set of dataltems , the SQL generator produces the subject node can have a particular lemma form ( e . g . resulting SOL query . “ employee ” ) , and the object also can have a particular 5 Parser errors can sometimes occur , and often times the lemma form ( e . g . “ salary ” ) . Another construct can include dependency tree is not correct . Parsers can produce more subject - verb - object ( where verb is an arbitrary verb ) or than one result , but applications typically use only the best subject - verb - preposition - object . Examples include : a cus parse , best defined by the parser itself . The system can use tomer buys product , employee works in a department , etc . multiple parses , where each parse can contribute to correct Another construct can include adjective ( superlative ) - noun , 10 entities and relations extraction . highest salary , lowest salary etc . FIGS . 10A and 10B are diagrams illustrating parsing of a FIG . 7 illustrates rules for the detection of entities and natural language query ( “ what are the names of employees relations according to an embodiment of the invention . A rule can require that a subject ( employee ) , a verb ( has ) , and traveling today ? ' ' ) according to an embodiment of the inven an object ( salary ) . FIG . 8 is a flow diagram illustrating a 15 tion . If the user is looking for the “ employee travels ” method for generating rules with lexical constraints ( also relation , the parse illustrated in FIG . 10A will have it referred to herein as “ the rules ” ) according to an embodi - whereas the parse illustrated in FIG . 10B will not . If only the ment of the invention . Template rules are received 810 . best parse is used ( e . g . , the parse illustrated in FIG . 10B ) , the Template rules with no lexical constraints 820 and a schema “ employee travels ” relation would not be detected because annotation file 830 are used to create rules with concrete 20 the node “ travelling ” depends on the node “ names ” and not lexical constraints 840 . This results in rules with lexical on the node " employees ” . Use of all parses can allow the constraints 850 . system to detect relations when at least one parse has the

To generate rules automatically , a schema annotation file correct denendency tree fragment for the narticular relation correct dependency tree fragment for the particular relation ( SAF ) can be used . Generating an SAF automatically using that the user is interested in . One parse can have relation # 1 only the database and its schema can facilitate the building 25 correct but relation # 2 wrong , and another parse can have of a complete automated system that learns what it needs relation # 2 correct and relation # 1 wrong . Use of multiple from the database and is ready to answer questions about the parses where each parse contributes to the detection of data in the database . correct relation # i can achieve correct detection of multiple

FIG . 9 is a flow diagram illustrating a method for auto relations . mated schema annotation file creation according to an 30 FIG . 11 is a diagram illustrating parsing of another natural embodiment of the invention . The schema model ( also language query ( “ what products have prices above 150 referred to as the database schema ) can be accessed and dollars ? ' ' ) according to an embodiment of the invention . In analyzed 910 . Semantic relations between columns can be found 920 . Knowing the database schema , the system can this natural language query , " above 150 dollars ” can be find all pairs of column names ( also table to table and table 35 annotated by a numerical comparator annotator ; and , the to column ) that are related to each other based on primary / system can detect the “ product has price ” relation that comes

with a filter " > 150 ” . Such a successful relation detection foreign key relation . In one example , the rules provide that " customer buys products ” and “ manufacturer produces with a filter can be possible because the parser correctly products ” . If the input question is “ John buys products ” , it connected " above 150 dollars ” to prices , which in turn was can be determined that John is a customer and manufacturer . 40 correctly connected to " products ” through “ have ” . Without the rules , it cannot be determined who John is . FIG . 12 is a diagram illustrating parsing of yet another

Paraphrase generation can be performed 930 . In one natural language query ( “ who bought products with prices example , the SAF includes “ employee has a salary ” . Para above 150 dollars ? " ) according to an embodiment of the phrasing as a powerful NLP tool can generate : “ employee invention . In this natural language query , " above 150 dol earns money " , " employee makes $ " , " salary of employee ” , 45 lars ” can be annotated by the numerical comparator anno " employee salary ” , and / or " employee with salary ” . Para - tator , but the product has price ” relation will not be detected phrasing can automatically enrich SAF , thereby enabling the because the node " with ” is connected to " bought ” ( parser system to answer questions that convey the same concept in logic being " he bought something with his own money ” ) . If various ways . the numerical comparator annotated something ( e . g . , “ above

Detection of entities can produce results that are actually 50 150 dollars ” ) , the system can determine what noun it is byproducts of the detection process and should not be connected to ( e . g . , " prices ” in this example ) . In order to included in the final SQL query . For example , a user enters determine what noun can be connected to " prices ” , the the natural language query " what is the highest price of the system can go through all of the names in the rules with products that customers bought on Jan . 25 , 1999 ? ” The term lexical constraints to find a name that has both “ price ” and " products ” is detected through the rule " customers buy 55 a numerical comparator . This rule can be used as if it “ fired ” products ” , which can lead to the creation of Products . Pro - during a normal process of detection of entities and rela duct _ ID dataItem . However , this SQL query may not yield tions . In the case of multiple rules , the question can be posed the highest price but rather a price per product because the to a user asking which construct is closer to what the user SQL generator may automatically insert product _ id in the meant . GROUP BY clause . To avoid such situations , focus detec - 60 The following use case 3 example involves Carl , who is tion can mark dataltems in the natural language query as a cable technician for BCC company in Australia . BCC focus items . If a dataItem has no filter , no aggregation received a call from a customer about their internet speed . function , and no focus , it can be ignored during SQL Carl arrives at the location and tests the signal strength . In generation process . order for Carl to fully evaluate the situation , he needs to

In at least one embodiment , automatic SQL generation 65 know : 1 ) “ whether there were any customer calls in the same needs to know the schema of the database . The SQL vicinity and the issue types ? ” ; 2 ) " what was the signal generator can use as input a set of dataItems , where each strength when measured remotely when customer called ? ” ;

Page 34: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2

3 ) “ when was the circuit installed ? ” ; and , 4 ) “ was there a the circuit ( not the pole or cable ) from the term “ circuit ” , the similar issue reported recently on the same circuit the query needs to check the status field in the database from the customer has ? ” . term “ fixed ” . Thus , in this example , the identified entities

FIG . 13A illustrates the interface of Carl ' s smartphone include the terms “ when ” , “ last time ” , “ this " , " circuit ” , and according to an embodiment of the invention , wherein Carl 5 " fixed ” . The system can also obtain John ' s location ( e . g . , asks the system his question verbally : " was there an issue longitude - latitude ) from a GPS device in John ' s smartphone . reported by neighbors at this location during last 3 days ? ” FIG . 14C is a diagram illustrating translation to SQL Carl ' s verbal input can be received and processed with a commands according to an embodiment of the invention . voice processing system in his smartphone . FIG . 13B is a Specifically , the natural language query “ when was the last diagram illustrating extraction of key concepts from a natu - 10 time this circuit was fixed ? ” is translated to " SELECT ral language query according to an embodiment of the circuit _ name , max ( date ) FROM ASSET _ CIRCUIT _ invention . The system can interpret the sentence and extract TABLE WHERE status = " FIXED ” AND location = key concepts and their values and relate the concepts to the ( 41 . 162873 , - 73 . 861525 ) ” . The system can send the SQL database schema . For instance , the system can interpret that commands to the database for execution . FIG . 14D illus Carl is asking for prior incidences from the terms “ was 15 trates search results displayed on the smartphone interface there ” , to look for addresses around the current address from according to an embodiment of the invention . Thus , John the term “ neighbors ” , and to perform the query for the last can see that circuit CIU3265FX was repaired by Peter Smith 3 days from the terms “ last 3 days ” . Thus , in this example , on Jan . 4 , 2014 . the identified entities include the terms was there " , " neigh - FIG . 15 is a flow diagram illustrating a method for bors ” , and “ last 3 days ” . The system can also obtain Carl ' s 20 creating a structured data language ( e . g . , SQL ) query from location ( e . g . , longitude - latitude ) from a GPS device or a natural language query according to an embodiment of the other location capability in Carl ' s smartphone . invention ; and , FIG . 16 is a diagram illustrating a system

FIG . 13C is a diagram illustrating translation to SQL 1600 for creating a structured data language query according commands according to an embodiment of the invention . to an embodiment of the invention . The method illustrated in Specifically , the natural language query " was there an issue 25 FIG . 15 can be performed using the system 1600 . An reported by neighbors at this location during last 3 days ? ” is interface 1610 can receive a natural language query from a translated to “ SELECT customer _ address , issue , location user ( e . g . , what are the names of employees in the marketing FROM reported _ issues _ table WHERE distance ( location , department ? ) 1510 . As used herein , the term " interface ” can curr _ loc ) < 100 " . The system can send the SQL commands to include a computer hardware device , such as , for example , the database for execution . FIG . 13D illustrates search 30 a keyboard , touch screen , mouse , or microphone . results displayed on the smartphone interface according to An input port 1620 can receive an ontological represen an embodiment of the invention . Thus , Carl can see that two tation of the data in the database to be queried 1520 , wherein neighbors experienced slow or no connectivity in the past 3 the ontological representation can include the names of days . tables in the database , column names in the tables , and row

The following use case 4 example involves John , who is 35 names in the tables . For example , the ontological represen the field worker for ABC Utility company in the United tation indicates that there is a table in the database titled States . When John receives a work order , he goes to the field “ Departments ” , which has the following column names : to visit , evaluate , and resolve the issue . ABC Utility recently department , manager name , phone number , mailing address , installed an in - memory database management system to and email address , and the following row names : marketing , improve the efficiency of accessing asset history data during 40 legal , accounting , advertising , and human resources . In field visits . ABC Utility wants to improve the data quality of another example , the ontological representation indicates their asset management system , when a visit is completed . that there is a table in the database titled “ Employees " , When John arrives at the location , he needs to know : 1 ) which has the following column names : employee , “ when was the last time the circuit was fixed ? ” ; 2 ) “ what employee number , start date , manager name , birthday , sal was the last issue on the circuit ? ” ; 3 ) " when was the circuit 45 ary , and department , and the following row names : John originally installed ? ” ; 4 ) " what was the last report on the Doe , Jane Doe , John Smith , and Jane Smith . circuit ? ” ; 5 ) " what was changed in the last repair ? ” . After The input port 1620 can also receive template rules ( e . g . , John repairs the issue , he can update the asset records in the root = VAR1 _ has _ VAR2 # template rule ) 1530 , wherein the database with the following information : 1 ) “ whether the templates rules are language dependent and ontology inde issue is resolved or needs a follow - up visit by another crew 50 pendent widely used constructs of a language . In at least one technician ” ; 2 ) “ what was the root cause of the issue ? ” ; and embodiment , the template rules include a first variable , a 3 ) whether he sees some other issue ( pole / cable / tree trim - connector , and a second variable , wherein the first variable ming ) that needs to be attended to in the near future . is an object ( noun ) , the connector includes one of a verb

FIG . 14A illustrates a smartphone interface according to ( e . g . , " has ” ) or a preposition ( e . g . , “ in ” ) , and the second an embodiment of the invention , wherein John asks the 55 variable is a subject ( noun ) . system his question verbally : " when was the last time this rule generation engine 1630 can automatically generate circuit was fixed ? ” FIG . 14B is a diagram illustrating rules ( e . g . , root = employee _ has _ department ) with the onto extraction of key concepts from a natural language query logical representation of the data in the database and the according to an embodiment of the invention . The system template rules to identify entities and relations in the natural can interpret the sentence and extract key concepts and their 60 language query 1540 . As used herein , the term “ automati values and relate the concepts to the database schema . For c ally ” can include performing an action without human instance , the system can interpret that the query is related to interaction ( e . g . , performing a process step without a direct the time field in the schema of the database from the term and explicit prompt by a human to perform the action ) . The “ when ” , John wants to know the last entry related to the rule generation engine 1630 can be connected to the inter circuit repair in the database from the terms “ last time ” , the 65 face 1610 and / or the input port 1620 . In at least one query relates to the current location ( e . g . , to be determined embodiment , the generating of the rules includes replacing via GPS ) from the term “ this ” , the query is inquiring about the first variable of a template rule with one of the column

Page 35: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2 10

names ( e . g . , employee ) , and replacing the second variable of In at least one embodiment of the invention , the gener the template rule with another one of the column names ating of the multiple dependency parses of the natural ( e . g . , department ) . Thus , for example , for the template rule language query can include identifying every noun in the “ root = VAR1 _ has _ VAR2 # template rule ” , “ VAR1 ” and natural language query , and connecting every noun in the “ VAR2 ” are replaced with column names " employee " and 5 natural language query to every other noun in the natural " department ” , respectively , to generate the rule language query in different multiple dependency parses . The “ root = employee _ has _ department ” . nouns can be connected to the other nouns via verbs . The

A processor 1640 can be connected to the rule generation number of multiple dependency parses can be greater than engine 1630 to identify entities and relations in the natural the number of words in the natural language query . In at least language query with the rules 1550 . For example , given the 10 one embodiment , none of the multiple dependency parses natural language query " what are the names of employees in are identical . the marketing department ? ” , the rule generation engine A processor 1830 can be connected to the parser device identifies the entities " employees ” and “ department ” and the 1820 , wherein the processor 1830 can apply rules to all of relations “ table name = employees ” and “ table the dependency parses to identify entities and relations in the name = departments ” . 15 natural language query 1730 . As used herein , the terms

A query generation engine 1650 can be connected to the " processor ” and “ parser device ” can each include a com processor 1640 to generate the structured data language puter hardware device , such as , for example , a micropro query from the entities and relations 1560 . As used herein , cessor , a central processing unit ( CPU ) , etc . the terms “ input port ” , “ rule generation engine ” , “ proces . The applying of the rules to identify entities and relations sor ” , and “ query generation engine ” can each include a 20 in the natural language query to all of the multiple depen computer hardware device , such as , for example , a micro - dency parses can include obtaining a rule having one or processor , a central processing unit ( CPU ) , a server , an more nouns and one or more verbs . When a noun of the rule electronic database , etc . matches a noun in the first multiple dependency parse , the

In at least one embodiment of the invention , the query noun in the first multiple dependency parse can be identified generation engine 1650 automatically generates synonyms 25 as an entity . When a verb of the rule matches a verb in the of at least one of the entities . For example , the query first multiple dependency parse , the verb in the first multiple generation engine automatically generates the synonyms dependency parse can be identified as a relation . For “ staff ” , “ worker ” , and “ associate ” for the entity " employee ” . example , given the parses illustrated in FIGS . 10A and 10B The query generation engine can generate the structured data for the natural language query " what are the names of language query from the synonyms of the entities . 30 employees traveling today ? " , the rules " employee _ has _

FIG . 17 is a flow diagram illustrating a method for name ” and “ employee _ travels _ on _ a _ date ” can be applied to querying a database according to an embodiment of the each of the parses to identify the entities " employee name " invention ; and , FIG . 18 is a diagram illustrating a system and “ date ” and the relation “ travel ” . 1800 for querying a database according to an embodiment of In at least one embodiment of the invention , a qualifier in the invention . The method illustrated in FIG . 17 can be 35 one or more of the multiple dependency parses that is not performed using the system 1800 . An interface 1810 can connected to an entity is identified , where the qualifier receive a natural language query from a user 1710 . For includes a numerical comparison ( e . g . , price is greater than example , a user can type a question using a touchscreen of $ 100 ) , a date comparison ( e . g . , " widget was bought after a mobile device . January 2011 ” ) , and / or a time comparison ( e . g . , " market

A parser device 1820 connected to the interface 1810 can 40 closes before 5 : 00 ” ) . In such cases , the processor 1830 can generate multiple dependency parses of the natural language automatically identify the entity that is attached to the query 1720 ( e . g . , FIGS . 10A , 10B ) . The generating of the qualifier and / or send a question to a user requesting the user multiple dependency parses can include dividing the natural to identify the entity that is attached to the qualifier . language query into multiple components 1722 . For As will be appreciated by one skilled in the art , aspects of example , the natural language query “ What are names of 45 the present invention may be embodied as a system , method employees travelling today ” is divided into seven ( 7 ) com - or computer program product . Accordingly , aspects of the ponents : “ What " ; " are ” ; “ names " ; " of " ; " employees ” ; “ trav - present invention may take the form of an entirely hardware elling ” ; and “ today ” . embodiment or an embodiment combining software and

The generating of the multiple dependency parses can hardware aspects that may all generally be referred to herein also include creating a single dependency parse by connect - 50 as a " circuit , ” " module ” or “ system . ” Furthermore , aspects ing each component of the components with one or more of the present invention may take the form of a computer other component of the components 1724 . FIGS . 10A and program product embodied in one or more computer read 10B each illustrate a single dependency parse . In FIG . 10A , able medium ( s ) having computer readable program code direct links are created between the components “ What ” and embodied thereon . " are ” , " are ” and “ names ” , “ names ” and “ of ” , “ of ” and 55 Any combination of one or more computer readable “ employees " , " employees ” and “ travelling ” , and “ travel - medium ( s ) may be utilized . The computer readable medium ling ” and “ today ” . In FIG . 10B , direct links are created may be a computer readable signal medium or a computer between the components “ What ” and “ are ” , “ are ” and readable storage medium . A computer readable storage “ names ” , “ names ” and “ of ” , “ of ” and “ employees " , " names " medium may be , for example , but not limited to , an elec and " travelling ” , and “ travelling ” and “ today ” . The gener - 60 tronic , magnetic , optical , electromagnetic , infrared , or semi ating of the multiple dependency parses can include identi - conductor system , apparatus , or device , or any suitable fying the part of speech of each component ( e . g . , noun , verb , combination of the foregoing . More specific examples ( a adverb , adjective ) . In at least one embodiment , the connect - non - exhaustive list ) of the computer readable storage ing of each component with one or more other components medium would include the following : an electrical connec includes creating a direct link between a noun and a verb 65 tion having one or more wires , a portable computer diskette , and / or an adverb ( e . g . , " employees ” and “ travelling " ) . A a hard disk , a random access memory ( RAM ) , a read - only direct link can also be created between a verb and two nouns . memory ( ROM ) , an erasable programmable read - only

Page 36: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

12 US 9 , 959 , 311 B2

11 memory ( EPROM or Flash memory ) , an optical fiber , a to be performed on the computer , other programmable portable compact disc read - only memory ( CD - ROM ) , an apparatus or other devices to produce a computer imple optical storage device , a magnetic storage device , or any mented process such that the instructions which execute on suitable combination of the foregoing . In the context of this the computer or other programmable apparatus provide document , a computer readable storage medium may be any 5 processes for implementing the functions / acts specified in tangible medium that can contain , or store a program for use the flowchart and / or block diagram block or blocks . by or in connection with an instruction execution system , Referring now to FIG . 19 , a representative hardware apparatus , or device . environment for practicing at least one embodiment of the A computer readable signal medium may include a propa invention is depicted . This schematic drawing illustrates a gated data signal with computer readable program code 10 n hardware configuration of an information handling / com embodied therein , for example , in baseband or as part of a puter system in accordance with at least one embodiment of carrier wave . Such a propagated signal may take any of a the invention . The system comprises at least one processor variety of forms , including , but not limited to , electro magnetic , optical , or any suitable combination thereof . A or central processing unit ( CPU ) 710 . The CPUs 710 are computer readable signal medium may be any computer 15 interconnected with system bus 712 to various devices such readable medium that is not a computer readable storage as a random access memory ( RAM ) 714 , read - only memory medium and that can communicate , propagate , or transport ( ROM ) 716 , and an input / output ( I / O ) adapter 718 . The I / O a program for use by or in connection with an instruction adapter 718 can connect to peripheral devices , such as disk execution system , apparatus , or device . units 711 and tape drives 713 , or other program storage

Program code embodied on a computer readable medium 20 devices that are readable by the system . The system can read may be transmitted using any appropriate medium , includ - the inventive instructions on the program storage devices ing but not limited to wireless , wireline , optical fiber cable , and follow these instructions to execute the methodology of RF , etc . , or any suitable combination of the foregoing . at least one embodiment of the invention . The system further

Computer program code for carrying out operations for includes a user interface adapter 719 that connects a key aspects of the present invention may be written in any 25 board 715 , mouse 717 , speaker 724 , microphone 722 , and / or combination of one or more programming languages , other user interface devices such as a touch screen device including an object oriented programming language such as ( not shown ) to the bus 712 to gather user input . Additionally , Java , Smalltalk , C + + or the like and conventional procedural a communication adapter 720 connects the bus 712 to a data programming languages , such as the “ C ” programming processing network 725 , and a display adapter 721 connects language or similar programming languages . The program 30 the bus 712 to a display device 723 which may be embodied code may execute entirely on the user ' s computer , partly on as an output device such as a monitor , printer , or transmitter , the user ' s computer , as a stand - alone software package , for example . partly on the user ' s computer and partly on a remote The flowchart and block diagrams in the figures illustrate computer or entirely on the remote computer or server . In the the architecture , functionality , and operation of possible latter scenario , the remote computer may be connected to the 35 implementations of systems , methods and computer pro user ' s computer through any type of network , including a gram products according to various embodiments of the local area network ( LAN ) or a wide area network ( WAN ) , or present invention . In this regard , each block in the flowchart the connection may be made to an external computer ( for or block diagrams may represent a module , segment , or example , through the Internet using an Internet Service portion of code , which comprises one or more executable Provider ) . 40 instructions for implementing the specified logical

Aspects of the present invention are described below with function ( s ) . It should also be noted that , in some alternative reference to flowchart illustrations and / or block diagrams of implementations , the functions noted in the block may occur methods , apparatus ( systems ) and computer program prod - out of the order noted in the figures . For example , two blocks ucts according to embodiments of the invention . It will be shown in succession may , in fact , be executed substantially understood that each block of the flowchart illustrations 45 concurrently , or the blocks may sometimes be executed in and / or block diagrams , and combinations of blocks in the the reverse order , depending upon the functionality flowchart illustrations and / or block diagrams , can be imple involved . It will also be noted that each block of the block mented by computer program instructions . These computer diagrams and / or flowchart illustration , and combinations of program instructions may be provided to a processor of a blocks in the block diagrams and / or flowchart illustration , general purpose computer , special purpose computer , or 50 can be implemented by special purpose hardware - based other programmable data processing apparatus to produce a systems that perform the specified functions or acts , or machine , such that the instructions , which execute via the combinations of special purpose hardware and computer processor of the computer or other programmable data instructions . processing apparatus , create means for implementing the It is understood in advance that although this disclosure functions / acts specified in the flowchart and / or block dia - 55 includes a detailed description on cloud computing , imple gram block or blocks . mentation of the teachings recited herein are not limited to

These computer program instructions may also be stored a cloud computing environment . Rather , embodiments of the in a computer readable medium that can direct a computer , present invention are capable of being implemented in other programmable data processing apparatus , or other conjunction with any other type of computing environment devices to function in a particular manner , such that the 60 now known or later developed . instructions stored in the computer readable medium pro - Cloud computing is a model of service delivery for duce an article of manufacture including instructions which enabling convenient , on - demand network access to a shared implement the function / act specified in the flowchart and / or pool of configurable computing resources ( e . g . networks , block diagram block or blocks . network bandwidth , servers , processing , memory , storage ,

The computer program instructions may also be loaded 65 applications , virtual machines , and services ) that can be onto a computer , other programmable data processing appa - rapidly provisioned and released with minimal management ratus , or other devices to cause a series of operational steps effort or interaction with a provider of the service . This cloud

Page 37: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2 13 14

model may include at least five characteristics , at least three has shared concerns ( e . g . , mission , security requirements , service models , and at least four deployment models . policy , and compliance considerations ) . It may be managed

Characteristics are as follows : by the organizations or a third party and may exist on On - demand self - service : a cloud consumer can unilater - premises or off - premises .

ally provision computing capabilities , such as server time 5 Public cloud : the cloud infrastructure is made available to and network storage , as needed automatically without the general public or a large industry group and is owned by requiring human interaction with the service ' s provider . an organization selling cloud services .

Broad network access : capabilities are available over a Hybrid cloud : the cloud infrastructure is a composition of network and accessed through standard mechanisms that two or more clouds ( private , community , or public ) that promote use by heterogeneous thin or thick client platforms 10 remain unique entities but are bound together by standard ( e . g . , mobile phones , laptops , and PDAs ) . ized or proprietary technology that enables data and appli Resource pooling : the provider ' s computing resources are pooled to serve multiple consumers using a multi - tenant cation portability ( e . g . , cloud bursting for load - balancing

between clouds ) . model , with different physical and virtual resources dynami cally assigned and reassigned according to demand . There is 15 A cloud computing environment is service oriented with a sense of location independence in that the consumer a focus on statelessness , low coupling , modularity , and generally has no control or knowledge over the exact semantic interoperability . At the heart of cloud computing is location of the provided resources but may be able to specify an infrastructure comprising a network of interconnected location at a higher level of abstraction ( e . g . , country , state , nodes . or datacenter ) . 20 Referring now to FIG . 20 , a schematic of an example of

Rapid elasticity : capabilities can be rapidly and elastically a cloud computing node is shown . Cloud computing node 10 provisioned , in some cases automatically , to quickly scale is only one example of a suitable cloud computing node and out and rapidly released to quickly scale in . To the consumer , is not intended to suggest any limitation as to the scope of the capabilities available for provisioning often appear to be use or functionality of embodiments of the invention unlimited and can be purchased in any quantity at any time . 25 described herein . Regardless , cloud computing node 10 is Measured service : cloud systems automatically control capable of being implemented and / or performing any of the

and optimize resource use by leveraging a metering capa functionality set forth hereinabove . bility at some level of abstraction appropriate to the type of In cloud computing node 10 there is a computer system / service ( e . g . , storage , processing , bandwidth , and active user server 12 , which is operational with numerous other general accounts ) . Resource usage can be monitored , controlled , and 30 purpose or special purpose computing system environments reported providing transparency for both the provider and or configurations . Examples of well - known computing sys consumer of the utilized service . tems , environments , and / or configurations that may be suit

Service Models are as follows : able for use with computer system / server 12 include , but are Software as a Service ( SaaS ) : the capability provided to not limited to , personal computer systems , server computer

the consumer is to use the provider ' s applications running on 35 systems , thin clients , thick clients , hand - held or laptop a cloud infrastructure . The applications are accessible from devices , multiprocessor systems , microprocessor - based sys various client devices through a thin client interface such as tems , set top boxes , programmable consumer electronics , a web browser ( e . g . , web - based e - mail ) . The consumer does network PCs , minicomputer systems , mainframe computer not manage or control the underlying cloud infrastructure systems , and distributed cloud computing environments that including network , servers , operating systems , storage , or 40 include any of the above systems or devices , and the like . even individual application capabilities , with the possible Computer system / server 12 may be described in the exception of limited user - specific application configuration general context of computer systemexecutable instructions , settings . such as program modules , being executed by a computer

Platform as a Service ( PaaS ) : the capability provided to system . Generally , program modules may include routines , the consumer is to deploy onto the cloud infrastructure 45 programs , objects , components , logic , data structures , and so consumer - created or acquired applications created using on that perform particular tasks or implement particular programming languages and tools supported by the provider . abstract data types . Computer system / server 12 may be The consumer does not manage or control the underlying practiced in distributed cloud computing environments cloud infrastructure including networks , servers , operating where tasks are performed by remote processing devices that systems , or storage , but has control over the deployed 50 are linked through a communications network . In a distrib applications and possibly application hosting environment uted cloud computing environment , program modules may configurations . " be located in both local and remote computer system storage

Infrastructure as a Service ( IaaS ) : the capability provided media including memory storage devices . to the consumer is to provision processing , storage , net - As shown in FIG . 20 , computer system / server 12 in cloud works , and other fundamental computing resources where 55 computing node 10 is shown in the form of a general the consumer is able to deploy and run arbitrary software , purpose computing device . The components of computer which can include operating systems and applications . The system / server 12 may include , but are not limited to , one or consumer does not manage or control the underlying cloud more processors or processing units 16 , a system memory infrastructure but has control over operating systems , stor - 28 , and a bus 18 that couples various system components age , deployed applications , and possibly limited control of 60 including system memory 28 to processor 16 . select networking components ( e . g . , host firewalls ) . Bus 18 represents one or more of any of several types of Deployment Models are as follows : bus structures , including a memory bus or memory control Private cloud : the cloud infrastructure is operated solely ler , a peripheral bus , an accelerated graphics port , and a

for an organization . It may be managed by the organization processor or local bus using any of a variety of bus archi or a third party and may exist on - premises or off - premises . 65 tectures . By way of example , and not limitation , such Community cloud : the cloud infrastructure is shared by architectures include Industry Standard Architecture ( ISA )

several organizations and supports a specific community that bus , Micro Channel Architecture ( MCA ) bus , Enhanced ISA

Page 38: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

15 US 9 , 959 , 311 B2

16 ( EISA ) bus , Video Electronics Standards Association virtually , in one or more networks , such as Private , Com ( VESA ) local bus , and Peripheral Component Interconnects munity , Public , or Hybrid clouds as described hereinabove , ( PCI ) bus . or a combination thereof . This allows cloud computing Computer system / server 12 typically includes a variety of environment 50 to offer infrastructure , platforms and / or

computer system readable media . Such media may be any 5 software as services for which a cloud consumer does not available media that is accessible by computer system / server need to maintain resources on a local computing device . It 12 , and it includes both volatile and non - volatile media , is understood that the types of computing devices 54A - N removable and non - removable media . shown in FIG . 21 are intended to be illustrative only and that

System memory 28 can include computer system readable computing nodes 10 and cloud computing environment 50 media in the form of volatile memory , such as random 10 can communicate with any type of computerized device over access memory ( RAM ) 30 and / or cache memory 32 . Com - any type of network and / or network addressable connection puter system / server 12 may further include other removable ! ( e . g . , using a web browser ) . non - removable , volatile / non - volatile computer system stor - Referring now to FIG . 22 , a set of functional abstraction age media . By way of example only , storage system 34 can layers provided by cloud computing environment 50 ( FIG . be provided for reading from and writing to a nonremovable , 15 21 ) is shown . It should be understood in advance that the non - volatile magnetic media ( not shown and typically called components , layers , and functions shown in FIG . 22 are a “ hard drive ” ) . Although not shown , a magnetic disk drive intended to be illustrative only and embodiments of the for reading from and writing to a removable , non - volatile invention are not limited thereto . As depicted , the following magnetic disk ( e . g . , a “ floppy disk ” ) , and an optical disk layers and corresponding functions are provided : drive for reading from or writing to a removable , non - 20 Hardware and software layer 60 includes hardware and volatile optical disk such as a CD - ROM , DVD - ROM or software components . Examples of hardware components other optical media can be provided . In such instances , each include : mainframes 61 ; RISC ( Reduced Instruction Set can be connected to bus 18 by one or more data media Computer ) architecture based servers 62 ; servers 63 ; blade interfaces . As will be further depicted and described below , servers 64 ; storage devices 65 ; and networks and networking memory 28 may include at least one program product having 25 components 66 . In some embodiments , software compo a set ( e . g . , at least one ) of program modules that are nents include network application server software 67 and configured to carry out the functions of embodiments of the database software 68 . invention . Virtualization layer 70 provides an abstraction layer from

Program / utility 40 , having a set ( at least one ) of program which the following examples of virtual entities may be modules 42 , may be stored in memory 28 by way of 30 provided : virtual servers 71 ; virtual storage 72 ; virtual example , and not limitation , as well as an operating system , networks 73 , including virtual private networks ; virtual one or more application programs , other program modules , applications and operating systems 74 ; and virtual clients and program data . Each of the operating system , one or more 75 . application programs , other program modules , and program In one example , management layer 80 may provide the data or some combination thereof , may include an imple - 35 functions described below . Resource provisioning 81 pro mentation of a networking environment . Program modules vides dynamic procurement of computing resources and 42 generally carry out the functions and / or methodologies of other resources that are utilized to perform tasks within the embodiments of the invention as described herein . cloud computing environment . Metering and Pricing 82

Computer system / server 12 may also communicate with provide cost tracking as resources are utilized within the one or more external devices 14 such as a keyboard , a 40 cloud computing environment , and billing or invoicing for pointing device , a display 24 , etc . ; one or more devices that consumption of these resources . In one example , these enable a user to interact with computer system / server 12 ; resources may comprise application software licenses . Secu and / or any devices ( e . g . , network card , modem , etc . ) that rity provides identity verification for cloud consumers and enable computer system / server 12 to communicate with one tasks , as well as protection for data and other resources . User or more other computing devices . Such communication can 45 portal 83 provides access to the cloud computing environ occur via Input / Output ( 1 / 0 ) interfaces 22 . Still yet , com - ment for consumers and system administrators . Service level puter system / server 12 can communicate with one or more management 84 provides cloud computing resource alloca networks such as a local area network ( LAN ) , a general wide tion and management such that required service levels are area network ( WAN ) , and / or a public network ( e . g . , the met . Service Level Agreement ( SLA ) planning and fulfill Internet ) via network adapter 20 . As depicted , network 50 ment 85 provide pre - arrangement for , and procurement of , adapter 20 communicates with the other components of cloud computing resources for which a future requirement is computer system / server 12 via bus 18 . It should be under - anticipated in accordance with an SLA . sto stood that although not shown , other hardware and / or soft - Workloads layer 90 provides examples of functionality ware components could be used in conjunction with com - for which the cloud computing environment may be utilized . puter system / server 12 . Examples , include , but are not 55 Examples of workloads and functions which may be pro limited to : microcode , device drivers , redundant processing vided from this layer include : mapping and navigation 91 ; units , external disk drive arrays , RAID systems , tape drives , software development and lifecycle management 92 ; virtual and data archival storage systems , etc . classroom education delivery 93 ; data analytics processing

Referring now to FIG . 21 , illustrative cloud computing 94 ; transaction processing 95 ; and enhanced e - mail return environment 50 is depicted . As shown , cloud computing 60 receipts based on cognitive considerations 96 . environment 50 comprises one or more cloud computing The terminology used herein is for the purpose of describ nodes 10 with which local computing devices used by cloud ing particular embodiments only and is not intended to be consumers , such as , for example , personal digital assistant limiting of the invention . As used herein , the singular forms ( PDA ) or cellular telephone 54A , desktop computer 54B , " a " , " an " and " the " are intended to include the plural forms laptop computer 54C , and / or automobile computer system 65 as well , unless the context clearly indicates otherwise . It will 54N may communicate . Nodes 10 may communicate with be further understood that the root terms “ include ” and / or one another . They may be grouped ( not shown ) physically or “ have ” , when used in this specification , specify the presence

Page 39: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

18 US 9 , 959 , 311 B2

17 of stated features , integers , steps , operations , elements , language query further comprises generating the structured and / or components , but do not preclude the presence or data language query from the automatically generated syn addition of at least one other feature , integer , step , operation , onyms of the entities . element , component , and / or groups thereof . 5 . A computer program product comprising :

The corresponding structures , materials , acts , and equiva - 5 a non - transitory computer readable storage medium hav lents of all means plus function elements in the claims below ing stored thereon : are intended to include any structure , or material , for per first program instructions executable by a device to cause forming the function in combination with other claimed the device to receive a natural language query from a elements as specifically claimed . The description of the user ; present invention has been presented for purposes of illus - 10 second program instructions executable by the device to tration and description , but is not intended to be exhaustive cause the device to receive an ontological representa or limited to the invention in the form disclosed . Many tion of data in a database , the ontological representation modifications and variations will be apparent to those of of the data in the database including names of tables in ordinary skill in the art without departing from the scope and spirit of the invention . The embodiment was chosen and 15 the database , column names in the tables , and row described in order to best explain the principles of the names in the tables ; invention and the practical application , and to enable others third program instructions executable by the device to of ordinary skill in the art to understand the invention for cause the device to receive template rules used to various embodiments with various modifications as are automatically generate rules for detection of entities suited to the particular use contemplated . 20 and relations in the natural language query , the template

rules being language dependent and ontology indepen What is claimed is : dent , the template rules including widely used con 1 . A method for creating a structured data language query , structs of a language ;

said method comprising : fourth program instructions executable by the device to receiving a natural language query from a user with an 25 cause the device to automatically generate rules to

interface ; identify entities and relations in the natural language receiving an ontological representation of data in a data query , the automatically generated rules being gener

base with an input port , the ontological representation ated with the ontological representation of the data in of the data in the database including names of tables in the database and the template rules , wherein said fourth the database , column names in the tables , and row 30 program instructions : names in the tables ; replace a first variable of a template rule with one of the

receiving template rules with the input port , the templates column names , and rules being used to automatically generate rules for replace a second variable of the template rule with detection of entities and relations in the natural lan another one of the column names ; guage query , the template rules being language depen - 35 fifth program instructions executable by the device to dent and ontology independent , the template rules cause the device to identify entities and relations in the including widely used constructs of a language ; natural language query with the automatically gener

automatically generating rules with a rule generation ated rules ; and engine connected to the interface and the input port , the sixth program instructions executable by the device to automatically generated rules being generated with the 40 c ause the device to generate the structured data lan ontological representation of the data in the database guage query from the entities and relations . and the template rules to identify entities and relations 6 . The computer program product according to claim 5 , in the natural language query , wherein said automati - further comprising seventh program instructions executable cally generating of the rules includes : by the device to cause the device to automatically generating replacing a first variable of a template rule with one of 45 synonyms of at least one of the entities , wherein said sixth

the column names , and program instructions cause the device to generate the struc replacing a second variable of the template rule with tured data language query from the automatically generated

another one of the column names ; synonyms of the entities . identifying entities and relations with a processor con 7 . The computer program product according to claim 5 ,

nected to the rule generation engine , the entities and 50 further comprising seventh program instructions executable relations being identified in the natural language query by the device to cause the device to automatically generate with the automatically generated rules ; and paraphrases to enrich ontological representation of the data ,

generating the structured data language query with a wherein said automatically generating of the rules to identify query generation engine connected to the processor , the entities and relations in the natural language query is based structured data language query being generated from 55 on the automatically generated paraphrases . the entities and relations . 8 . A system for creating a structured data language query ,

2 . The method according to claim 1 , wherein the template said system comprising : rule includes a connector , wherein the connector includes an interface , said interface is operable to receive a natural one of a verb and a preposition . language query from a user ;

3 . The method according to claim 1 , further comprising 60 an input port , said input port is operable to receive : automatically generating paraphrases to enrich ontological an ontological representation of data in a database , the representation of the data , wherein said automatically gen ontological representation of the data in the database erating of the rules to identify entities and relations in the including names of tables in the database , column natural language query is based on the paraphrases . names in the tables , and row names in the tables , and

4 . The method according to claim 1 , further comprising 65 template rules used to automatically generate rules for automatically generating synonyms of at least one of the detection of entities and relations in the natural lan entities , wherein said generating of the structured data guage query , said template rules being language depen

Page 40: THE NORTHLUTION UNITATEpatwardhans.net/papers/BoguraevEtAl18b.pdf · THE NORTHLUTION UNITATE US009959311B2 ( 12 ) United States Patent Boguraev et al . ( 10 ) Patent No . : US 9 ,

US 9 , 959 , 311 B2 19

dent and ontology independent , the template rules including widely used constructs of a language ;

a rule generation engine connected to said interface and said input port , said rule generation engine is operable to automatically generate rules , the automatically gen erated rules being generated with the ontological rep resentation of the data in the database and the template rules to identify entities and relations in the natural language query , wherein said rule generation engine : replaces a first variable of a template rule with one of 10

the column names , and replaces a second variable of the template rule with

another one of the column names ; a processor connected to said rule generation engine , said

processor is operable to identify entities and relations , 15 the entities and relations being identified in the natural language query with the automatically generated rules ; and

a query generation engine connected to said processor , said query generation engine is operable to generate the 20 structured data language query from the entities and relations .

9 . The system according to claim 8 , wherein the template rule includes a connector , wherein the connector includes one of a verb and a preposition . 25

10 . The system according to claim 8 , wherein said query generation engine automatically generates synonyms of at least one of the entities , and generates the structured data language query from the automatically generated synonyms of the entities . 30

25

* * * * *