Dr. Hemant Darbari, Programme Co-ordinator

Applied Artificial Intelligence Group, & ACTS Advanced Computing Training School

C-DAC, Pune

darbari@cdac.in

TAG Based Parsing for Machine Translation - English to Indian Language

WELCOME

Outline

• MANTRA: Introduction

• Parsing Process in TAG: An Overview

• Workflow of TAG Parser

• Generation Process in MANTRA

• Generation Process in MANTRA for Multilingual Translation

• Sample Outputs of MANTRA

• Samples of Constructions Solved through TAG

• Issues Regarding Structural Differences and Translation Accuracy

• System Specifications

• MANTRA: Achievements

MANTRA: Introduction

MANTRA is an acronym of MAchiNe assisted TRAnslation tool.

A Tree Adjoining Grammar (TAG) based Machine Translation System of the Applied AI Group of C-DAC, Pune

MANTRA translates English documents into Hindi and other Indian languages, such as Oriya <O>, Tamil <T>, Urdu <U>, Marathi <M> & Bangla <B>

MANTRA covers the following domains: Administration, Finance, Agriculture, Small Scale Industries, Information Technology, Healthcare, Tourism, and Proceedings and documents of Rajya Sabha

Parsing Process in TAG - An Overview

TAG stands for Tree Adjoining Grammar

• The formalism of this grammar is based on the investigation and research of Aravind Joshi (1987)

• Trees are the basic building blocks of this formalism

• In contrast to other formalisms, where dependencies are defined between the elements (nodes) of a rule, in TAG dependencies are defined between different trees.

Tree Adjoining Grammar (TAG)

A TAG is defined as a 5-tuple grammar

G = (N, T, S, I, A), where

• N is a finite set of non-terminal symbols

• T is a finite set of terminals

• S is a distinguished non-terminal,

• I is a finite set of trees called initial trees and

• A is a finite set of trees called auxiliary trees
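The 5-tuple above can be written down as a small data structure. The following is a minimal sketch, not MANTRA's implementation: the names `Node`, `TAG`, `foot_of` and `is_well_formed` are hypothetical, and the toy grammar uses NP as its distinguished symbol only to keep the example short. The check that an auxiliary tree's foot node carries the same label as its root follows the standard TAG definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """One node of an elementary tree (hypothetical representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    is_foot: bool = False    # foot node of an auxiliary tree
    is_subst: bool = False   # substitution site on the frontier


@dataclass
class TAG:
    """G = (N, T, S, I, A)."""
    nonterminals: Set[str]   # N
    terminals: Set[str]      # T
    start: str               # S, the distinguished non-terminal
    initial: List[Node]      # I, the initial trees
    auxiliary: List[Node]    # A, the auxiliary trees


def foot_of(tree: Node) -> Optional[Node]:
    """Return the foot node of a tree, or None if it has none."""
    if tree.is_foot:
        return tree
    for child in tree.children:
        found = foot_of(child)
        if found is not None:
            return found
    return None


def is_well_formed(g: TAG) -> bool:
    """S must be in N, initial trees have no foot node, and every
    auxiliary tree has a foot node labelled like its root."""
    if g.start not in g.nonterminals:
        return False
    if any(foot_of(t) is not None for t in g.initial):
        return False
    for t in g.auxiliary:
        foot = foot_of(t)
        if foot is None or foot.label != t.label:
            return False
    return True


# Toy grammar: an initial tree for "the boy" and an auxiliary tree for "good".
alpha_np = Node("NP", [Node("D", [Node("the")]), Node("N", [Node("boy")])])
beta_adj = Node("N", [Node("adj", [Node("good")]), Node("N", is_foot=True)])
grammar = TAG({"NP", "D", "N", "adj"}, {"the", "boy", "good"}, "NP",
              [alpha_np], [beta_adj])
```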

Tree Adjoining Grammar (TAG)

• This is an LR parser

• It combines both top-down and bottom-up operations, which is why it is called a hybrid parser

• It supports multiple parallel parses

Tree Adjoining Grammar (TAG)

A state S is defined as a 10-tuple,

S = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]

where:

• α: the current tree being parsed.

• dot: the current position of the dot in the tree α.

• side: the side of the symbol the dot is on, side ∈ {left, right}.

• pos: the position of the dot, pos ∈ {above, below}.

• l: the latest index in the input lexical array.

Tree Adjoining Grammar (TAG)

Contd…

A state S is defined as a 10-tuple, S = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]

where:

• star: the position of the most recently adjoined node

• f_l (foot_l): the index of the input lexical array found before the foot node

• f_r (foot_r): the index of the input lexical array found after the foot node

• t_l*: the index of the input lexical array corresponding to the point of adjunction at star

• b_l*: the index of the input lexical array found just before the foot node at star
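A parser state of this shape can be held in an immutable record so that states can be stored in sets, as chart-based recognizers usually do. This is only a sketch under assumptions: the class name `State`, the string encoding of node addresses, and the use of `None` for components that are not yet bound are all illustrative choices, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """One 10-tuple parser state (hypothetical field encoding)."""
    tree: str                    # name of the tree being parsed
    dot: str                     # address of the dotted node, e.g. "0" or "1.2"
    side: str                    # "left" or "right"
    pos: str                     # "above" or "below"
    l: int                       # latest index in the input lexical array
    f_l: Optional[int] = None    # index found before the foot node
    f_r: Optional[int] = None    # index found after the foot node
    star: Optional[str] = None   # most recently adjoined node
    t_l: Optional[int] = None    # index at the point of adjunction at star
    b_l: Optional[int] = None    # index just before the foot node at star

    def __post_init__(self) -> None:
        # Enforce the two enumerated components of the tuple.
        if self.side not in ("left", "right"):
            raise ValueError(f"side must be left or right, got {self.side!r}")
        if self.pos not in ("above", "below"):
            raise ValueError(f"pos must be above or below, got {self.pos!r}")


# A possible starting state: dot at the left of and above the root, index 0.
initial = State(tree="alpha1", dot="0", side="left", pos="above", l=0)
chart = {initial}
```

Because the record is frozen and hashable, adding an identical state to `chart` a second time leaves the set unchanged, which is what keeps parallel parses from duplicating work.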

Tree Adjoining Grammar (TAG)

There are two fundamental kinds of tree in TAG

• Initial Tree

• Auxiliary Tree

Sentences can be represented using a derived tree, constructed from initial and auxiliary trees through adjunction and/or substitution operations

Tree Adjoining Grammar (TAG)

Initial tree

• Initial trees represent the basic syntactic relations in a sentence

• Every interior node of an initial tree is labeled with a non-terminal symbol

• Every frontier node is labeled either with a terminal symbol or with a non-terminal symbol that carries the substitution mark ( ↓ )

• A derivation starts with an initial tree, which is combined with other trees via substitution or adjunction

Tree Adjoining Grammar (TAG)

INITIAL TREES

[Figure: initial trees α1, α2 and α3 with root nodes S_r, D_r and NP_r, further non-terminal nodes VP, V, NP, N and D, and the words "left", "the" and "boy". The figure labels the non-terminal nodes, the terminal nodes and the frontier nodes.]

Tree Adjoining Grammar (TAG)

Initial tree

Example for Initial Tree: Ram has arrived

[Figure: trees for "Ram has arrived": an NP tree (NP, N, "Ram"), an S tree with the nodes NP0, VP and V, and a VP tree with the nodes V and VP* (NA).]

-> nodes marked as ( ↓ ) carry the substitution mark, which indicates where an initial tree is substituted

Tree Adjoining Grammar (TAG)

Substitution

• Substitution is a simple attachment operation

• Substitution replaces a frontier node with another tree whose top node has the same label

• The result of a substitution is a derived tree

• Only an initial or derived tree can be substituted into another tree
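The substitution rules above can be sketched in a few lines. The code below is an illustration under assumptions, not the MANTRA parser: `Node`, `substitute` and `frontier` are hypothetical names, node labels are simplified (NP and D instead of NP_r and D_r), and the function fills only the leftmost matching substitution site.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    is_subst: bool = False   # frontier node marked for substitution


def substitute(target: Node, tree: Node) -> bool:
    """Replace the leftmost substitution site in `target` whose label equals
    the top node of `tree` with a copy of `tree`. Returns True on success."""
    for i, child in enumerate(target.children):
        if child.is_subst and not child.children and child.label == tree.label:
            target.children[i] = copy.deepcopy(tree)
            return True
        if substitute(child, tree):
            return True
    return False


def frontier(node: Node) -> List[str]:
    """Labels of the frontier nodes, read left to right."""
    if not node.children:
        return [node.label]
    words: List[str] = []
    for child in node.children:
        words.extend(frontier(child))
    return words


# Initial tree for "boy" with an open D site, and an initial tree for "the".
alpha_boy = Node("NP", [Node("D", is_subst=True), Node("N", [Node("boy")])])
alpha_the = Node("D", [Node("the")])

ok = substitute(alpha_boy, alpha_the)
print(" ".join(frontier(alpha_boy)))  # the boy
```

After the call, `alpha_boy` is the derived tree; a second call with the same arguments returns False because no open D site remains.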

Tree Adjoining Grammar (TAG)

Auxiliary Trees

[Figure: auxiliary trees β1, β2 and β3, built from the nodes NP, NP*, adj, adj*, adv, VP and VP*, with the words "pretty" and "today".]

-> nodes marked as ( * ) are foot nodes.

Tree Adjoining Grammar (TAG)

Substitution operation

[Figure: the D_r tree for "the" is substituted at the D node of the NP_r tree for "boy" (trees labelled α and α1), giving the derived tree NP_r with D "the" and N "boy".]

Tree Adjoining Grammar (TAG)

Adjunction

• Adjunction is an insertion operation.

• Adjunction inserts an auxiliary tree into another tree

• The foot node label of the auxiliary tree must match the label of the node at which it adjoins.
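Adjunction can be sketched in the same style. Again this is an illustrative sketch under assumptions: `Node`, `find_foot`, `adjoin` and `frontier` are hypothetical names, labels are simplified, adjunction at the root of the target tree is left out for brevity, and only the first matching interior node is used.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    is_foot: bool = False   # foot node of an auxiliary tree


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node below (or at) `node`, if any."""
    if node.is_foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, aux: Node) -> bool:
    """Insert a copy of `aux` at the first interior node of `target` whose
    label matches the foot label of `aux`. The subtree that was below that
    node is re-attached under the foot node."""
    foot = find_foot(aux)
    if foot is None:
        raise ValueError("auxiliary tree has no foot node")
    for i, child in enumerate(target.children):
        if child.children and child.label == foot.label:
            new_aux = copy.deepcopy(aux)
            new_foot = find_foot(new_aux)
            new_foot.children = child.children   # excised subtree goes here
            new_foot.is_foot = False
            target.children[i] = new_aux
            return True
        if adjoin(child, aux):
            return True
    return False


def frontier(node: Node) -> List[str]:
    """Labels of the frontier nodes, read left to right."""
    if not node.children:
        return [node.label]
    words: List[str] = []
    for child in node.children:
        words.extend(frontier(child))
    return words


# Derived tree for "the boy" and an auxiliary tree for "good" with foot N*.
np_tree = Node("NP", [Node("D", [Node("the")]), Node("N", [Node("boy")])])
beta_good = Node("N", [Node("adj", [Node("good")]), Node("N", is_foot=True)])

done = adjoin(np_tree, beta_good)
print(" ".join(frontier(np_tree)))  # the good boy
```

The auxiliary tree is copied before insertion, so `beta_good` keeps its foot node and can be adjoined again elsewhere.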

Tree Adjoining Grammar (TAG)

Adjunction Operation b/w Initial & Auxiliary Tree

The derived tree from the substitution operation now becomes the initial tree for adjunction.

[Figure: the tree α (NP_r with D "the" and N "boy") and the auxiliary tree β3 (N with adj "good" and foot node N*). β3 is inserted below the N node of α, and the sub tree from that node is attached at the foot node.]

Tree Adjoining Grammar (TAG)

Derived tree after Adjunction

[Figure: NP_r with D "the" and N, where N dominates adj "good" and N "boy".]

Tree Adjoining Grammar (TAG)

WORK FLOW DIAGRAM OF TAG PARSER

Dot Traversal of TAG Parser

[Figure: a tree with root A and nodes B to I; the dot traverses the tree from "start" to "end".]

Predict operation, Scanner operation and Complete operation

[Figure: the three operations illustrated on trees with the nodes S_r (NA), S, S (NA) and S* (NA) and the terminals a, b, c, d and e, ending in the derived tree.]

• The Earley-type recognizer for TAGs applies the following seven operations to each state

• s = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]

1. Scanner

2. Move dot down

3. Move dot up

4. Left Predictor

5. Left Completor

6. Right Predictor

7. Right Completor
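To make the side and pos components of a state concrete, the sketch below lists the seven operations as an enumeration and walks the dot around a small tree from the start position (left of and above the root) to the end position (right of and above the root). It is an assumption-laden illustration: the names `Operation` and `dot_positions` are hypothetical, the tree is a plain `(label, children)` pair, and giving leaves only their two "above" positions is a simplification of this sketch rather than something stated on the slides.

```python
from enum import Enum
from typing import Iterator, Tuple


class Operation(Enum):
    """The seven recognizer operations, in the order listed above."""
    SCANNER = 1
    MOVE_DOT_DOWN = 2
    MOVE_DOT_UP = 3
    LEFT_PREDICTOR = 4
    LEFT_COMPLETOR = 5
    RIGHT_PREDICTOR = 6
    RIGHT_COMPLETOR = 7


def dot_positions(tree: tuple) -> Iterator[Tuple[str, str, str]]:
    """Yield (label, side, pos) for each dot position, in traversal order."""
    label, children = tree
    yield (label, "left", "above")
    if children:
        yield (label, "left", "below")
        for child in children:
            yield from dot_positions(child)
        yield (label, "right", "below")
    yield (label, "right", "above")


# Small example tree: A with the two leaf children B and C.
example = ("A", [("B", []), ("C", [])])
positions = list(dot_positions(example))
for label, side, pos in positions:
    print(label, side, pos)
```

The walk starts at `("A", "left", "above")` and ends at `("A", "right", "above")`, visiting every child in between.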

Generation Process in MANTRA

STEP1:

The TAG generator selects a sentence initial tree from the corresponding target language.

STEP2:

The TAG generator performs the synthesis as per the target language structure (sentence order).

STEP3:

The TAG generator performs operations such as substitution, adjunction, node anchoring, and node embedding.

Generation Process in MANTRA for Multilingual Translation

Multilingual translation through TAG based parsing and generation in MANTRA

Generator input

Jaipur is the pink city of India

Generator O/P: English - Hindi

Generator O/P: English - Oriya

Generator O/P: English - Urdu

Generator O/P: English - Marathi

Generator O/P: English - Tamil

Sample Outputs of MANTRA

Sample Outputs For English - Hindi

Sample Outputs For English - Marathi

Sample Outputs For English - Oriya

Sample Outputs For English - Urdu

Sample Outputs For English - Bangla

Sample Outputs For English - Tamil

Samples of Constructions Solved through TAG

Passive constructions: The deputation of officers to the post will be governed by the OM referred to above.

Stative Constructions: The leave sanctioned to Shri Bhat stands cancelled.

Transposing and reframing of clause order and phrase order:

Officers possessing experience of the post are hereby promoted......

नकदी के अनुभव रखनेवाले अधिकारी ... (relative clause formation)

अधिकारी जो नकदी के अनुभव रखते हैं ... (shifting of clause or phrase order)

Changing of verb class: transitive verb to linking verbs

The post carries a special pay. (transitive)

इस पद के साथ विशेष वेतन का प्रावधान है। (linking verb)

He will be designated as Secretary (Finance).

उनका पदनाम सचिव (वित्त) के रूप में होगा। (linking verb)

वे सचिव (वित्त) पदनामित होंगे। (transitive)

Hanging frozen expressions

Orders have been issued vide Office Memorandum No dol/08/1a to all the rajbhasha officials

सभी राजभाषा अधिकारियों को आदेश जारी कर दिए गये हैं देखिए कार्यालय ज्ञापन संख्या रा. वि/08/क

Issues regarding Structural Differences & Translation Accuracy

Plural adjectives requiring singular nouns:

Adjectives such as "all" and "both" take the singular noun form in the sentence rather than the plural.

Ex: Rajasthan State Transport Corporation (RSTC) has bus services to all the major destinations of north India.

Sentences with relative pronouns show syntactic variation in the output

Ex: Bikaner is also one major hub for the tourists looking for an adventurous Camel ride, which gives an insight into the exquisite lifestyle of remote Rajasthan.

English to Oriya

Honorific Problem:

It is not possible to supply honorific marking on the basis of context.

Ex: The majestic Ashoka pillar records visit of emperor Ashoka to Sarnath.

English to Oriya

Accuracy in Translation from English to Oriya is 50%

Postposition not joined to the root

Jaipur , popularly-known-as the Pink-City , is the capital of Rajasthan-state , India

Position of clause

Kaziranga National Park is best known for the one-horned Rhinoceros.

English to Marathi

Accuracy in Translation from English to Marathi is 30%

English to Urdu

Urdu is an inflectional or isolating language like Hindi. Basically, variations in lexical choice are the major features in Urdu.

Problems identified at the syntactic level

• Arrangement of clauses

• Activisation of the passive sentence

Accuracy in Translation from English to Urdu is 40%

System Specification in MANTRA

Available Platforms and Technology

• Web Based Solution (Internet): Java, EJB

• Enterprise Solution (Intranet): VC++

• Desktop Solutions (Standalone): VC++

Desktop Solutions (Standalone)

• SQL versions (Normal, Encrypted)

• My SQL versions (Normal, Encrypted)

• Access version (Normal)

• SQL Express version (Normal)

• MSDE version (Normal)

MANTRA: Achievements

MANTRA Technology is a recipient of the Computerworld Smithsonian Award and is a part of the “1999 Innovation Collection” in the National Museum of American History.

MANTRA: Achievements

Launched on 14th Sept 2007 by Honorable Minister of Home Affairs, GOI

MANTRA: Achievements

Papers to be Laid on the Table [PLOT]

List Of Business [LOB]

Parliamentary Bulletin Part-I

MANTRA: Achievements

Launched on 29th August 2007 by Honorable Vice-President of India

Thank You!