Malayalam pos tags used in tdil English malayalam statistical machine translation.
CopyrightTDIL
Unified Parts of Speech (POS) Standard in Indian Languages
- Draft Standard, Version 1.0
Department of Information Technology, Ministry of Communications & Information Technology
Govt. of India
CONTENTS
1 INTRODUCTION
2 SCOPE
3 TERMINOLOGY
3.1 POS Tag
3.2 XML Schema
3.3 Metadata
4 WHAT IS A POS TAG
5 REQUIREMENTS OF A POS TAG
5.1 Need of XML Schema in designing common POS format
6 POS TAG SET FOR INDIAN LANGUAGES
7 XML INTERNATIONALIZATION BEST PRACTICES
7.1 What is Internationalization Tag Set (ITS)
8 XML SCHEMA
9 METADATA ON POS
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
14 ALGORITHM FOR SELECTION OF NODES
15 REFERENCE BASED IMPLEMENTATION
16 REFERENCE
ANNEXURES
A Language Code Table
1 INTRODUCTION
Part-of-speech (POS) tagging, which assigns categories such as noun, pronoun, verb and demonstrative to words, is one of the key building blocks for developing Natural Language Processing applications. This POS schema is based on W3C XML Internationalization best practices, ISO 639-3 language codes for language identification, ISO 12620:1999 as the metadata definition, and a one-to-one mapping table for all the labels used in the POS Schema.
This document sets out the structural part of the XML Schema definition language and explains how to build an XML POS Schema for tagging. It includes an introduction to the nature of XML Schemas and to the XML POS Schema abstract data model, along with the other terminology used throughout the document. It also specifies the precise semantics of each component of the abstract model and the representation of each component in XML. The document contains a block diagram that shows the flow of creating an XML schema for POS tagging, and an algorithm that incorporates metadata as per ISO 12620:1999.
2 SCOPE
A common, unified, XML-based POS Schema for Indian languages has been formulated on the basis of W3C Internationalization best practices. The schema has been developed to take into account the NLP requirements of Web-based services in Indian languages. This standard specifies the XML POS Schema for tagging and discusses the labels that can be used in it.
3 TERMINOLOGY
3.1 POS Tag
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word.
3.2 XML Schema
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema.
3.3 Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted.
4 WHAT IS A POS TAG
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.
The input to a tagging algorithm is the string of words of a natural-language sentence and a specified tag set (a finite list of part-of-speech tags). The output is the single best POS tag for each word.
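The input/output contract described above can be illustrated with a minimal sketch. It is not the tagging method of this standard: the lexicon lookup is a stand-in for a real tagging algorithm, and the names `TAGSET`, `LEXICON` and `tag_sentence` are hypothetical. The labels and example words are taken from the tag set tables of this document.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A small, finite tag set (labels as used in this standard).
TAGSET: FrozenSet[str] = frozenset({"NN", "NNP", "PSP", "VM", "VAUX", "UNK"})

# Hypothetical toy lexicon; a real tagger would use a trained model instead.
LEXICON: Dict[str, str] = {
    "mohan": "NNP",
    "kitaaba": "NN",
    "ko": "PSP",
    "giraa": "VM",
    "hai": "VAUX",
}


def tag_sentence(
    words: Sequence[str],
    tagset: FrozenSet[str] = TAGSET,
    lexicon: Dict[str, str] = LEXICON,
) -> List[Tuple[str, str]]:
    """Return exactly one tag from `tagset` for every input word."""
    tagged = []
    for word in words:
        # Words missing from the lexicon fall back to the Unknown label.
        tag = lexicon.get(word.lower(), "UNK")
        if tag not in tagset:
            raise ValueError(f"{tag!r} is not in the tag set")
        tagged.append((word, tag))
    return tagged


if __name__ == "__main__":
    print(tag_sentence(["Mohan", "kitaaba", "xyz"]))
```

The point of the sketch is only the shape of the interface: a sequence of words and a tag set go in, and one tag per word comes out.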
5 REQUIREMENT OF A POS TAG
A POS tagger can be used as a pre-processor. Text indexing and retrieval use POS information. POS taggers are used to build tagged corpora and machine translation systems. Speech processing uses POS tags to decide pronunciation. A POS tagger is also needed to identify the tag for words that could not be analysed by the morphological analyser. If the morphological analyser gives multiple tags for a word, the tagger can be used to resolve the ambiguity.
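The ambiguity-resolution role can be sketched as follows. This is an illustrative assumption, not a method prescribed by this standard: it takes the candidate tags a morphological analyser might return for each word and greedily keeps the candidate that most often follows the previously chosen tag. The counts in `BIGRAM_COUNTS` are invented for the example, and production taggers typically use models that also look ahead.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical tag-bigram counts, as might be collected from a tagged corpus.
# "<s>" marks the start of a sentence.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("<s>", "PRP"): 8,
    ("<s>", "DMD"): 5,
    ("PRP", "VM"): 7,
    ("DMD", "NN"): 9,
}


def resolve(
    candidates: Sequence[Sequence[str]],
    counts: Dict[Tuple[str, str], int] = BIGRAM_COUNTS,
) -> List[str]:
    """Pick one tag per word from the analyser's candidate lists."""
    chosen: List[str] = []
    prev = "<s>"
    for options in candidates:
        # Keep the candidate seen most often after the previous tag;
        # unseen pairs count as zero, ties go to the first candidate.
        best = max(options, key=lambda tag: counts.get((prev, tag), 0))
        chosen.append(best)
        prev = best
    return chosen


if __name__ == "__main__":
    # "vaha" can be a personal pronoun (PRP) or a deictic demonstrative (DMD).
    print(resolve([["PRP", "DMD"], ["VM"]]))
```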
5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT
XML is needed for creating the POS tag set in order to standardize the POS tag framework for all Indian languages. The main benefits of using XML for the POS tag set for Indian languages (ILs) are:
• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking applications.
• XML documents can be stored without a database administrator because they contain metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple remote servers.
• It is easier to convert data between different data types.
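Two of these benefits, Unicode support and adding information without breaking applications, can be shown with a short sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`sentence`, `w`, `pos`, `annotator`) are hypothetical and are not taken from the draft schema of this standard.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Original document: one Hindi word with its annotation string.
DOC_V1 = '<sentence xml:lang="hin"><w pos="N__NN">किताब</w></sentence>'

# Extended document: an extra, hypothetical "annotator" attribute was added.
DOC_V2 = '<sentence xml:lang="hin"><w pos="N__NN" annotator="a1">किताब</w></sentence>'


def read_tags(xml_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (word, pos) pairs; attributes the reader does not know are ignored."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("pos")) for w in root.iter("w")]


if __name__ == "__main__":
    # The same reader handles both versions of the format.
    print(read_tags(DOC_V1) == read_tags(DOC_V2))
```

A reader written against the first version keeps working on the second, because it only asks for the attributes it understands.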
6 POS TAG SET FOR INDIAN LANGUAGES
POS Categories and Labels
Sl No | Top level | Subtype (level 1) | Subtype (level 2) | Label | Annotation Convention | Remarks
1 | Noun | | | N | N |
1.1 | | Common | | NN | N__NN |
1.2 | | Proper | | NNP | N__NNP |
1.3 | | Verbal | | NNV | N__NNV | The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4 | | Nloc | | NST | N__NST |
2 | Pronoun | | | PR | PR |
2.1 | | Personal | | PRP | PR__PRP |
2.2 | | Reflexive | | PRF | PR__PRF |
2.3 | | Relative | | PRL | PR__PRL |
2.4 | | Reciprocal | | PRC | PR__PRC |
2.5 | | Wh-word | | PRQ | PR__PRQ |
2.6 | | Indefinite | | PRI | PR__PRI |
3 | Demonstrative | | | DM | DM |
3.1 | | Deictic | | DMD | DM__DMD |
3.2 | | Relative | | DMR | DM__DMR |
3.3 | | Wh-word | | DMQ | DM__DMQ |
3.4 | | Indefinite | | DMI | DM__DMI |
4 | Verb | | | V | V |
4.1 | | Main | | VM | V__VM |
4.1.1 | | | Finite | VF | V__VM__VF |
4.1.2 | | | Non-finite | VNF | V__VM__VNF |
4.1.3 | | | Infinitive | VINF | V__VM__VINF |
4.1.4 | | | Gerund | VNG | V__VM__VNG |
4.2 | | Verbal | | VN | V__VN | paTittam, naTattam, naTanam
4.3 | | Auxiliary | | VAUX | V__VAUX |
4.3.1 | | | Finite | VF | V__VAUX__VF |
4.3.2 | | | Non-finite | VNF | V__VAUX__VNF |
4.3.3 | | | Infinitive | VINF | V__VAUX__VINF |
4.3.4 | | | Gerund | VNG | V__VAUX__VNG |
4.3.5 | | | Participle noun | VNP | V__VAUX__VNP |
5 | Adjective | | | JJ | |
6 | Adverb | | | RB | | Only manner adverbs
7 | Postposition | | | PSP | |
8 | Conjunction | | | CC | CC |
8.1 | | Co-ordinator | | CCD | CC__CCD |
8.2 | | Subordinator | | CCS | CC__CCS |
8.2.1 | | | Quotative | UT | CC__CCS__UT |
9 | Particles | | | RP | RP |
9.1 | | Default | | RPD | RP__RPD |
9.2 | | Classifier | | CL | RP__CL |
9.3 | | Interjection | | INJ | RP__INJ |
9.4 | | Intensifier | | INTF | RP__INTF |
9.5 | | Negation | | NEG | RP__NEG |
10 | Quantifiers | | | QT | QT |
10.1 | | General | | QTF | QT__QTF |
10.2 | | Cardinals | | QTC | QT__QTC |
10.3 | | Ordinals | | QTO | QT__QTO |
11 | Residuals | | | RD | RD |
11.1 | | Foreign word | | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | | Symbol | | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | | Punctuation | | PUNC | RD__PUNC | Only for punctuations
11.4 | | Unknown | | UNK | RD__UNK |
11.5 | | Echowords | | ECH | RD__ECH |
POS for Hindi
Sl No | Top level | Subtype (level 1) | Subtype (level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | | | N | N | ladakaa, raajaa, kitaaba |
1.1 | | Common | | NN | N__NN | kitaaba, kalama, cashmaa |
1.2 | | Proper | | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | | Nloc | | NST | N__NST | Uupara, niice, aage, piiche |
2 | Pronoun | | | PR | PR | Yaha, vaha, jo |
2.1 | | Personal | | PRP | PR__PRP | Vaha, main, tuma, ve |
2.2 | | Reflexive | | PRF | PR__PRF | Apanaa, swayam, khuda |
2.3 | | Relative | | PRL | PR__PRL | Jo, jis, jab, jahaaM |
2.4 | | Reciprocal | | PRC | PR__PRC | Paraspara, aapasa |
2.5 | | Wh-word | | PRQ | PR__PRQ | Kauna, kab, kahaaM |
 | | Indefinite | | PRI | PR__PRI | Koii, kis |
3 | Demonstrative | | | DM | DM | Vaha, jo, yaha |
3.1 | | Deictic | | DMD | DM__DMD | Vaha, yaha |
3.2 | | Relative | | DMR | DM__DMR | jo, jis |
3.3 | | Wh-word | | DMQ | DM__DMQ | kis, kaun |
 | | Indefinite | | DMI | DM__DMI | KoI, kis |
4 | Verb | | | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa |
4.1 | | Main | | VM | V__VM | giraa, gayaa, sonaa, haMstaa |
4.2 | | Auxiliary | | VAUX | V__VAUX | hai, rahaa, huaa |
5 | Adjective | | | JJ | JJ | sundara, acchaa, baRaa |
6 | Adverb | | | RB | RB | jaldii, teza |
7 | Postposition | | | PSP | PSP | ne, ko, se, mein |
8 | Conjunction | | | CC | CC | aur, agar, tathaa, kyonki |
8.1 | | Co-ordinator | | CCD | CC__CCD | aur, balki, parantu |
8.2 | | Subordinator | | CCS | CC__CCS | Agar, kyonki, to, ki |
9 | Particles | | | RP | RP | to, bhii, hii |
9.1 | | Default | | RPD | RP__RPD | to, bhii, hii |
9.3 | | Interjection | | INJ | RP__INJ | are, he, o |
9.4 | | Intensifier | | INTF | RP__INTF | bahuta, behada |
9.5 | | Negation | | NEG | RP__NEG | nahiin, mata, binaa |
10 | Quantifiers | | | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa |
10.1 | | General | | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | | Cardinals | | QTC | QT__QTC | eka, do, tiina |
10.3 | | Ordinals | | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | | | RD | RD | |
11.1 | | Foreign word | | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | | Symbol | | SYM | RD__SYM | $ & ( ) | For symbols such as $, &, etc.
11.3 | | Punctuation | | PUNC | RD__PUNC | | Only for punctuations
11.4 | | Unknown | | UNK | RD__UNK | |
11.5 | | Echowords | | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
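One way to store the higher-level tags automatically is to derive them from the annotation convention itself, since each annotation string lists the path from the top-level category down to the selected tag, joined by a double underscore. The following is a minimal sketch under that assumption; the function name `expand_annotation` is hypothetical.

```python
from typing import List


def expand_annotation(annotation: str) -> List[str]:
    """Expand e.g. 'V__VM__VF' into every level of the type hierarchy."""
    labels = annotation.split("__")
    if not all(labels):
        # Reject strings such as '' or 'V____VF' that contain empty labels.
        raise ValueError(f"malformed annotation: {annotation!r}")
    # Rebuild the annotation string for each ancestor, top level first.
    return ["__".join(labels[:depth]) for depth in range(1, len(labels) + 1)]


if __name__ == "__main__":
    print(expand_annotation("V__VM__VF"))
```

With this approach the annotator only chooses the lowest-level tag, and the entries for the parent categories follow from the string without a separate lookup table.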
POS for Punjabi
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR__PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 Indefinite DMI DM__DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)
Sl No | Top level | Subtype (level 1) | Subtype (level 2) | Label | Annotation Convention | Remarks
1 | Noun | | | N | N |
1.1 | | Common | | NN | N__NN |
1.2 | | Proper | | NNP | N__NNP |
1.3 | | Nloc | | NST | N__NST |
2 | Pronoun | | | PR | PR |
2.1 | | Personal | | PRP | PR__PRP |
2.2 | | Reflexive | | PRF | PR__PRF |
2.3 | | Relative | | PRL | PR__PRL |
2.4 | | Reciprocal | | PRC | PR__PRC |
2.5 | | Wh-word | | PRQ | PR__PRQ |
3 | Demonstrative | | | DM | DM |
3.1 | | Deictic | | DMD | DM__DMD |
3.2 | | Relative | | DMR | DM__DMR |
3.3 | | Wh-word | | DMQ | DM__DMQ |
4 | Verb | | | V | V |
4.1 | | Main | | VM | V__VM |
4.1.1 | | | Finite | VF | V__VM__VF |
4.1.2 | | | Non-finite | VNF | V__VM__VNF |
4.1.3 | | | Infinitive | VINF | V__VM__VINF |
4.1.4 | | | Gerund | VNG | V__VM__VNG |
4.2 | | Verbal Noun | | NNV | N__NNV | Verbal noun
4.3 | | Auxiliary | | VAUX | V__VAUX |
4.3.1 | | | Non-finite | VNF | V__VAUX__VNF |
4.3.2 | | | Infinitive | VINF | V__VAUX__VINF |
5 | Adjective | | | JJ | |
6 | Adverb | | | RB | | Only manner adverbs
7 | Postposition | | | PSP | |
8 | Conjunction | | | CC | CC |
8.1 | | Co-ordinator | | CCD | CC__CCD |
8.2 | | Subordinator | | CCS | CC__CCS |
8.2.1 | | | Quotative | UT | CC__CCS__UT |
9 | Particles | | | RP | RP |
9.1 | | Default | | RPD | RP__RPD |
9.2 | | Classifier | | CL | RP__CL |
9.3 | | Interjection | | INJ | RP__INJ |
9.4 | | Intensifier | | INTF | RP__INTF |
9.5 | | Negation | | NEG | RP__NEG |
10 | Quantifiers | | | QT | QT |
10.1 | | General | | QTF | QT__QTF |
10.2 | | Cardinals | | QTC | QT__QTC |
10.3 | | Ordinals | | QTO | QT__QTO |
11 | Residuals | | | RD | RD |
11.1 | | Foreign word | | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | | Symbol | | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | | Punctuation | | PUNC | RD__PUNC | Only for punctuations
11.4 | | Unknown | | UNK | RD__UNK |
11.5 | | Echowords | | ECH | RD__ECH |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Tamil
Sl No | Top level | Subtype (level 1) | Subtype (level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | | | N | N | paiyan, raajaa, puttakam |
1.1 | | Common | | NN | N__NN | puttakam, kaNNaaTi, paTam |
1.2 | | Proper | | NNP | N__NNP | moohan, ravi, maalati |
1.3 | | Nloc | | NST | N__NST | meel, kiiz, mun, pin |
2 | Pronoun | | | PR | PR | itu, atu, avan |
2.1 | | Personal | | PRP | PR__PRP | naan, nii, avaL, avarkaL |
2.2 | | Reflexive | | PRF | PR__PRF | taan |
2.3 | | Relative | | PRL | PR__PRL | yaar, etu, eppootu, enkee |
2.4 | | Reciprocal | | PRC | PR__PRC | oruvarukoruvar, avanavan, parasparam |
2.5 | | Wh-word | | PRQ | PR__PRQ | yaarum, yaaraavatu, yaaroo, etuvum |
3 | Demonstrative | | | DM | DM | a-, i-, e- |
3.1 | | Deictic | | DMD | DM__DMD | anta, inta, enta |
3.2 | | Relative | | DMR | DM__DMR | enta |
3.3 | | Wh-word | | DMQ | DM__DMQ | enta, yaar, eetaavatu, yaaraavatu |
4 | Verb | | | V | V | vizu, poo, tuunku, aaku |
4.1 | | Main | | VM | V__VM | vizu, poo, tuunku, ciri |
4.1.1 | | | Finite | VF | V__VM__VF | vizuntaan, pooneen, cirittaaL |
4.1.2 | | | Non-finite | VNF | V__VM__VNF | vizunta, poonaal |
4.1.3 | | | Infinitive | VINF | V__VM__VINF | viza, pooka, cirikka |
4.1.4 | | | Gerund | VNG | V__VM__VNG | vizutal, cirittal, tuunkutal |
4.2 | | Verbal | | VN | V__VN | paTippu, naTai, naTattai, ceykai |
4.3 | | Auxiliary | | VAUX | V__VAUX | aakum, veeNTum, muTiyum |
5 | Adjective | | | JJ | | iniya, periya, azakaana |
6 | Adverb | | | RB | | veekamaaka, viraivaaka |
7 | Postposition | | | PSP | | paRRi, kuRittu, viTa |
8 | Conjunction | | | CC | CC | maRRum, eenenRaal, aanaal |
8.1 | | Co-ordinator | | CCD | CC__CCD | -um (raamanum), maRRum, aanaal, allatu | -um is a co-ordinator which can be added to a noun or a verb
8.2 | | Subordinator | | CCS | CC__CCS | enRu, ena, enpatu, enRaal |
8.2.1 | | | Quotative | UT | CC__CCS__UT | enRu, ena |
9 | Particles | | | RP | RP | maTTUm, kuuTa |
9.1 | | Default | | RPD | RP__RPD | maTTUm, kuuTa |
9.2 | | Classifier | | CL | RP__CL | | Not required
9.3 | | Interjection | | INJ | RP__INJ | ayyoo, teey, aamaam |
9.4 | | Intensifier | | INTF | RP__INTF | ati, veku, mika |
9.5 | | Negation | | NEG | RP__NEG | illai |
10 | Quantifiers | | | QT | QT | koncam, niRaiya, oru, mutal |
10.1 | | General | | QTF | QT__QTF | koncam, niRaiya |
10.2 | | Cardinals | | QTC | QT__QTC | onRu, iraNTu |
10.3 | | Ordinals | | QTO | QT__QTO | mutal, iraNTaam |
11 | Residuals | | | RD | RD | |
11.1 | | Foreign word | | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | | Symbol | | SYM | RD__SYM | $ & ( ) ruu | For symbols such as $, &, etc.
11.3 | | Punctuation | | PUNC | RD__PUNC | | Only for punctuations
11.4 | | Unknown | | UNK | RD__UNK | |
11.5 | | Echowords | | ECH | RD__ECH | vaNTi kiNTi, paal kiil |
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi
l parasparam
തമിിിതമിി
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത
ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത
എങെന 4 Verb V V pO kazhi
Annuciri ോ കഴി ആണി(Cop
ula) ചിരി 41 Main VM V__VM pO kazhi
cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )
ruu $ amp ( ) ര
113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
41
1
Finite VF V__VM__VF karachhilAm
a yAba
khAYa
41
2
Non-finite VNF V__VM__VNF kare
kheYe
karale
khete
41
3
Infinitive VINF V__VM__VINF karate
khete yete
41
4
Gerund VNG V__VM__VNG yAoYa
AsA khelA
karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
82
1
Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa
khuba
sA~NghAtik
a
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No | Top level | Subtype (level 1) | Subtype (level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | | | N | N | |
1.1 | | Common | | NN | N__NN | kalam, chashmA (‘pen’, ‘spectacles’) |
1.2 | | Proper | | NNP | N__NNP | mohan, ravI (‘Mohan’, ‘Ravi’) |
1.3 | | Nloc | | NST | N__NST | upar, nIche, ahIM (‘up’, ‘down’, ‘in front’) |
2 | Pronoun | | | PR | PR | |
2.1 | | Personal | | PRP | PR__PRP | huM, tuM, te (‘me’, ‘you’, ‘he/she’) |
2.2 | | Reflexive | | PRF | PR__PRF | pote, jAte, svayam (‘herself/himself’) |
2.3 | | Relative | | PRL | PR__PRL | je, te, jyAM (‘who’, ‘where’) |
2.4 | | Reciprocal | | PRC | PR__PRC | aras-paras, paraspar (‘mutually’, ‘each other’) |
2.5 | | Wh-word | | PRQ | PR__PRQ | koN, kyAre, kyAM (‘who’, ‘when’, ‘where’) |
2.6 | | Indefinite | | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
3 | Demonstrative | | | DM | DM | |
3.1 | | Deictic | | DMD | DM__DMD | A (‘this’) |
3.2 | | Relative | | DMR | DM__DMR | je, jeNe (‘which/who’, ‘whom’) |
3.3 | | Wh-word | | DMQ | DM__DMQ | koN, shuM, kem (‘who’, ‘what’, ‘why’) |
3.4 | | Indefinite | | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
4 | Verb | | | V | V | |
4.1 | | Main | | VM | V__VM | khAshe, khAdhu (‘will eat’, ‘ate’) |
4.2 | | Auxiliary | | VAUX | V__VAUX | chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’) |
5 | Adjective | | | JJ | | |
6 | Adverb | | | RB | | |
7 | Postposition | | | PSP | | |
8 | Conjunction | | | CC | CC | |
8.1 | | Co-ordinator | | CCD | CC__CCD | ane, ke (‘and’, ‘or’) |
8.2 | | Subordinator | | CCS | CC__CCS | tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’) |
9 | Particles | | | RP | RP | |
9.1 | | Default | | RPD | RP__RPD | paNa, ja, tO (‘but’, emph., topic) |
9.2 | | Interjection | | INJ | RP__INJ | hE, arrrE, O |
9.3 | | Intensifier | | INTF | RP__INTF | bahu, ghaNuM (‘very’, ‘much’) |
9.4 | | Negation | | NEG | RP__NEG | nahi, na (‘no’) |
10 | Quantifiers | | | QT | QT | |
10.1 | | General | | QTF | QT__QTF | thoduM, ghaNuM (‘little’, ‘much’) |
10.2 | | Cardinals | | QTC | QT__QTC | eka, be, traN (‘one’, ‘two’, ‘three’) |
10.3 | | Ordinals | | QTO | QT__QTO | paheluM, bIjI (‘first’ (neu.), ‘second’ (fem.)) |
11 | Residuals | | | RD | RD | |
11.1 | | Foreign word | | RDF | RD__RDF | tv, perasitemol |
11.2 | | Symbol | | SYM | RD__SYM | $ & |
11.3 | | Punctuation | | PUNC | RD__PUNC | ( ) |
11.4 | | Unknown | | UNK | RD__UNK | |
11.5 | | Echowords | | ECH | RD__ECH | kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’) |
POS for Konkani
Sl
No Category Label Annotation
Convention Examples Remark
s
Top level Subtype
(level 1) Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
9.5 | Negation (حرف نہی, harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (عام, ‘aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (ترتيبی اعداد, tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (باقی مانده, baaqi maandaa) | RD | RD | |
11.1 | Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol (علامت, ‘alaamat) | SYM | RD__SYM | $, &, (, ) | Such symbols are not used in Urdu; they are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.
11.3 | Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | | Only for punctuation
11.4 | Unknown (نامعلوم, naam‘aaloom) | UNK | RD__UNK | |
11.5 | Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | (دل-) ول ((dil-) vil), (پيار-) ويار ((pyaar-) vyaar), (چائے-) وائے ((caa‘e-) vaa‘e) |
Annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
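The rule above can be illustrated with a short sketch. It assumes that the annotation convention joins the levels with a double underscore, as in the table rows (for example V__VM__VF); the function names and the small lookup table are hypothetical and only cover a few rows copied from the Urdu table.

```python
# Hypothetical helper: derive the higher-level tags from a lowest-level tag.

# lowest-level label -> full annotation convention (a few rows from the table)
CONVENTIONS = {
    "VF": "V__VM__VF",
    "VAUX": "V__VAUX",
    "CCD": "CC__CCD",
    "UT": "CC__CCS__UT",
    "QTC": "QT__QTC",
    "ECH": "RD__ECH",
}


def expand_tag(convention):
    """Return every level of the hierarchy, top level first."""
    parts = convention.split("__")
    return ["__".join(parts[:i]) for i in range(1, len(parts) + 1)]


def annotate(word, lowest_label):
    """Store the lowest-level tag together with all of its parent levels."""
    convention = CONVENTIONS[lowest_label]
    return {"word": word, "tag": convention, "levels": expand_tag(convention)}


record = annotate("ہے", "VAUX")
print(record["levels"])
```

The annotator only supplies the lowest label; the parent levels are computed, so they cannot drift out of step with the hierarchy.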
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology to easily create XML which is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using them makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id; elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
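As an illustration of the attributes listed above, the following sketch reads xml:lang, its:dir and xml:id from a small sample document using Python's standard library. The sample document, its element names (doc, label, tag) and the describe function are assumptions made for this example and are not part of the draft schema; its:translate is the local ITS attribute that corresponds to the its:translateRule element.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS namespace

# Hypothetical instance document using the ITS-related attributes.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="lbl-verb">Verb</label>
  <label xml:lang="ur" its:dir="rtl">فعل</label>
  <tag its:translate="no">V__VM</tag>
</doc>"""


def describe(xml_text):
    """List the internationalization attributes found on every element."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        rows.append({
            "element": elem.tag,
            "lang": elem.get(f"{{{XML_NS}}}lang"),
            "dir": elem.get(f"{{{ITS_NS}}}dir"),
            "translate": elem.get(f"{{{ITS_NS}}}translate"),
            "id": elem.get(f"{{{XML_NS}}}id"),
        })
    return rows


for row in describe(SAMPLE):
    print(row)
```

Here the POS tag is marked as non-translatable while the category label carries its own language and direction, which is the kind of distinction the best practices are meant to make explicit.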
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag conveys just its name, so it has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping for Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="علامت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No. English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ……
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ……
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ……
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ……
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ……
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ……
    End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
…………
End if
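The selection step in the algorithm above (show the English, Hindi and target-language nodes, hide the rest) can be sketched in Python. This is only an illustration under stated assumptions: the sample document is a hypothetical, well-formed simplification of the schema fragments, the per-language label attributes are assumed to follow the `<code>-cat` naming seen in the examples, and the names `SAMPLE`, `ALWAYS_SHOWN` and `select_nodes` are not part of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the POS schema: one element per
# category, with one "<language code>-cat" label attribute per language.
SAMPLE = """<pos>
  <cat name="noun" hin-cat="संज्ञा" mal-cat="നാമം" kas-cat="ناوت" tag="N"/>
  <cat name="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN"/>
</pos>"""

# English name, tag and Hindi label are displayed for every language.
ALWAYS_SHOWN = {"name", "tag", "hin-cat"}


def select_nodes(xml_text, lang_code):
    """Keep the English, Hindi and lang_code labels; drop all other labels."""
    root = ET.fromstring(xml_text)
    keep = ALWAYS_SHOWN | {lang_code + "-cat"}
    for node in root.iter():
        for attr in list(node.attrib):
            if attr not in keep:
                del node.attrib[attr]  # "Hide (remaining nodes)"
    return root


if __name__ == "__main__":
    selected = select_nodes(SAMPLE, "mal")
    print(ET.tostring(selected, encoding="unicode"))
```

Calling `select_nodes(SAMPLE, "kas")` instead would retain the `kas-cat` labels and drop `mal-cat`; the script-level test in the algorithm is omitted here because the language code alone identifies the labels to show.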
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणरV_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP
নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ_QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
1 ستپوريوںN_NNP کیPSP زيارتN_NN سےPSP ملتیV_VM ہےV_VAUX نجاتN_NN
2 ہندوN_NN مذہبN_NN ميںPSP تيرتهN_NN کیPSP بڑیQT_QTF اہميتN_NN ہےV_VAUX
3 يوںPSP توPSP ہرQT_QTF تيرتهN_NN بڑیN_NN اورCC_CCD اہمN_NN ہيںV_VAUX RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD مقبوليتN_NN ہےV_VAUX
4 يہPR_PRP ساتوںQT_QTC مذہبیJJ مقاماتN_NN ساتQT_QTC شہروںN_NN ياCC_CCD ساتQT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ ہيںV_VAUX RD_PUNC
5 ايساDM_DMD کہاV_VM گياV_VM ہےV_VAUX کہCC_CCD برساتموسمN_NN ميںPSP انDM_DMD ساتوںQT_QTC شہروںN_NN کیPSP زيارتN_NN نجاتN_NN فراہمJJ کرنےV_VM_VF والیV_VAUX ہوتیV_VM ہےV_VAUX
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
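The labels in the tagged sentences above are hierarchical: a label such as V_VM_VNF names the top-level category first and the most specific subtype last. Most samples join the levels with a single underscore, while the Oriya sample uses a double underscore (N__NN). A minimal Python sketch, assuming only these two separator styles occur, shows how a label could be split into its levels and rewritten in one consistent form; the function names `expand_tag` and `normalise_tag` are illustrative and not part of the standard.

```python
def expand_tag(label):
    """Split a hierarchical POS label into its levels, top level first.

    Accepts both separator styles, e.g. 'V_VM_VNF' and 'V__VM__VNF'.
    """
    # Collapse the double-underscore style, then split on single underscores.
    parts = label.replace("__", "_").split("_")
    return [part for part in parts if part]


def normalise_tag(label, separator="__"):
    """Rewrite a label with one consistent separator between levels."""
    return separator.join(expand_tag(label))


if __name__ == "__main__":
    for label in ("V_VM_VNF", "N__NN", "CC_CCS_UT", "JJ"):
        print(label, expand_tag(label), normalise_tag(label))
```

Splitting the label this way lets a tool recover the higher-level categories (V, then VM) from the single lowest-level label stored with each word.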
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo   Language Name   Language Tag (ISO 639-3)
1     Assamese        asm
2     Bangla          ben
3     Bodo            brx
4     Dogri           doi
5     Gujarati        guj
6     Hindi           hin
7     Kannada         kan
8     Kashmiri        kas
9     Konkani         kok
10    Maithili        mai
11    Malayalam       mal
12    Manipuri        mni
13    Marathi         mar
14    Nepali          nep
15    Oriya           ori
16    Punjabi         pan
17    Sanskrit        san
18    Santhali        sat
19    Sindhi          snd
20    Tamil           tam
21    Telugu          tel
22    Urdu            urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
2
CopyrightTDIL
CONTENTS
1 INTRODUCTION
2 SCOPE
3 TERMINOLOGY
31 POS Tag
32 XML Schema 33 Metadata
4 WHAT IS A POS TAG
5 REQUIREMENTS OF A POS TAG
51 Need of XML Schema in designing common POS format
6 POS TAG SET FOR INDIAN LANGUAGES
7 XML INTERNATIONALIZATION BEST PRACTICES
71 What is Internationalization Tag Set (ITS)
8 XML SCHEMA
9 METADATA ON POS
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
14 ALGORITHM FOR SELECTION OF NODES
15 REFERENCE BASED IMPLEMENTATION
16 REFERENCE
ANNEXURES
A Language Code Table
3
CopyrightTDIL
1 INTRODUCTION
Parts of Speech tagging is one the key building blocks (noun pronoun verb demonstrative etc) for developing Natural Language Processing applications This POS schema is based on W3C XML Internalization best practices ISO 639-3 Language Codes for Language Identification ISO 126201999 as metadata definition and one to one mapping table for all the labels used in POS Schema
This document sets out the structural part of the XML Schema definition language and also how to make XML POS Schema for tagging XML Schemas including an introduction to the nature of XML Schemas and an introduction to the XML POS Schema abstract data model along with other terminology used throughout this document and also specifies the precise semantics of each component of the abstract model the representation of each component in XML This document contains block diagram that shows the flow-chart of creating XML scheme for POS tagging It also includes the algorithm that contains metadata as per ISO 126201999
2 SCOPE
The common unified XML based POS Schema for Indian Languages based on W3C Internationalization best practices have been formulated The schema has been developed to take into account the NLP requirements for Web based services in Indian Languages This standard specifies XML POS Schema for tagging This portion of the XML Schema Language discusses labels that can be used in an XML POS Schema
3 TERMINOLOGY
31 POS Tag A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word
32 XML Schema XML Schemas express shared vocabularies and allow machines to
carry out rules made by people and to define a class of XML documents and so the term instance document is often used to describe an XML document that conforms to a particular schema
33 Metadata Metadata describes how and when and by whom a particular set of data was collected and how the data is formatted
4
CopyrightTDIL
4 WHAT IS A POS TAG
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word Parts of speech include nouns verbs adverbs adjectives pronouns conjunction and their sub-categories
The input to a tagging algorithm is a string of words of a natural language sentence and a specified tag set (a finite list of Part-of-speech tags) The output is a single best POS tag for each word
5 REQUIREMENT OF A POS TAG
The POS tagger can be used as a pre-processor Text indexing and retrieval uses POS information POS tagger is used for making tagged corpora and Machine Translation System Speech processing uses POS tags to decide the pronunciation POS tagger would be needed to identify the tag for the words that could not be analysed by the morphological analyser If the Morph gives multiple tags for a word then the tagger could be used to resolve the ambiguity
51 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT
The need of XML for creating POS tag-set is to standardize the POS tag framework for all Indian languages The main benefits of xml in using POS tag set for ILrsquos are bull It Supports multilingual documents and Unicode bull XML allows developers to add extra information to a format without breaking
applications bull XML documents can be stored without using database administrator because they
contain meta data in the form of tags and attributes bull The tree structure of XML documents allows documents to be compared and
aggregated efficiently element by element bull XML documents can consist of nested elements that are distributed over multiple
remote servers It is easier to convert data between different data types
5
CopyrightTDIL
6 POS Tag set for Indian Languages
POS Categories and Labels
Sl No Category Label Annotation
Convention
Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Verbal NNV N__NNV The verbal noun
sub type is only
for languages
such as Tamil and
Malayalam)
14 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
26 INDEFINITE PRI PR__PRI
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
34 Indefinite DMI DM__DMI
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal VN V__VN paTittam
6
CopyrightTDIL
naTattam naTanam
42 Auxiliary VAUX V__VAUX
421 Finite VAUX V__VAUX__VF
422 Non-finite VNF V__VAUX__VNF
423 Infinitive VINF V__VAUX__VINF
424 Gerund VNG V__VAUX__VNG
425 PARTICIP
LE NOUN
VNP V_VAUX_VNP
5 Adjective JJ
6 Adverb RB Only manner
adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in
script other than
the script of the
original text
112 Symbol SYM RD__SYM For symbols such
7
CopyrightTDIL
as $ amp etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Hindi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ladakaa raajaa kitaaba
11 Common NN N__NN kitaaba kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST Uupara
niice aage
piiche
2 Pronoun PR PR Yaha vaha
jo
21 Personal PRP PR__PRP Vaha main
tuma ve
22 Reflexive PRF PR__PRF Apanaa
swayam
khuda
23 Relative PRL PR__PRL Jo jis jab
jahaaM
24 Reciprocal PRC PR__PRC Paraspara
aapasa
25 Wh-word PRQ PR__PRQ Kauna kab
kahaaM
Indefinite PRI PR__PRI Koii kis
8
CopyrightTDIL
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD Vaha yaha
32 Relative DMR DM__DMR jo jis
33 Wh-word DMQ DM__DMQ kis kaun
Indefinite DMI DM__DMI KoI kis
4 Verb V V giraa gayaa
sonaa
haMstaa
hai rahaa
41 Main VM V__VM giraa gayaa
sonaa
haMstaa
42 Auxiliary VAUX V__VAUX hai rahaa
huaa
5 Adjective JJ JJ sundara
acchaa
baRaa
6 Adverb RB RB jaldii teza
7 Postposition PSP PSP ne ko se
mein
8 Conjunction CC CC aur agar
tathaa
kyonki
81 Co-ordinator CCD CC__CCD aur balki
parantu
82 Subordinator CCS CC__CCS Agar
kyonki to
ki
9 Particles RP RP to bhii hii
91 Default RPD RP__RPD tobhii hii
93 Interjection INJ RP__INJ are he o
94 Intensifier INTF RP__INTF bahuta
behada
95 Negation NEG RP__NEG nahiin
mata binaa
10 Quantifiers QT QT thoRaa
bahuta
kucha eka
pahalaa
9
CopyrightTDIL
101 General QTF QT__QTF thoRaa
bahuta
kucha
102 Cardinals QTC QT__QTC eka do
tiina
103 Ordinals QTO QT__QTO pahalaa
duusaraa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
POS for Punjabi
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
10
CopyrightTDIL
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
11
CopyrightTDIL
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
12
CopyrightTDIL
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No Category Label Annotation
Convention
Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
13
CopyrightTDIL
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal Noun Verbal noun NNV N_NNV Verbal Noun
43 Auxiliary VAUX V__VAUX
431 Non-finite VNF V_VM_VNF
432 Infinite VINF V_VM_VNF
5 Adjective JJ
6 Adverb RB Only manner
adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-
ordinator
CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
14
CopyrightTDIL
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign
word
RDF RD__RDF A word written in
script other than
the script of the
original text
112 Symbol SYM RD__SYM For symbols such
as $ amp etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
15
CopyrightTDIL
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
16
CopyrightTDIL
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ amp ( ) ruu
For symbols such as $ amp etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
17
CopyrightTDIL
POS for Malyalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi
l parasparam
തമിിിതമിി
18
CopyrightTDIL
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത
ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത
എങെന 4 Verb V V pO kazhi
Annuciri ോ കഴി ആണി(Cop
ula) ചിരി 41 Main VM V__VM pO kazhi
cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
19
CopyrightTDIL
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
20
CopyrightTDIL
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
21
CopyrightTDIL
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )
ruu $ amp ( ) ര
113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
22
CopyrightTDIL
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
41
1
Finite VF V__VM__VF karachhilAm
a yAba
khAYa
41
2
Non-finite VNF V__VM__VNF kare
kheYe
karale
khete
41
3
Infinitive VINF V__VM__VINF karate
khete yete
41
4
Gerund VNG V__VM__VNG yAoYa
AsA khelA
karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
23
CopyrightTDIL
tAhale
82
1
Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa
khuba
sA~NghAtik
a
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
24
CopyrightTDIL
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
25
CopyrightTDIL
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
26
CopyrightTDIL
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
27
CopyrightTDIL
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
28
CopyrightTDIL
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
29
CopyrightTDIL
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically POS for Gujarati Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalam/chashmA ‘pen’ ‘spectacles’
12 Proper NNP N__NNP mohan/ravI ‘Mohan’ ‘Ravi’
13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
21 Personal PRP PR__PRP huM/tuM/te ‘me’ ‘you’ ‘he/she’
22 Reflexive PRF PR__PRF pote jAte/svayam ‘herself/himself’
23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A ‘this’
32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
33 Wh-word DMQ DM__DMQ koN/shuM/kem ‘who’ ‘what’ ‘why’
34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
41 Main VM V__VM khAshe/khAdhu ‘will eat’ ‘ate’
42 Auxiliary VAUX V__VAUX chhe/hatuM/karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane/ke ‘and’ ‘or’
82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
91 Default RPD RP__RPD paNa/ja/tO ‘but’, emph, topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu/ghaNuM ‘very’ ‘much’
94 Negation NEG RP__NEG nahi/na ‘no’
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM/ghaNuM ‘little’ ‘much’
102 Cardinals QTC QT__QTC eka/be traN ‘one/two/three’
103 Ordinals QTO QT__QTO paheluM/bIjI ‘first’ (neu), ‘second’ (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm/pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Columns: Sl No; Category (Top level / Subtype level 1 / Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Columns: Sl No; Category (Top level / Subtype level 1 / Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuation
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Columns: Sl No; Category (Top level / Subtype level 1 / Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi, and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level.
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written out as words: ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
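The rule that higher-level tags are stored automatically can be sketched in Python as follows. This is only an illustration, not part of the standard: it assumes the annotation conventions listed in the tables (levels joined by a double underscore, such as V__VM__VF) are available as plain strings, and the function names `build_index` and `store_annotation` are hypothetical.

```python
# Annotation conventions copied from the tables above (a small subset).
CONVENTIONS = [
    "N__NN", "N__NNP", "N__NST",
    "PR__PRP", "PR__PRF",
    "V__VM", "V__VM__VF", "V__VM__VNF",
    "RP__NEG", "QT__QTC", "RD__PUNC",
]


def build_index(conventions):
    """Map each lowest-level tag to its full path of levels."""
    index = {}
    for convention in conventions:
        levels = convention.split("__")
        if not all(levels):
            raise ValueError(f"malformed convention: {convention!r}")
        index[levels[-1]] = levels
    return index


def store_annotation(token, lowest_tag, index):
    """Record a token with its chosen tag and the derived higher levels."""
    levels = index[lowest_tag]  # KeyError if the tag is not in the tag set
    return {"token": token, "tag": lowest_tag, "hierarchy": levels}


INDEX = build_index(CONVENTIONS)

if __name__ == "__main__":
    # The annotator picks only "VF"; "V" and "VM" are filled in.
    print(store_annotation("example", "VF", INDEX))
```

A lookup keyed on the lowest-level tag alone is only safe when that tag is unique. The Konkani table lists VF under both V__VM and V__VAUX, so a real implementation would need the full convention string, or some other disambiguation, in such cases.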
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible, and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
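As a rough illustration of how xml:lang, its:dir, and xml:id might appear on POS-tagged text, the Python sketch below builds a small element tree with the standard library. The element names `sentence` and `w`, the `pos` attribute, and the function name are assumptions made for this example and are not defined by the draft schema; the example words are the transliterations from the Urdu table.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined "xml" prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS namespace

ET.register_namespace("its", ITS_NS)


def make_tagged_sentence(lang, direction, tokens):
    """Build <sentence> with language, direction, and one <w> per token.

    `tokens` is a list of (word, pos_tag) pairs; the layout is hypothetical.
    """
    root = ET.Element("sentence")
    root.set(f"{{{XML_NS}}}lang", lang)      # natural language labelling
    root.set(f"{{{ITS_NS}}}dir", direction)  # text direction: ltr or rtl
    for number, (word, tag) in enumerate(tokens, start=1):
        w = ET.SubElement(root, "w")
        w.set(f"{{{XML_NS}}}id", f"w{number}")  # unique identifier
        w.set("pos", tag)
        w.text = word
    return root


# xml:lang takes a language tag such as "ur" for Urdu.
SENTENCE = make_tagged_sentence("ur", "rtl", [("kitaab", "NN"), ("hai", "VAUX")])

if __name__ == "__main__":
    print(ET.tostring(SENTENCE, encoding="unicode"))
```

Setting the language and direction once on the root element follows the guidance above; child elements would only need their own xml:lang where the language changes.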
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content, and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when, and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only up to a point: a tag supplies just a name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
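To show how a consumer might read such a metadata record, here is a minimal Python sketch using the standard library. It assumes a simplified, namespace-free copy of the languageSection fragment above; the `summarize` function and the keys of the returned dictionary are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the record above, without the namespace declaration.
RECORD = """<languageSection>
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <creation>
    <creationDate>1999-01-01</creationDate>
  </creation>
  <descriptionSection>
    <definitionClass>
      <definition xml:lang="en"></definition>
      <source>ISO 12620:1999</source>
    </definitionClass>
  </descriptionSection>
</languageSection>"""


def summarize(record_xml):
    """Pull the fields that say who registered the data, when, and why."""
    root = ET.fromstring(record_xml)
    return {
        "language": root.findtext("language"),
        "status": root.findtext("registrationStatus"),
        "created": root.findtext("creation/creationDate"),
        "source": root.findtext("descriptionSection/definitionClass/source"),
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```

With the namespace declaration present, the same paths would need namespace-qualified element names.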
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure ‘off’. This is represented schematically under the next heading, POS Schema Block Diagram.
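The "display desired nodes, hide remaining nodes" step can be sketched in Python as a filter over a tag tree. This is an assumption-laden illustration rather than the standard's algorithm: the `Node` class, the `ENABLED` table, and the function names are hypothetical, and the only language-specific fact used is that the Urdu table marks the Classifier (CL) subtype as not required.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)


# A small part of the common tree: particles and quantifiers.
COMMON_TREE = Node("POS", [
    Node("RP", [Node("RPD"), Node("CL"), Node("INJ"), Node("INTF"), Node("NEG")]),
    Node("QT", [Node("QTF"), Node("QTC"), Node("QTO")]),
])

# Hypothetical per-language switch table, keyed by ISO 639-3 code.
# Every tag except CL is switched on for Urdu.
ENABLED = {
    "urd": {"POS", "RP", "RPD", "INJ", "INTF", "NEG", "QT", "QTF", "QTC", "QTO"},
}


def select_nodes(node, enabled):
    """Return a copy of the tree without branches that are switched off."""
    if node.tag not in enabled:
        return None
    kept = [select_nodes(child, enabled) for child in node.children]
    return Node(node.tag, [child for child in kept if child is not None])


def visible_tags(node):
    """Flatten a tree into a pre-order list of tags."""
    tags = [node.tag]
    for child in node.children:
        tags.extend(visible_tags(child))
    return tags


if __name__ == "__main__":
    urdu_tree = select_nodes(COMMON_TREE, ENABLED["urd"])
    print(visible_tags(urdu_tree))
```

Switching off a parent hides its whole branch, which matches the idea of turning a branch of the tree structure off; the common tree itself is never modified.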
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2, and Table 3.
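Whether a column of labels really is one-to-one with the tag set can be checked mechanically. The Python sketch below is a hedged illustration: the function names are hypothetical, and the sample data uses only the English category names and tag codes from the tables in this document.

```python
def is_one_to_one(labels):
    """True when every tag has a non-empty label and no label repeats."""
    values = list(labels.values())
    return all(values) and len(set(values)) == len(values)


def invert(labels):
    """Build the reverse lookup (label -> tag); valid only if one-to-one."""
    if not is_one_to_one(labels):
        raise ValueError("labels are not one-to-one with tags")
    return {label: tag for tag, label in labels.items()}


# Top-level categories: each English label is distinct.
TOP_LEVEL = {"N": "Noun", "PR": "Pronoun", "DM": "Demonstrative", "V": "Verb"}

# Subtype labels repeat across categories ("Relative" under PR and DM),
# so a bare label is ambiguous at this level.
SUBTYPES = {"PRL": "Relative", "DMR": "Relative"}

# Qualifying each label with its parent category restores uniqueness.
QUALIFIED = {"PRL": "Pronoun/Relative", "DMR": "Demonstrative/Relative"}

if __name__ == "__main__":
    print(is_one_to_one(TOP_LEVEL), is_one_to_one(SUBTYPES), is_one_to_one(QUALIFIED))
    print(invert(TOP_LEVEL)["Verb"])
```

The same check can be run over each language column of the mapping tables; cells marked NA would need to be skipped or treated as missing before testing for uniqueness.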
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
...
End if
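The selection logic above can be sketched in Python. This is a minimal illustration, not the reference implementation: it assumes each schema node is held as a plain dictionary of attributes, and the names SCRIPT_LANGUAGES, shown_attributes and select_nodes, as well as the placeholder label values, are hypothetical. Only the script and language pairs named in the algorithm are covered.

```python
# Script -> {language name: ISO 639-3 code}, following the branches of the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def shown_attributes(script, language):
    """Attribute names that stay displayed: English, Hindi and the chosen language."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError for an unsupported pair
    shown = {"cat", "subcat", "hin-cat", "tag"}
    shown.add(f"{code}-cat")                   # e.g. "brx-cat" for Bodo
    return shown


def select_nodes(node, script, language):
    """Hide every attribute of one node (a dict) that is not in the displayed set."""
    shown = shown_attributes(script, language)
    return {name: value for name, value in node.items() if name in shown}


# Hypothetical node with placeholder labels for three languages.
node = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "kas-cat": "<Kashmiri label>",
    "tag": "N",
}
print(select_nodes(node, "Devanagari", "Bodo"))
```

Selecting Bodo keeps the English, Hindi and Bodo labels and drops the Kashmiri one; selecting Hindi keeps only the English and Hindi labels.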
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
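Tagged sentences like those above can be read back into word and tag pairs with a few lines of Python. The sketch below is illustrative only: it assumes, hypothetically, that each token is written as a word, a backslash separator and then the tag; the separator is a parameter because the corpus format is not specified here. The sample sentence is made up from the transliterated Hindi examples in the tag set tables.

```python
def parse_tagged(sentence, sep="\\"):
    """Split a tagged sentence into (word, tag) pairs.

    Each whitespace-separated token is assumed to look like word<sep>TAG.
    A token without the separator is treated as a bare tag with an empty word.
    """
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition(sep)
        pairs.append((word, tag))
    return pairs


def top_level(tag):
    """Return the top-level category of a tag written with '_' or '__' joins."""
    return tag.replace("__", "_").split("_")[0]


# Hypothetical sample built from examples in the Hindi table.
sample = "raajaa\\N_NN acchaa\\JJ hai\\V_VAUX"
for word, tag in parse_tagged(sample):
    print(word, tag, top_level(tag))
```

The helper top_level accepts both the single-underscore form used in the sample sentences and the double-underscore form used in the annotation convention columns.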
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
1 INTRODUCTION
Part-of-speech (POS) tagging, which assigns categories such as noun, pronoun, verb and demonstrative to words, is one of the key building blocks for developing Natural Language Processing applications. This POS schema is based on the W3C XML Internationalization best practices, ISO 639-3 language codes for language identification, ISO 12620:1999 as the metadata definition, and a one-to-one mapping table for all the labels used in the POS Schema.
This document sets out the structural part of the XML Schema definition language and shows how to build an XML POS Schema for tagging. It includes an introduction to the nature of XML Schemas and to the XML POS Schema abstract data model, along with other terminology used throughout this document. It also specifies the precise semantics of each component of the abstract model and the representation of each component in XML. The document contains a block diagram that shows the flow of creating an XML schema for POS tagging. It also includes the algorithm, which carries metadata as per ISO 12620:1999.
2 SCOPE
A common, unified, XML-based POS Schema for Indian languages, based on W3C Internationalization best practices, has been formulated. The schema has been developed to take into account the NLP requirements of Web-based services in Indian languages. This standard specifies the XML POS Schema for tagging. This portion of the XML Schema Language discusses the labels that can be used in an XML POS Schema.
3 TERMINOLOGY
3.1 POS Tag: A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word.
3.2 XML Schema: XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema.
3.3 Metadata: Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted.
4 WHAT IS A POS TAG
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.
The input to a tagging algorithm is a string of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is a single best POS tag for each word.
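The input and output contract described above can be illustrated with a minimal Python sketch. It is a toy lexicon lookup, not the tagging method of this standard: TAGSET and LEXICON are small hypothetical samples that reuse transliterated Hindi examples and labels from the tag set tables, and a real tagger would pick the best tag from context rather than from a fixed table.

```python
# Toy lexicon-lookup tagger: shows the input/output contract only.
# TAGSET and LEXICON are hypothetical samples, not the full standard.
TAGSET = {"N__NN", "N__NNP", "JJ", "V__VAUX", "RD__UNK"}

LEXICON = {
    "kitaaba": "N__NN",   # common noun
    "Mohan": "N__NNP",    # proper noun
    "acchaa": "JJ",       # adjective
    "hai": "V__VAUX",     # auxiliary verb
}


def tag_words(words, lexicon=LEXICON, tagset=TAGSET, unknown="RD__UNK"):
    """Return exactly one (word, tag) pair per input word, in input order."""
    tagged = []
    for word in words:
        tag = lexicon.get(word, unknown)  # unseen words fall back to Unknown
        if tag not in tagset:
            raise ValueError(f"tag {tag!r} is not in the tag set")
        tagged.append((word, tag))
    return tagged


print(tag_words(["kitaaba", "acchaa", "hai", "xyz"]))
```

Every input word receives exactly one tag from the specified tag set, and words missing from the lexicon receive the Unknown residual tag.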
5 REQUIREMENT OF A POS TAG
The POS tagger can be used as a pre-processor. Text indexing and retrieval use POS information. A POS tagger is used for building tagged corpora and machine translation systems. Speech processing uses POS tags to decide pronunciation. A POS tagger is also needed to identify the tag for words that could not be analysed by the morphological analyser. If the morphological analyser gives multiple tags for a word, the tagger can be used to resolve the ambiguity.
5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT
XML is needed for creating the POS tag set in order to standardize the POS tag framework for all Indian languages. The main benefits of using XML for the POS tag set for Indian languages (ILs) are:
• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking applications.
• XML documents can be stored without a database administrator because they contain metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple remote servers.
• It is easier to convert data between different data types.
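Two of the benefits listed above, Unicode support and the ability to add information without breaking applications, can be shown with a short Python sketch using the standard library XML parser. The element and attribute names (sentence, w, lang, tag, reviewer) are hypothetical and are not taken from the POS Schema; the reader below simply ignores the extra reviewer attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged sentence; "reviewer" is extra information added later.
doc = """<sentence lang="hin">
  <w tag="N__NN">किताब</w>
  <w tag="V__VAUX" reviewer="annotator-2">है</w>
</sentence>"""


def read_tags(xml_text):
    """Return the language code and the (word, tag) pairs of one sentence."""
    root = ET.fromstring(xml_text)
    language = root.get("lang")
    # Only the attributes this reader knows about are used; others are ignored.
    pairs = [(w.text, w.get("tag")) for w in root.iter("w")]
    return language, pairs


language, pairs = read_tags(doc)
print(language, pairs)
```

The reader keeps working when a new attribute is added to some elements, and the Devanagari text passes through unchanged.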
6 POS Tag set for Indian Languages
POS Categories and Labels
Sl No | Category (top level, subtype level 1, subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Verbal | NNV | N__NNV | The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
2.6 | Indefinite | PRI | PR__PRI |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
3.4 | Indefinite | DMI | DM__DMI |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal | VN | V__VN | paTittam naTattam naTanam
4.2 | Auxiliary | VAUX | V__VAUX |
4.2.1 | Finite | VAUX | V__VAUX__VF |
4.2.2 | Non-finite | VNF | V__VAUX__VNF |
4.2.3 | Infinitive | VINF | V__VAUX__VINF |
4.2.4 | Gerund | VNG | V__VAUX__VNG |
4.2.5 | Participle noun | VNP | V_VAUX_VNP |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |
POS for Hindi
Sl No | Category (top level, subtype level 1, subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | ladakaa, raajaa, kitaaba |
1.1 | Common | NN | N__NN | kitaaba, kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | Uupara, niice, aage, piiche |
2 | Pronoun | PR | PR | Yaha, vaha, jo |
2.1 | Personal | PRP | PR__PRP | Vaha, main, tuma, ve |
2.2 | Reflexive | PRF | PR__PRF | Apanaa, swayam, khuda |
2.3 | Relative | PRL | PR__PRL | Jo, jis, jab, jahaaM |
2.4 | Reciprocal | PRC | PR__PRC | Paraspara, aapasa |
2.5 | Wh-word | PRQ | PR__PRQ | Kauna, kab, kahaaM |
 | Indefinite | PRI | PR__PRI | Koii, kis |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | Vaha, yaha |
3.2 | Relative | DMR | DM__DMR | jo, jis |
3.3 | Wh-word | DMQ | DM__DMQ | kis, kaun |
 | Indefinite | DMI | DM__DMI | KoI, kis |
4 | Verb | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa |
4.1 | Main | VM | V__VM | giraa, gayaa, sonaa, haMstaa |
4.2 | Auxiliary | VAUX | V__VAUX | hai, rahaa, huaa |
5 | Adjective | JJ | JJ | sundara, acchaa, baRaa |
6 | Adverb | RB | RB | jaldii, teza |
7 | Postposition | PSP | PSP | ne, ko, se, mein |
8 | Conjunction | CC | CC | aur, agar, tathaa, kyonki |
8.1 | Co-ordinator | CCD | CC__CCD | aur, balki, parantu |
8.2 | Subordinator | CCS | CC__CCS | Agar, kyonki, to, ki |
9 | Particles | RP | RP | to, bhii, hii |
9.1 | Default | RPD | RP__RPD | to, bhii, hii |
9.3 | Interjection | INJ | RP__INJ | are, he, o |
9.4 | Intensifier | INTF | RP__INTF | bahuta, behada |
9.5 | Negation | NEG | RP__NEG | nahiin, mata, binaa |
10 | Quantifiers | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa |
10.1 | General | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | Cardinals | QTC | QT__QTC | eka, do, tiina |
10.3 | Ordinals | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
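The rule above, where the annotator picks the lowest-level tag and the higher levels are filled in automatically, can be sketched in Python. This is an illustration under assumptions: the PARENT map is transcribed from the POS Categories and Labels table, the names PARENT and full_annotation are hypothetical, and the labels VF, VNF, VINF and VNG are resolved under the main verb (VM) only, although the general table also lists them under the auxiliary.

```python
# Child label -> parent label, transcribed from the general tag set table.
# Assumption: VF, VNF, VINF and VNG are resolved under VM (main verb) only.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V", "VN": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label, sep="__"):
    """Expand a lowest-level label into its full annotation, e.g. VF -> V__VM__VF."""
    chain = [label]
    while chain[-1] in PARENT:          # climb until a top-level category is reached
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


for label in ("NN", "VF", "UT", "JJ"):
    print(label, "->", full_annotation(label))
```

Top-level labels with no parent, such as JJ or PSP, are returned unchanged.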
POS for Punjabi
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)
Sl No | Category (top level, subtype level 1, subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal Noun | NNV | N_NNV | Verbal Noun
4.3 | Auxiliary | VAUX | V__VAUX |
4.3.1 | Non-finite | VNF | V_VM_VNF |
4.3.2 | Infinite | VINF | V_VM_VNF |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Tamil
Sl No | Category (top level, subtype level 1, subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | paiyan, raajaa, puttakam |
1.1 | Common | NN | N__NN | puttakam, kaNNaaTi, paTam |
1.2 | Proper | NNP | N__NNP | moohan, ravi, maalati |
1.3 | Nloc | NST | N__NST | meel, kiiz, mun, pin |
2 | Pronoun | PR | PR | itu, atu, avan |
2.1 | Personal | PRP | PR__PRP | naan, nii, avaL, avarkaL |
2.2 | Reflexive | PRF | PR__PRF | taan |
2.3 | Relative | PRL | PR__PRL | yaar, etu, eppootu, enkee |
2.4 | Reciprocal | PRC | PR__PRC | oruvarukoruvar, avanavan, parasparam |
2.5 | Wh-word | PRQ | PR__PRQ | yaarum, yaaraavatu, yaaroo, etuvum |
3 | Demonstrative | DM | DM | a-, i-, e- |
3.1 | Deictic | DMD | DM__DMD | anta, inta, enta |
3.2 | Relative | DMR | DM__DMR | enta |
3.3 | Wh-word | DMQ | DM__DMQ | enta, yaar, eetaavatu, yaaraavatu |
4 | Verb | V | V | vizu, poo, tuunku, aaku |
4.1 | Main | VM | V__VM | vizu, poo, tuunku, ciri |
4.1.1 | Finite | VF | V__VM__VF | vizuntaan, pooneen, cirittaaL |
4.1.2 | Non-finite | VNF | V__VM__VNF | vizunta, poonaal |
4.1.3 | Infinitive | VINF | V__VM__VINF | viza, pooka, cirikka |
4.1.4 | Gerund | VNG | V__VM__VNG | vizutal, cirittal, tuunkutal |
4.2 | Verbal | VN | V_VN | paTippu, naTai, naTattai, ceykai |
4.3 | Auxiliary | VAUX | V__VAUX | aakum, veeNTum, muTiyum |
5 | Adjective | JJ | | iniya, periya, azakaana |
6 | Adverb | RB | | veekamaaka, viraivaaka |
7 | Postposition | PSP | | paRRi, kuRittu, viTa |
8 | Conjunction | CC | CC | maRRum, eenenRaal, aanaal |
8.1 | Co-ordinator | CCD | CC__CCD | -um (raamanum), maRRum, aanaal, allatu | -um is a co-ordinator which can be added to noun and verb
8.2 | Subordinator | CCS | CC__CCS | enRu, ena, enpatu, enRaal |
8.2.1 | Quotative | UT | CC__CCS__UT | enRu, ena |
9 | Particles | RP | RP | maTTUm, kuuTa |
9.1 | Default | RPD | RP__RPD | maTTUm, kuuTa |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | ayyoo, teey, aamaam |
9.4 | Intensifier | INTF | RP__INTF | ati, veku, mika |
9.5 | Negation | NEG | RP__NEG | illai |
10 | Quantifiers | QT | QT | koncam, niRaiya, oru, mutal |
10.1 | General | QTF | QT__QTF | koncam, niRaiya |
10.2 | Cardinals | QTC | QT__QTC | onRu, iraNTu |
10.3 | Ordinals | QTO | QT__QTO | mutal, iraNTaam |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) ruu | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | vaNTi kiNTi, paal kiil |
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi
l parasparam
തമിിിതമിി
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu ciri ോ കഴി ആണി(Copula) ചിരി
41 Main VM V__VM pO kazhi ciri Annu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kute mAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnu rantu ഒന രണ
103 Ordinals QTO QT__QTO onnAm rantam
ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No
Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST upare niche bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
82 Subordinator CCS CC__CCS ye kintu noile tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu alpa aneka
102 Cardinals QTC QT__QTC eka dui tina
103 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
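The rule above can be illustrated with a minimal Python sketch. It assumes the Annotation Convention column (for example V__VM__VF) is the stored form of the hierarchy, with the double underscore separating levels; the lookup table holds only a handful of rows from the tables, and the function names are hypothetical, not part of the standard.

```python
# Hypothetical lookup: lowest-level label -> annotation convention string.
# Only a few rows from the tag tables are included for illustration.
ANNOTATION_CONVENTION = {
    "NN": "N__NN",
    "NNP": "N__NNP",
    "NST": "N__NST",
    "VF": "V__VM__VF",
    "VAUX": "V__VAUX",
    "UT": "CC__CCS__UT",
    "QTC": "QT__QTC",
    "JJ": "JJ",  # top-level category with no subtype
}


def higher_level_tags(lowest_tag):
    """Return the ancestors of a tag, from the top level downwards."""
    path = ANNOTATION_CONVENTION[lowest_tag].split("__")
    return path[:-1]


def annotate(token, lowest_tag):
    """Record the annotator's choice plus the automatically derived levels."""
    return {
        "token": token,
        "tag": lowest_tag,
        "ancestors": higher_level_tags(lowest_tag),
    }


record = annotate("yAba", "VF")  # Bangla finite verb from the table above
```

The annotator supplies only VF; the record then also carries V and VM without further input.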
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
ourselves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No
Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A ‘this’
32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO ‘but’ emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
94 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
102 Cardinals QTC QT__QTC eka be traN ‘one/two/three’
103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Sl No
Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Users will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
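As an illustration of the attributes listed above, the following Python sketch embeds a small XML fragment and reads the ITS mark-up back with the standard library. The element names posTagSet, category, name and label are hypothetical and not taken from the draft schema; the ITS 1.0 namespace URI and the xml:lang, xml:id, its:dir, its:locNote and its:translateRule mark-up follow the W3C Recommendation.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical host vocabulary carrying ITS 1.0 mark-up.
SAMPLE = """<posTagSet xmlns:its="http://www.w3.org/2005/11/its"
    its:version="1.0" xml:lang="en" its:dir="ltr">
  <its:rules version="1.0">
    <its:translateRule selector="//label" translate="no"/>
  </its:rules>
  <category xml:id="cat-noun" its:locNote="Translate the name, never the label.">
    <name>Noun</name>
    <name xml:lang="ur" its:dir="rtl">اسم</name>
    <label>N</label>
  </category>
</posTagSet>"""


def summarize(xml_text):
    """Collect the internationalization mark-up found in the document."""
    root = ET.fromstring(xml_text)
    rule = root.find(f"{{{ITS_NS}}}rules/{{{ITS_NS}}}translateRule")
    rtl_text = [el.text for el in root.iter()
                if el.get(f"{{{ITS_NS}}}dir") == "rtl"]
    return {
        "lang": root.get(f"{{{XML_NS}}}lang"),
        "dir": root.get(f"{{{ITS_NS}}}dir"),
        "untranslated": rule.get("selector"),
        "rtl_text": rtl_text,
    }


summary = summarize(SAMPLE)
```

Here the tag label N is marked as non-translatable through a global rule, while the Urdu category name overrides the document language and direction locally.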
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag supplies just a name, which is meaningful only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
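A record of this shape could be produced programmatically. The Python sketch below builds only the languageSection part with the standard library; the element names are those of the listing above, the function name and its parameters are hypothetical, and the empty identifier, author and domain fields are omitted because the draft leaves them blank.

```python
import xml.etree.ElementTree as ET


def build_language_section(language, status, origin, creation_date):
    """Assemble a languageSection element like the one in the listing."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date
    return section


section = build_language_section("en", "standard", "ISO 12620:1999", "1999-01-01")
xml_text = ET.tostring(section, encoding="unicode")
```

Building the record through an XML library rather than by string concatenation guarantees that every opening tag receives its closing tag.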
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure ‘off’. This is represented schematically under the next heading, POS Schema Block Diagram.
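The display and hide steps of the flow above can be sketched in Python as a walk over the tag tree. This is an illustrative sketch, not the draft's algorithm: the Node class, the use of ISO 639-3 codes (mar, urd, ben) as language keys and the rule that switching a node off also hides its descendants are assumptions. The "not required" entries mirror the remarks in the Marathi, Urdu and Bangla tables.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    not_required_for: frozenset = frozenset()  # languages with this branch 'off'
    children: list = field(default_factory=list)


def select_nodes(root, language):
    """Return (visible, hidden) tag lists for one language."""
    visible, hidden = [], []

    def walk(node, branch_off):
        # A branch that is off stays off for everything beneath it.
        branch_off = branch_off or language in node.not_required_for
        (hidden if branch_off else visible).append(node.tag)
        for child in node.children:
            walk(child, branch_off)

    walk(root, False)
    return visible, hidden


# Small excerpt of the common tree: conjunctions and particles only.
TREE = Node("POS", children=[
    Node("CC", children=[
        Node("CCD"),
        Node("CCS", children=[Node("UT", frozenset({"ben", "urd"}))]),
    ]),
    Node("RP", children=[
        Node("RPD"),
        Node("CL", frozenset({"mar", "urd"})),
        Node("INJ"),
    ]),
])

visible_mar, hidden_mar = select_nodes(TREE, "mar")
```

Keeping one common tree and hiding branches per language means a label such as CL has the same position and meaning in every language that uses it.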
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
…
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
…
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
…
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
…
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
…
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
…
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
…
End if
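The selection logic of this section can be sketched in Python. This is only an illustrative sketch: the SCRIPT_OF table, the select_labels function and the placeholder label strings are assumptions made for the example and are not part of the draft schema.

```python
# Scripts and languages covered by the algorithm in this section,
# keyed by ISO 639-3 language code (this lookup table is illustrative).
SCRIPT_OF = {
    "hin": "Devanagari",
    "brx": "Devanagari",
    "kok": "Devanagari",
    "mal": "Malayalam",
    "kas": "Perso-Arabic",
    "asm": "Bangla",
    "guj": "Gujarati",
}


def select_labels(node, language):
    """Keep the English, Hindi and `language` labels of a node; hide the rest."""
    if language not in SCRIPT_OF:
        raise ValueError(f"no selection rule for language {language!r}")
    visible = {"cat", "tag", "hin-cat", f"{language}-cat"}
    return {name: value for name, value in node.items() if name in visible}


# A node with placeholder strings standing in for the real labels.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "kok-cat": "<Konkani label>",
    "tag": "N",
}

print(select_labels(noun, "brx"))
```

For Hindi the visible set collapses to the English and Hindi labels, which matches the "Display (English and Hindi Nodes)" step.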
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
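The sample annotations in this section write the levels of a tag with one underscore in some languages (for example V_VM_VNF) and with two in others (for example N__NN). A small Python sketch of splitting either form into its hierarchy levels is given below; the function name tag_levels is an assumption made for illustration.

```python
import re


def tag_levels(annotation):
    """Split a label such as 'V_VM_VNF' or 'N__NN' into its hierarchy levels."""
    return [level for level in re.split(r"_+", annotation) if level]


print(tag_levels("V_VM_VNF"))
print(tag_levels("N__NN"))
```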
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No. | Language Name | Language Tag according to ISO 639-3
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
4 WHAT IS A POS TAG
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.
The input to a tagging algorithm is a string of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is the single best POS tag for each word.
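The input and output described above can be illustrated with a toy Python sketch. The lexicon lookup used here is an assumption chosen for brevity and is not a tagging method prescribed by this document; the romanized words and the labels are taken from the Hindi tag set in Section 6, and the names LEXICON and tag_sentence are hypothetical.

```python
# Toy lexicon: romanized Hindi examples with labels from the Hindi tag set.
LEXICON = {
    "vaha": "PR__PRP",
    "kitaaba": "N__NN",
    "acchaa": "JJ",
    "hai": "V__VAUX",
}


def tag_sentence(words, lexicon=LEXICON, unknown="RD__UNK"):
    """Return one (word, tag) pair per input word; unlisted words get `unknown`."""
    return [(word, lexicon.get(word.lower(), unknown)) for word in words]


print(tag_sentence(["vaha", "kitaaba", "acchaa", "hai"]))
```

A real tagger has to choose between competing tags from context (for instance, vaha is listed both as a personal pronoun and as a deictic demonstrative), which a plain lookup cannot do.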
5 REQUIREMENT OF A POS TAG
The POS tagger can be used as a pre-processor. Text indexing and retrieval use POS information. A POS tagger is used for building tagged corpora and Machine Translation systems. Speech processing uses POS tags to decide the pronunciation. A POS tagger is also needed to identify the tag for words that could not be analysed by the morphological analyser. If the morphological analyser gives multiple tags for a word, the tagger can be used to resolve the ambiguity.
5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT
XML is needed for creating the POS tag set in order to standardize the POS tag framework for all Indian languages. The main benefits of using XML for the POS tag set for Indian languages (ILs) are:
• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking applications.
• XML documents can be stored without using a database administrator, because they contain metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple remote servers.
• It is easier to convert data between different data types.
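Two of these benefits, Unicode support and the ability to add information without breaking applications, can be sketched with Python's standard xml.etree.ElementTree module. The element name cat and the attribute names used below are assumptions made for the example, not the normative schema.

```python
import xml.etree.ElementTree as ET


def read_tag(xml_text):
    """Return the value of the `tag` attribute of the root element."""
    return ET.fromstring(xml_text).get("tag")


original = '<cat name="Noun" tag="N"/>'
# The same entry extended with a Hindi label written in Devanagari.
extended = '<cat name="Noun" tag="N" hin-cat="संज्ञा"/>'

# A reader that only looks at `tag` is unaffected by the added attribute.
print(read_tag(original), read_tag(extended))
```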
6 POS Tag set for Indian Languages
POS Categories and Labels
Sl. No. | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Verbal | NNV | N__NNV | The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
2.6 | Indefinite | PRI | PR__PRI |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
3.4 | Indefinite | DMI | DM__DMI |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal | VN | V__VN | paTittam, naTattam, naTanam
4.2 | Auxiliary | VAUX | V__VAUX |
4.2.1 | Finite | VF | V__VAUX__VF |
4.2.2 | Non-finite | VNF | V__VAUX__VNF |
4.2.3 | Infinitive | VINF | V__VAUX__VINF |
4.2.4 | Gerund | VNG | V__VAUX__VNG |
4.2.5 | Participle Noun | VNP | V__VAUX__VNP |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |
POS for Hindi
Sl. No. | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | ladakaa, raajaa, kitaaba |
1.1 | Common | NN | N__NN | kitaaba, kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | Uupara, niice, aage, piiche |
2 | Pronoun | PR | PR | Yaha, vaha, jo |
2.1 | Personal | PRP | PR__PRP | Vaha, main, tuma, ve |
2.2 | Reflexive | PRF | PR__PRF | Apanaa, swayam, khuda |
2.3 | Relative | PRL | PR__PRL | Jo, jis, jab, jahaaM |
2.4 | Reciprocal | PRC | PR__PRC | Paraspara, aapasa |
2.5 | Wh-word | PRQ | PR__PRQ | Kauna, kab, kahaaM |
2.6 | Indefinite | PRI | PR__PRI | Koii, kis |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | Vaha, yaha |
3.2 | Relative | DMR | DM__DMR | jo, jis |
3.3 | Wh-word | DMQ | DM__DMQ | kis, kaun |
3.4 | Indefinite | DMI | DM__DMI | KoI, kis |
4 | Verb | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa |
4.1 | Main | VM | V__VM | giraa, gayaa, sonaa, haMstaa |
4.2 | Auxiliary | VAUX | V__VAUX | hai, rahaa, huaa |
5 | Adjective | JJ | JJ | sundara, acchaa, baRaa |
6 | Adverb | RB | RB | jaldii, teza |
7 | Postposition | PSP | PSP | ne, ko, se, mein |
8 | Conjunction | CC | CC | aur, agar, tathaa, kyonki |
8.1 | Co-ordinator | CCD | CC__CCD | aur, balki, parantu |
8.2 | Subordinator | CCS | CC__CCS | Agar, kyonki, to, ki |
9 | Particles | RP | RP | to, bhii, hii |
9.1 | Default | RPD | RP__RPD | to, bhii, hii |
9.3 | Interjection | INJ | RP__INJ | are, he, o |
9.4 | Intensifier | INTF | RP__INTF | bahuta, behada |
9.5 | Negation | NEG | RP__NEG | nahiin, mata, binaa |
10 | Quantifiers | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa |
10.1 | General | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | Cardinals | QTC | QT__QTC | eka, do, tiina |
10.3 | Ordinals | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $, &, (, ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
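One way to store the higher-level tags automatically is to keep a parent table and walk it upwards from the selected label. The Python sketch below does this for the Hindi tag set; the PARENT table and the function name full_annotation are assumptions made for illustration, not part of the standard.

```python
# Parent of each label in the Hindi tag set; labels without an entry
# (such as JJ, RB and PSP) are top-level categories.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "CCD": "CC", "CCS": "CC",
    "RPD": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label, parent=PARENT):
    """Build the annotation string, e.g. 'NN' -> 'N__NN', from the lowest-level label."""
    path = [label]
    while path[0] in parent:
        path.insert(0, parent[path[0]])
    return "__".join(path)


print(full_annotation("NN"))
print(full_annotation("JJ"))
```

Tag sets with a second subtype level need one more entry per label, for example a finite main verb label whose parent is VM.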
POS for Punjabi
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
10
CopyrightTDIL
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
11
CopyrightTDIL
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No Category Label Annotation
Convention
Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal Noun NNV N__NNV Verbal noun
43 Auxiliary VAUX V__VAUX
431 Non-finite VNF V__VM__VNF
432 Infinitive VINF V__VM__VINF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
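The rule of storing the higher-level tags automatically can be sketched in Python. This is only an illustration, assuming that the double underscore in the Annotation Convention column (for example V__VM__VNF) encodes the path from the top-level category down to the lowest-level tag. The function names expand_tag and store_annotation and the record layout are hypothetical and are not part of the standard.

```python
SEPARATOR = "__"  # assumed: double underscore joins hierarchy levels in the Annotation Convention


def expand_tag(annotation: str) -> list[str]:
    """Return the annotation of every level, from the top level down to the given tag."""
    parts = annotation.split(SEPARATOR)
    return [SEPARATOR.join(parts[: depth + 1]) for depth in range(len(parts))]


def store_annotation(word: str, annotation: str) -> dict:
    """Build a record (hypothetical layout) holding the lowest-level tag and its ancestors."""
    levels = expand_tag(annotation)
    return {"word": word, "tag": levels[-1], "ancestors": levels[:-1]}


# The annotator picks only the lowest-level tag; the higher levels are derived from it.
record = store_annotation("vizunta", "V__VM__VNF")
print(record)
```

With this convention an annotator only ever chooses one tag per word, and a flat category such as JJ simply expands to itself with no ancestors.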
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR itu atu avan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V__VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avan aval atu itu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam
തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu ciri ോ കഴി ആണി (Copula) ചിരി
41 Main VM V__VM pO kazhi ciri Annu (copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
821 Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A ‘this’
32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO ‘but’ emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
94 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
102 Cardinals QTC QT__QTC eka be traN ‘one’ ‘two’ ‘three’
103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ( )
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Sl No Category Label Annotation
Convention Examples Remarks
Top level Subtype
(level 1) Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
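As an illustration of how some of these attributes could appear on a tagged word, the following Python sketch builds one element carrying xml:lang, its:dir and the local its:translate attribute from ITS 1.0. The element name token, the pos attribute and the helper make_token are hypothetical and are not taken from the draft schema; the short language codes ("ta", "ur") are likewise an assumption made for this example.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace behind the xml: prefix
ET.register_namespace("its", ITS_NS)


def make_token(word: str, pos: str, lang: str, direction: str = "ltr") -> ET.Element:
    """Build a hypothetical <token> element with language, direction and translate flags."""
    token = ET.Element("token")                       # element name is an assumption
    token.set(f"{{{XML_NS}}}lang", lang)              # serialized as xml:lang
    token.set(f"{{{ITS_NS}}}dir", direction)          # serialized as its:dir
    token.set(f"{{{ITS_NS}}}translate", "no")         # corpus words are not to be translated
    token.set("pos", pos)                             # attribute name is an assumption
    token.text = word
    return token


tamil = ET.tostring(make_token("puttakam", "N__NN", "ta"), encoding="unicode")
urdu = ET.tostring(make_token("کتاب", "N__NN", "ur", direction="rtl"), encoding="unicode")
print(tamil)
print(urdu)
```

Keeping the language and direction on each element, rather than only on the root, follows the guideline above that they be declared wherever a change of language or direction may occur.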
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to some extent, what the data means. Only to some extent, because a tag gives just its name, which has meaning only to someone who already understands what the element or attribute stands for.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> registered as a standard
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, every label defined in the POS schema for English must have a one-to-one mapping to a label in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that particular language. The language-specific POS schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
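The display and hide steps of the flow above can be sketched in Python as a filter over a common list of tags. This is a minimal illustration and not the algorithm of the standard: the names COMMON_TAGS, NOT_REQUIRED and select_nodes are hypothetical, the tag list is a small excerpt, and the entries marked as switched off are taken from the "Not required" remarks in the language tables above.

```python
# Excerpt of the common tag tree, written in the Annotation Convention form.
COMMON_TAGS = ["N__NN", "N__NNP", "N__NST", "CC__CCS__UT", "RP__RPD", "RP__CL"]

# Branches marked "Not required" in the per-language tables (excerpt only).
NOT_REQUIRED = {
    "Tamil": {"RP__CL"},
    "Punjabi": {"RP__CL"},
    "Bangla": {"CC__CCS__UT"},
    "Urdu": {"CC__CCS__UT", "RP__CL"},
}


def select_nodes(language: str) -> tuple[list[str], list[str]]:
    """Split the common tags into (displayed, hidden) nodes for one language."""
    switched_off = NOT_REQUIRED.get(language, set())
    displayed = [tag for tag in COMMON_TAGS if tag not in switched_off]
    hidden = [tag for tag in COMMON_TAGS if tag in switched_off]
    return displayed, hidden


shown, hidden = select_nodes("Bangla")
print("Display:", shown)
print("Hide:", hidden)
```

Because every language filters the same common list, a label kept for two languages is guaranteed to be the same label in both, which is the property the one-to-one mapping relies on.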
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം"
kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം"
kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name=type subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name=type subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം"
kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം"
kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം"
kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം"
kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name=type subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം"
kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name=type subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം"
kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name=type subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം"
kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name=cat POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം"
kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name=type subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം"
kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name=type subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം"
kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name=type subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം"
kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
    kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा"
    mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া"
    kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा"
    mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া"
    kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा"
    mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা"
    kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा"
    mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা"
    kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
    kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া"
    kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा"
    mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা"
    kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
    mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ"
    kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
    mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
    kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
    mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর"
    kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
    mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক"
    kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर"
    mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক"
    kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
    mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
    kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം"
    brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
    kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
    mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়"
    kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
    mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat=""
    kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा"
    mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
    kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा"
    mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক"
    kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
    mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক"
    kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
    mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat=""
    kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
    mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
    kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
    mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
    kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
    mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক"
    kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
    mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
    kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
    mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat=""
    kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब"
    mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ"
    kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
    mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক"
    kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
    mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত"
    kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ"
    mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
    kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब"
    mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ"
    kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No. English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
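The selection steps above can be sketched in Python. This is only an illustration, not part of the standard: the dictionary form of a node, the name select_labels, and the script-to-language table are assumptions made for the example. English and Hindi labels are always displayed, the label of the selected language is added, and every other label stays hidden.

```python
# Script -> ISO 639-3 codes of the languages listed under it in the algorithm.
# (Assumed lookup table; the standard gives these only as nested If branches.)
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}


def select_labels(node, script, lang):
    """Return the attributes to display for one node; all others stay hidden."""
    if lang not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    # The English category, the Hindi label and the tag are always displayed.
    wanted = ["cat", "hin-cat", "tag"]
    if lang != "hin":
        # Add the label of the selected language, e.g. "brx-cat" for Bodo.
        wanted.insert(2, f"{lang}-cat")
    return {key: node[key] for key in wanted if key in node}


# Hypothetical in-memory form of the Noun element; labels copied from the schema.
noun = {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा", "mal-cat": "നാമം", "tag": "N"}
print(select_labels(noun, "Devanagari", "brx"))
```

Keeping the script check separate from the label filter mirrors the two nesting levels of the algorithm: the outer test on script, the inner test on language.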
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
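In the sample sentences above each token carries its tag as a suffix of upper-case labels joined by underscores (for example N_NN, V_VM_VF, or N__NN in the Oriya sample). A minimal sketch of how such a token might be split into word and label hierarchy follows. The function name split_token and the regular expression are assumptions for illustration; the rule relies on the words being written in a non-Latin script, and it does not handle the Urdu sample, where the tag precedes the word.

```python
import re

# A tag is a run of upper-case labels joined by one or more underscores,
# e.g. N_NN, V_VM_VF or RD_PUNC, sitting at the very end of the token.
TOKEN = re.compile(r"(.*?)([A-Z]+(?:_+[A-Z]+)*)")


def split_token(token):
    """Split 'wordTAG' into (word, [labels]); return None if no tag is found."""
    match = TOKEN.fullmatch(token)
    if match is None:
        return None
    word, tag = match.groups()
    # Single and double underscores both separate hierarchy levels.
    return word, [part for part in tag.split("_") if part]


print(split_token("दशरनN_NN"))
print(split_token("RD_PUNC"))
```

A bare tag such as RD_PUNC yields an empty word, which matches the samples where the punctuation character itself was lost.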
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
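The language tags are what the schema uses to name its per-language label attributes (hin-cat, brx-cat, kok-cat and so on). As a hedged sketch, the annexure can be held as a lookup table from which those attribute names are derived; the names ISO_639_3 and label_attribute are hypothetical and not defined by the standard.

```python
# Language name -> ISO 639-3 code, for the 22 languages of the annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute that holds this language's label."""
    return f"{ISO_639_3[language]}-cat"


print(label_attribute("Konkani"))
```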
CONTRIBUTORS
1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G, University of Hyderabad
7. Dr. Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
6 POS Tag set for Indian Languages
POS Categories and Labels
Sl No Category Label Annotation
Convention
Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Verbal NNV N__NNV The verbal noun
sub type is only
for languages
such as Tamil and
Malayalam)
14 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
26 INDEFINITE PRI PR__PRI
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
34 Indefinite DMI DM__DMI
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal VN V__VN paTittam naTattam naTanam
42 Auxiliary VAUX V__VAUX
421 Finite VF V__VAUX__VF
422 Non-finite VNF V__VAUX__VNF
423 Infinitive VINF V__VAUX__VINF
424 Gerund VNG V__VAUX__VNG
425 Participle noun VNP V__VAUX__VNP
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Hindi
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N ladakaa raajaa kitaaba
11 Common NN N__NN kitaaba kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST Uupara niice aage piiche
2 Pronoun PR PR Yaha vaha jo
21 Personal PRP PR__PRP Vaha main tuma ve
22 Reflexive PRF PR__PRF Apanaa swayam khuda
23 Relative PRL PR__PRL Jo jis jab jahaaM
24 Reciprocal PRC PR__PRC Paraspara aapasa
25 Wh-word PRQ PR__PRQ Kauna kab kahaaM
26 Indefinite PRI PR__PRI Koii kis
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD Vaha yaha
32 Relative DMR DM__DMR jo jis
33 Wh-word DMQ DM__DMQ kis kaun
34 Indefinite DMI DM__DMI KoI kis
4 Verb V V giraa gayaa sonaa haMstaa hai rahaa
41 Main VM V__VM giraa gayaa sonaa haMstaa
42 Auxiliary VAUX V__VAUX hai rahaa huaa
5 Adjective JJ JJ sundara acchaa baRaa
6 Adverb RB RB jaldii teza
7 Postposition PSP PSP ne ko se mein
8 Conjunction CC CC aur agar tathaa kyonki
81 Co-ordinator CCD CC__CCD aur balki parantu
82 Subordinator CCS CC__CCS Agar kyonki to ki
9 Particles RP RP to bhii hii
91 Default RPD RP__RPD to bhii hii
93 Interjection INJ RP__INJ are he o
94 Intensifier INTF RP__INTF bahuta behada
95 Negation NEG RP__NEG nahiin mata binaa
10 Quantifiers QT QT thoRaa bahuta kucha eka pahalaa
101 General QTF QT__QTF thoRaa bahuta kucha
102 Cardinals QTC QT__QTC eka do tiina
103 Ordinals QTO QT__QTO pahalaa duusaraa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (Paanii-) vaanii (khaanaa-) vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
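The rule above follows directly from the annotation convention, in which each level is joined by a double underscore (for example V__VM__VF). The following is a minimal sketch, not part of the standard, of how a tool might derive the higher level tags from the lowest level tag; the function names `expand_annotation` and `store_token` are hypothetical.

```python
def expand_annotation(annotation: str) -> list[str]:
    """Return the annotation of every level, from the top level down to the given tag."""
    parts = annotation.split("__")
    return ["__".join(parts[:i]) for i in range(1, len(parts) + 1)]


def store_token(word: str, lowest_tag: str) -> dict:
    """Build a record holding the lowest level tag plus all derived higher levels.

    The record layout is an assumption made for illustration only.
    """
    return {"word": word, "tag": lowest_tag, "levels": expand_annotation(lowest_tag)}


if __name__ == "__main__":
    # The annotator selects only the lowest level tag; the rest is derived.
    print(store_token("vizuntaan", "V__VM__VF"))
    print(store_token("sundara", "JJ"))
```

Because the hierarchy is encoded in the tag string itself, no separate lookup table is needed to recover the parent categories.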
POS for Punjabi
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR__PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 Indefinite DMI DM__DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Remarks
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal Noun NNV N__NNV Verbal noun
43 Auxiliary VAUX V__VAUX
431 Non-finite VNF V__VAUX__VNF
432 Infinitive VINF V__VAUX__VINF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Tamil
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR itu atu avan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V__VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um (raamanum) maRRum aanaal allatu (-um is a co-ordinator which can be added to noun and verb)
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Examples in Malayalam
1 Noun N N avan mOhan vItu
11 Common NN N__NN vItu vellam pattam
12 Proper NNP N__NNP mOhan ravi sIta േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avan aval atu itu അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu (copula) ciri ോ കഴി ആണി (copula) ചിരി
41 Main VM V__VM pO kazhi cirri Annu (copula) ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu (copula) ോയി ചിരികം കഴികന ആകന (copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn ോക ചിരിക കയാി കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAl ennalum enkilum െക എനിനം എനാി എനാ ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kute mAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnu rantu ഒന രണ
103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Marathi
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-ourselves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
821 Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN kalam chashmA 'pen' 'spectacles'
12 Proper NNP N__NNP mohan ravI 'Mohan' 'Ravi'
13 Nloc NST N__NST upar nIche ahIM 'up' 'down' 'in front'
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te 'me' 'you' 'he/she'
22 Reflexive PRF PR__PRF pote jAte svayam 'herself/himself'
23 Relative PRL PR__PRL je te jyAM 'who' 'where'
24 Reciprocal PRC PR__PRC aras-paras paraspar 'mutually' 'each other'
25 Wh-word PRQ PR__PRQ koN kyAre kyAM 'who' 'when' 'where'
26 Indefinite koI kaIMK kashuM 'someone' 'something'
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A 'this'
32 Relative DMR DM__DMR je jeNe 'which/who' 'whom'
33 Wh-word DMQ DM__DMQ koN shuM kem 'who' 'what' 'why'
34 Indefinite koI kaIMK kashuM 'someone' 'something'
4 Verb V V
41 Main VM V__VM khAshe khAdhu 'will eat' 'ate'
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM 'is' 'was' 'did'
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke 'and' 'or'
82 Subordinator CCS CC__CCS tethI evuM kAraNke 'so' 'like that' 'because'
9 Particles RP RP
91 Default RPD RP__RPD paN ja tO 'but' emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM 'very' 'much'
94 Negation NEG RP__NEG nahi na 'no'
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM 'little' 'much'
102 Cardinals QTC QT__QTC eka be traN 'one' 'two' 'three'
103 Ordinals QTO QT__QTO paheluM bIjI 'first' (neu) 'second' (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ( )
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi 'work and the like' 'water and the like'
POS for Konkani
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite VF V__VAUX__VF तलल आस आयला आस
422 Non-Finite VNF V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu unlike Hindi voh is used both for singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite (محدود - mahdood) VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word (بديسی لفظ - bidesii lafz) RDF RD__RDF A word written in script other than the script of the original text
112 Symbol (علامت - ‘alaamat) SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology to easily create XML which is internationalized and can be localized effectively
ITS for Schema developers
Users will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes it easier for both schema users and processors to recognize the concepts they represent. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur)
Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content)
Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content)
Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)
Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier)
Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
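As an illustration of the attributes listed above, the following sketch reads xml:lang, its:dir and xml:id from a small tagged sample using Python's standard library. The sample document, its element names (`corpus`, `sentence`, `w`) and the function name are assumptions made for this example and are not part of the draft schema; the language values follow the three-letter codes of Annexure A, which is also an assumption about how the schema would label languages.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical sample: element names are invented for illustration only.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hin" its:dir="ltr">
  <sentence xml:id="s1">
    <w tag="N__NN" its:translate="no">kitaaba</w>
  </sentence>
  <sentence xml:id="s2" xml:lang="urd" its:dir="rtl">
    <w tag="N__NN" its:translate="no">kitaab</w>
  </sentence>
</corpus>"""


def effective_settings(root):
    """Map each xml:id to the (language, direction) that applies to that element.

    xml:lang and its:dir set on an ancestor apply to descendants until overridden.
    """
    result = {}

    def walk(element, lang, direction):
        lang = element.get(f"{{{XML_NS}}}lang", lang)
        direction = element.get(f"{{{ITS_NS}}}dir", direction)
        ident = element.get(f"{{{XML_NS}}}id")
        if ident is not None:
            result[ident] = (lang, direction)
        for child in element:
            walk(child, lang, direction)

    walk(root, None, "ltr")
    return result


if __name__ == "__main__":
    print(effective_settings(ET.fromstring(SAMPLE)))
```

Declaring the language and direction once on the root element and overriding them only where they change keeps the markup compact, which is the usage the list above describes.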
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means ("sort of"). It is only "sort of" because a tag gives just the element name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework of XML-based POS schema for all 22 Indian Languages, the labels defined in the POS Schema in English must have a one to one mapping to labels in the Indian Languages. The XML schema needs to have a complete tree structure, as depicted in the fig below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
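The idea of switching branches 'off' can be sketched as follows. This is only an illustration under stated assumptions: `MASTER_TAGS` is a short excerpt of the annotation conventions from section 6, and the `DISABLED` table and the function `select_nodes` are hypothetical; the entries loosely follow the Hindi table (which lists no verbal noun, no main verb subtypes and no classifier) and the Tamil table (classifier marked "Not required").

```python
# Excerpt of the common tag tree, written in the annotation convention of section 6.
MASTER_TAGS = [
    "N", "N__NN", "N__NNP", "N__NNV", "N__NST",
    "V", "V__VM", "V__VM__VF", "V__VM__VNF", "V__VM__VINF", "V__VM__VNG", "V__VAUX",
    "RP", "RP__RPD", "RP__CL", "RP__INJ",
]

# Branches switched off per language (ISO 639-3 code -> annotation prefixes).
# Illustrative values only; a real table would be derived from the language tables.
DISABLED = {
    "hin": ["N__NNV", "V__VM__VF", "V__VM__VNF", "V__VM__VINF", "V__VM__VNG", "RP__CL"],
    "tam": ["RP__CL"],
}


def select_nodes(language_code: str) -> list[str]:
    """Return the tags displayed for a language; every other node stays hidden."""
    switched_off = DISABLED.get(language_code, [])

    def is_hidden(tag: str) -> bool:
        # A node is hidden if it is a switched-off branch or lies below one.
        return any(tag == prefix or tag.startswith(prefix + "__") for prefix in switched_off)

    return [tag for tag in MASTER_TAGS if not is_hidden(tag)]


if __name__ == "__main__":
    print("hin:", select_nodes("hin"))
    print("tam:", select_nodes("tam"))
```

Keeping one master tree and a per-language list of disabled branches means that a change to the common tag set is made in one place, while each language table stays a small list of exceptions.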
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
</titleStmt>
</fileDesc>
[Languages taken Hindi Bodo Malayalam Kashmiri Assamese Konkani Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
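The tag hierarchy defined by the blocks above can also be written down as a small tree, which makes it easy to list the composite labels (such as N_NN or V_VM_VNF) that appear in tagged text. The sketch below is an illustration only: the tags are those given in the draft schema, while the nesting of the verb subtypes under VM and of UT under CCS is an assumption based on how the schema orders them, and the name composite_labels is hypothetical.

```python
# Tag tree read off the draft schema blocks. Assumption: entries declared with
# name=subtype are nested under the type that precedes them (VM, CCS).
TAGSET = {
    "N": {"NN": {}, "NNP": {}, "NNV": {}, "NST": {}},
    "PR": {"PRP": {}, "PRF": {}, "PRC": {}, "PRL": {}, "PRQ": {}},
    "DM": {"DMD": {}, "DMR": {}, "DMQ": {}},
    "V": {"VAUX": {}, "VM": {"VF": {}, "VINF": {}, "VNG": {}, "VNF": {}}},
    "JJ": {},
    "RB": {},
    "PSP": {},
    "CC": {"CCD": {}, "CCS": {"UT": {}}},
    "RP": {"RPD": {}, "CL": {}, "INJ": {}, "NEG": {}, "INTF": {}},
    "QT": {"QTF": {}, "QTC": {}, "QTO": {}},
    "RD": {"RDF": {}, "SYM": {}, "UNK": {}, "PUNC": {}, "ECH": {}},
}


def composite_labels(tree, prefix=()):
    """Yield every label in the tree, joining hierarchy levels with '_'."""
    for tag, children in tree.items():
        path = prefix + (tag,)
        yield "_".join(path)
        yield from composite_labels(children, path)


ALL_LABELS = set(composite_labels(TAGSET))
```

A tagger or validator could use ALL_LABELS to reject labels that do not correspond to any path in the hierarchy.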
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Table 3. Languages: Telugu, Malayalam, Tamil, Konkani
S.No. English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
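The one-to-one requirement on these mapping tables can be checked mechanically. The following is a minimal sketch, assuming each language column is loaded as a dictionary from English label to target-language label; the function name find_collisions and the placeholder labels in the example are hypothetical and are not taken from the tables. Cells marked NA are skipped, since they record that no label is defined.

```python
def find_collisions(mapping):
    """Return target labels that more than one English label maps to.

    An empty result means the column is one-to-one. Cells that are empty or
    marked 'NA' are ignored because no label is defined for them.
    """
    seen = {}
    for english, target in mapping.items():
        if target is None or target.strip() in ("", "NA"):
            continue
        seen.setdefault(target.strip(), []).append(english)
    return {target: labels for target, labels in seen.items() if len(labels) > 1}


# Placeholder data for illustration only; real cells would hold the
# target-language labels from one column of a mapping table.
example_column = {
    "Noun": "label-1",
    "Proper": "label-2",
    "Personal": "label-2",   # same target label as "Proper": not one-to-one
    "Indefinite": "NA",
}
collisions = find_collisions(example_column)
```

Running such a check per column would flag rows where two English labels share the same target-language label.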
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
...
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
...
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo شخصيٲتی rdquo
tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo ماکوسی rdquo
tag=rdquoPRFgt
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
...
End if
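The Display and Hide steps of this algorithm amount to keeping the English, Hindi and selected-language labels on each node and dropping the rest. The sketch below shows one way this might be done with Python's xml.etree.ElementTree. It rests on several assumptions: the sample markup is a simplified, well-formed stand-in for the draft schema (whose listing is not well-formed XML as printed), the label values are ASCII placeholders rather than real labels, and the function name prune_labels is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for one schema block; values are placeholders.
SAMPLE = """
<element name="cat" cat="noun" hin-cat="HIN-NOUN" brx-cat="BRX-NOUN"
         mal-cat="MAL-NOUN" tag="N">
  <attribute name="type" subcat="common" hin-cat="HIN-COMMON"
             brx-cat="BRX-COMMON" mal-cat="MAL-COMMON" tag="NN"/>
</element>
"""


def prune_labels(elem, lang_code):
    """Hide every '<code>-cat' label except Hindi and the selected language."""
    keep = {"hin-cat", f"{lang_code}-cat"}
    for name in list(elem.attrib):          # copy keys: we delete while looping
        if name.endswith("-cat") and name not in keep:
            del elem.attrib[name]
    for child in elem:
        prune_labels(child, lang_code)
    return elem


bodo_view = prune_labels(ET.fromstring(SAMPLE), "brx")
```

Attributes that do not end in -cat (name, cat, subcat, tag) are left untouched, which corresponds to the English nodes always being displayed.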
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
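Tools that label text by language can hold the language names and ISO 639-3 codes of this annexure as a simple lookup. The Python sketch below is illustrative only: the names LANGUAGE_TAGS and language_tag are hypothetical and not defined by this standard.

```python
# ISO 639-3 codes for the 22 languages listed in this annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name: str) -> str:
    """Return the ISO 639-3 code for a language name (case-insensitive)."""
    key = name.strip().title()
    try:
        return LANGUAGE_TAGS[key]
    except KeyError:
        raise ValueError(f"unknown language name: {name!r}") from None


print(language_tag("hindi"))
```

Keeping the codes in one table makes it easy to check that every language has exactly one code and that no code is used twice.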
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
naTattam naTanam
42 Auxiliary VAUX V__VAUX
421 Finite VAUX V__VAUX__VF
422 Non-finite VNF V__VAUX__VNF
423 Infinitive VINF V__VAUX__VINF
424 Gerund VNG V__VAUX__VNG
425 Participle noun VNP V__VAUX__VNP
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Hindi
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N ladakaa raajaa kitaaba
11 Common NN N__NN kitaaba kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST Uupara niice aage piiche
2 Pronoun PR PR Yaha vaha jo
21 Personal PRP PR__PRP Vaha main tuma ve
22 Reflexive PRF PR__PRF Apanaa swayam khuda
23 Relative PRL PR__PRL Jo jis jab jahaaM
24 Reciprocal PRC PR__PRC Paraspara aapasa
25 Wh-word PRQ PR__PRQ Kauna kab kahaaM
Indefinite PRI PR__PRI Koii kis
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD Vaha yaha
32 Relative DMR DM__DMR jo jis
33 Wh-word DMQ DM__DMQ kis kaun
Indefinite DMI DM__DMI KoI kis
4 Verb V V giraa gayaa sonaa haMstaa hai rahaa
41 Main VM V__VM giraa gayaa sonaa haMstaa
42 Auxiliary VAUX V__VAUX hai rahaa huaa
5 Adjective JJ JJ sundara acchaa baRaa
6 Adverb RB RB jaldii teza
7 Postposition PSP PSP ne ko se mein
8 Conjunction CC CC aur agar tathaa kyonki
81 Co-ordinator CCD CC__CCD aur balki parantu
82 Subordinator CCS CC__CCS Agar kyonki to ki
9 Particles RP RP to bhii hii
91 Default RPD RP__RPD to bhii hii
93 Interjection INJ RP__INJ are he o
94 Intensifier INTF RP__INTF bahuta behada
95 Negation NEG RP__NEG nahiin mata binaa
10 Quantifiers QT QT thoRaa bahuta kucha eka pahalaa
101 General QTF QT__QTF thoRaa bahuta kucha
102 Cardinals QTC QT__QTC eka do tiina
103 Ordinals QTO QT__QTO pahalaa duusaraa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (Paanii-) vaanii (khaanaa-) vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
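Because the annotation convention joins the levels with a double underscore (for example V__VM__VF), the higher-level tags can be derived mechanically from the lowest-level one. The following Python sketch shows one way to do this. The function name expand_tag is hypothetical, and the sketch assumes every annotation follows the double-underscore convention used in the tables.

```python
from typing import List


def expand_tag(annotation: str) -> List[str]:
    """Return every level of the hierarchy implied by a lowest-level tag."""
    parts = annotation.split("__")
    if not all(parts):
        # An empty component means the annotation is malformed, e.g. "V____VF".
        raise ValueError(f"malformed annotation: {annotation!r}")
    # Rebuild the prefixes: top level first, lowest level last.
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


print(expand_tag("V__VM__VF"))  # ['V', 'V__VM', 'V__VM__VF']
print(expand_tag("JJ"))         # ['JJ']
```

An annotator then only has to choose V__VM__VF. The entries V and V__VM can be stored alongside it without further input.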
POS for Punjabi
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR__PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 Indefinite DMI DM__DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No Category Label Annotation
Convention
Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal Noun NNV N__NNV
43 Auxiliary VAUX V__VAUX
431 Non-finite VNF V__VAUX__VNF
432 Infinitive VINF V__VAUX__VINF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR itu atu avan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V__VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avan aval atu itu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam
തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu ciri ോ കഴി ആണി(Copula) ചിരി
41 Main VM V__VM pO kazhi cirri Annu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kute mAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnu rantu ഒന രണ
103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa
khuba
sA~NghAtika
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
821 Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A ‘this’
32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
91 Default RPD RP__RPD paN ja tO ‘but’ emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
94 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
102 Cardinals QTC QT__QTC eka be traN ‘one’ ‘two’ ‘three’
103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out, for example ڈالر (dollar), پاونڈ (pound) etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in a new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur)
Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content)
Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content)
Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)
Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier)
Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)
[For more details: http://www.w3.org/TR/xml-i18n-bp]
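As an illustration of how some of these attributes might appear on POS-annotated text, the Python sketch below builds a small XML fragment with the standard library. The element names sentence and word, the pos attribute and the sample token are hypothetical and are not taken from the POS schema. The ITS 1.0 namespace URI and the local attributes its:dir and its:translate come from the ITS recommendation.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"  # reserved xml: prefix

# Serialize ITS attributes with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def build_sample() -> ET.Element:
    # <sentence> and <word> are hypothetical element names for this sketch.
    root = ET.Element("sentence", {
        f"{{{XML_NS}}}lang": "kok",   # Konkani, ISO 639-3 code from Annexure-1
        f"{{{XML_NS}}}id": "s1",      # unique identifier for the sentence
        f"{{{ITS_NS}}}dir": "ltr",    # text direction
    })
    word = ET.SubElement(root, "word", {
        "pos": "N__NN",                     # hypothetical attribute holding the tag
        f"{{{ITS_NS}}}translate": "no",     # mark the token as non-translatable
    })
    word.text = "pustak"  # illustrative transliterated token
    return root


sample_xml = ET.tostring(build_sample(), encoding="unicode")
print(sample_xml)
```

For right-to-left text such as Urdu, the same sketch would set its:dir to "rtl".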
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag supplies only its name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation> <creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for the XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema can be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No. English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
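The mapping tables above contain cells marked NA, so a simple consistency check can be useful before the labels are loaded into the schema. The Python sketch below lists the (tag, language) cells that have no label and tests whether the labels of one language are distinct. The LABELS excerpt copies a few cells exactly as they appear in the tables; the dictionary layout, the language keys mal, tam and kok, and the function names are assumptions made for this illustration.

```python
# Hypothetical excerpt of the mapping table: tag -> {language code -> label}.
# "NA" marks a cell for which the table gives no label.
LABELS = {
    "N": {"eng": "Noun", "mal": "നാമം", "tam": "பெயர", "kok": "नाम"},
    "NNV": {"eng": "Verbal", "mal": "NA", "tam": "தொழில பெயர", "kok": "कयामळत नाम"},
}

EMPTY = ("", "NA")


def missing_labels(labels, languages):
    """Return (tag, language) pairs that have no usable label."""
    gaps = []
    for tag, row in labels.items():
        for lang in languages:
            if row.get(lang, "NA").strip() in EMPTY:
                gaps.append((tag, lang))
    return gaps


def is_one_to_one(labels, lang):
    """True if no two tags share the same label in `lang` (gaps are ignored)."""
    seen = [row[lang] for row in labels.values()
            if row.get(lang, "NA").strip() not in EMPTY]
    return len(seen) == len(set(seen))


print(missing_labels(LABELS, ["mal", "tam", "kok"]))
print(is_one_to_one(LABELS, "kok"))
```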
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    ………………
  End if
  If language is Bodo then
    Call (POS Schema)
    Display (English Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    ………………
  End if
  If language is Konkani then
    Call (POS Schema)
    Display (English Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    ………………
  End if
Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    ………………
  End if
Else If script is Perso-Arabic then
  If language is Kashmiri then
    Call (POS Schema)
    Display (English Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
    ………………
  End if
Else If script is Bangla then
  If language is Assamese then
    Call (POS Schema)
    Display (English Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    ………………
  End if
Else If script is Gujarati then
  If language is Gujarati then
    Call (POS Schema)
    Display (English Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    ………………
  End if
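The selection steps above can be sketched in Python as a filter over the attributes of each schema node. This is a minimal sketch under stated assumptions rather than the reference implementation: nodes are modelled as plain dictionaries, the English label is stored under the key cat, and the names SCRIPT_LANGUAGES, ALWAYS_SHOWN and select_nodes are hypothetical. The script table covers only the seven languages used in the algorithm, with their ISO 639-3 codes.

```python
# Hypothetical script -> language-code table for the cases listed in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# English label, Hindi label and tag are displayed for every language.
ALWAYS_SHOWN = ("cat", "hin-cat", "tag")


def select_nodes(nodes, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed for script {script!r}")
    keep = set(ALWAYS_SHOWN) | {f"{language}-cat"}
    return [{k: v for k, v in node.items() if k in keep} for node in nodes]


# One node copied from the Noun block of the draft schema (labels as printed there).
NODES = [
    {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा",
     "mal-cat": "നാമം", "kok-cat": "नाम", "tag": "N"},
]

print(select_nodes(NODES, "Devanagari", "brx"))
print(select_nodes(NODES, "Malayalam", "mal"))
```

Filtering a copy of each node, instead of deleting attributes from the common schema, leaves the full tree available for the next language selection.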
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
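The tagged Hindi sentences above attach a tag path such as N_NN or CC_CCD directly to each word. The Python sketch below separates the word from its tag and splits the tag into its hierarchy levels. It assumes that tags consist only of upper-case ASCII letters joined by underscores and that the word itself contains no such trailing letters; the names TOKEN, parse_token and parse_sentence are hypothetical, and sentence numbers are expected to be removed before parsing.

```python
import re

# Word (possibly empty) followed by an upper-case tag path such as N_NN or V_VM_VNF.
TOKEN = re.compile(r"(.*?)([A-Z]+(?:_[A-Z]+)*)")


def parse_token(token):
    """Split one tagged token into (word, [tag levels])."""
    match = TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"no tag found in {token!r}")
    word, tag = match.groups()
    return word, tag.split("_")


def parse_sentence(line):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(tok) for tok in line.split()]


# Tokens picked from Hindi sentence 3; the last one is a punctuation tag
# whose punctuation character is not present in the text.
sample = "हरQT_QTF बड़ाJJ औरCC_CCD RD_PUNC"
for word, levels in parse_sentence(sample):
    print(repr(word), levels)
```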
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
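The tagged samples in this section join tag levels in several ways (for example N_NN, N-NNP and N__NN), while the tag tables use a double underscore. A minimal sketch of bringing them to one form is given below. It assumes these spellings are variants of the same tags and that no label itself contains an underscore or hyphen. The function name normalize_tag is hypothetical.

```python
import re


def normalize_tag(tag):
    """Rewrite a hierarchical tag so its levels are joined by double underscores."""
    # Split on any run of underscores or hyphens and drop empty pieces.
    levels = [part for part in re.split(r"[_-]+", tag) if part]
    return "__".join(levels)


for raw in ["N_NN", "N-NNP", "V__VAUX", "V_VM_VNF", "PSP"]:
    print(raw, "->", normalize_tag(raw))
```

Single-level tags such as PSP pass through unchanged.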
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
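For software that consumes this annexure, the 22 language tags can be kept in a simple lookup table. The sketch below pairs each language name from the annexure with its ISO 639-3 code. The names ISO_639_3 and language_tag are hypothetical, and the case-insensitive lookup is a convenience of this example rather than something the draft requires.

```python
# Language name -> ISO 639-3 code, for the 22 languages listed in the annexure.
ISO_639_3 = {
    "hindi": "hin", "assamese": "asm", "bangla": "ben", "bodo": "brx",
    "dogri": "doi", "gujarati": "guj", "kannada": "kan", "kashmiri": "kas",
    "konkani": "kok", "maithili": "mai", "malayalam": "mal", "manipuri": "mni",
    "marathi": "mar", "nepali": "nep", "oriya": "ori", "punjabi": "pan",
    "sanskrit": "san", "santhali": "sat", "sindhi": "snd", "tamil": "tam",
    "telugu": "tel", "urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 code for a language name, ignoring case and spaces."""
    return ISO_639_3[name.strip().lower()]


print(language_tag("Malayalam"))
print(len(ISO_639_3))
```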
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
as $ & etc
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK
11.5 | Echowords | ECH | RD__ECH
POS for Hindi
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | ladakaa, raajaa, kitaaba
1.1 | Common | NN | N__NN | kitaaba, kalama, cashmaa
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi
1.4 | Nloc | NST | N__NST | Uupara, niice, aage, piiche
2 | Pronoun | PR | PR | Yaha, vaha, jo
2.1 | Personal | PRP | PR__PRP | Vaha, main, tuma, ve
2.2 | Reflexive | PRF | PR__PRF | Apanaa, swayam, khuda
2.3 | Relative | PRL | PR__PRL | Jo, jis, jab, jahaaM
2.4 | Reciprocal | PRC | PR__PRC | Paraspara, aapasa
2.5 | Wh-word | PRQ | PR__PRQ | Kauna, kab, kahaaM
 | Indefinite | PRI | PR__PRI | Koii, kis
3 | Demonstrative | DM | DM | Vaha, jo, yaha
3.1 | Deictic | DMD | DM__DMD | Vaha, yaha
3.2 | Relative | DMR | DM__DMR | jo, jis
3.3 | Wh-word | DMQ | DM__DMQ | kis, kaun
 | Indefinite | DMI | DM__DMI | KoI, kis
4 | Verb | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa
4.1 | Main | VM | V__VM | giraa, gayaa, sonaa, haMstaa
4.2 | Auxiliary | VAUX | V__VAUX | hai, rahaa, huaa
5 | Adjective | JJ | JJ | sundara, acchaa, baRaa
6 | Adverb | RB | RB | jaldii, teza
7 | Postposition | PSP | PSP | ne, ko, se, mein
8 | Conjunction | CC | CC | aur, agar, tathaa, kyonki
8.1 | Co-ordinator | CCD | CC__CCD | aur, balki, parantu
8.2 | Subordinator | CCS | CC__CCS | Agar, kyonki, to, ki
9 | Particles | RP | RP | to, bhii, hii
9.1 | Default | RPD | RP__RPD | to, bhii, hii
9.3 | Interjection | INJ | RP__INJ | are, he, o
9.4 | Intensifier | INTF | RP__INTF | bahuta, behada
9.5 | Negation | NEG | RP__NEG | nahiin, mata, binaa
10 | Quantifiers | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa
10.1 | General | QTF | QT__QTF | thoRaa, bahuta, kucha
10.2 | Cardinals | QTC | QT__QTC | eka, do, tiina
10.3 | Ordinals | QTO | QT__QTO | pahalaa, duusaraa
11 | Residuals | RD | RD
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $ & etc
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
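One way to store the higher level tags automatically is to keep a child-to-parent map of labels and walk it upwards from the label the annotator selected. The sketch below does this for the labels of the Hindi table. The names PARENT and full_annotation are hypothetical, and the map is an assumption about how an implementation might hold the hierarchy, not something the draft prescribes.

```python
# Child label -> parent label, taken from the Hindi tag table.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "CCD": "CC", "CCS": "CC",
    "RPD": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Build the annotation string for the lowest-level label chosen by the annotator."""
    chain = [label]
    # Climb the hierarchy until a top-level category is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


for label in ["NNP", "VAUX", "JJ"]:
    print(label, "->", full_annotation(label))
```

Top-level categories without subtypes, such as JJ, are returned unchanged. Three-level tags from the other tables work the same way: adding the entry "VNF": "VM" to the map makes full_annotation("VNF") produce V__VM__VNF.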
POS for Punjabi
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N
1.1 | Common | NN | N__NN
1.2 | Proper | NNP | N__NNP
1.3 | Nloc | NST | N__NST
2 | Pronoun | PR | PR
2.1 | Personal | PRP | PR__PRP
2.2 | Reflexive | PRF | PR__PRF
2.3 | Relative | PRL | PR__PRL
2.4 | Reciprocal | PRC | PR__PRC
2.5 | Wh-word | PRQ | PR__PRQ
3 | Demonstrative | DM | DM
3.1 | Deictic | DMD | DM__DMD
3.2 | Relative | DMR | DM__DMR
3.3 | Wh-word | DMQ | DM__DMQ
4 | Verb | V | V
4.1 | Main | VM | V__VM
4.1.1 | Finite | VF | V__VM__VF
4.1.2 | Non-finite | VNF | V__VM__VNF
4.1.3 | Infinitive | VINF | V__VM__VINF
4.1.4 | Gerund | VNG | V__VM__VNG
4.2 | Verbal Noun | NNV | N_NNV | Verbal Noun
4.3 | Auxiliary | VAUX | V__VAUX
4.3.1 | Non-finite | VNF | V_VM_VNF
4.3.2 | Infinite | VINF | V_VM_VNF
5 | Adjective | JJ
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP
8 | Conjunction | CC | CC
8.1 | Co-ordinator | CCD | CC__CCD
8.2 | Subordinator | CCS | CC__CCS
8.2.1 | Quotative | UT | CC__CCS__UT
9 | Particles | RP | RP
9.1 | Default | RPD | RP__RPD
9.2 | Classifier | CL | RP__CL
9.3 | Interjection | INJ | RP__INJ
9.4 | Intensifier | INTF | RP__INTF
9.5 | Negation | NEG | RP__NEG
10 | Quantifiers | QT | QT
10.1 | General | QTF | QT__QTF
10.2 | Cardinals | QTC | QT__QTC
10.3 | Ordinals | QTO | QT__QTO
11 | Residuals | RD | RD
11.1 | Foreign word | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $ & etc
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK
11.5 | Echowords | ECH | RD__ECH
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
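In most rows of the tag set the annotation convention is the chain of labels joined by double underscores, ending in the row's own label. A few rows depart from that pattern (for example label NNV with N_NNV, and label VINF with V_VM_VNF). The sketch below flags such rows for review. It is a hypothetical consistency check, not part of the draft, and it does not decide which form is intended. The name check_row is an assumption.

```python
import re


def check_row(label, annotation):
    """Return a list of problems found in one (label, annotation) row."""
    problems = []
    # An underscore with no underscore on either side is a single separator.
    if re.search(r"(?<!_)_(?!_)", annotation):
        problems.append("single underscore used as level separator")
    levels = [part for part in re.split(r"_+", annotation) if part]
    if not levels or levels[-1] != label:
        problems.append("annotation does not end in the row label")
    return problems


rows = [("VF", "V__VM__VF"), ("NNV", "N_NNV"), ("VINF", "V_VM_VNF")]
for label, annotation in rows:
    print(label, annotation, check_row(label, annotation))
```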
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ amp ( ) ruu
For symbols such as $ amp etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi
l parasparam
തമിിിതമിി
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത
ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത
എങെന 4 Verb V V pO kazhi
Annuciri ോ കഴി ആണി(Cop
ula) ചിരി 41 Main VM V__VM pO kazhi
cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )
ruu $ amp ( ) ര
113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
41
1
Finite VF V__VM__VF karachhilAm
a yAba
khAYa
41
2
Non-finite VNF V__VM__VNF kare
kheYe
karale
khete
41
3
Infinitive VINF V__VM__VINF karate
khete yete
41
4
Gerund VNG V__VM__VNG yAoYa
AsA khelA
karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
82
1
Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa
khuba
sA~NghAtik
a
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Top level  Subtype (level 1)  Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalamchashmA
lsquopenrsquo lsquospectaclesrsquo
12 Proper NNP N__NNP mohanravI
lsquoMohanrsquo lsquoRavirsquo
13 Nloc NST N__NST upar nIche ahIM
lsquouprsquo lsquodownrsquo lsquoin frontrsquo
2 Pronoun PR PR
21 Personal PRP PR__PRP huMtuMte
lsquomersquo lsquoyoursquo
lsquoheshersquo 22 Reflexive PRF PR__PRF pote
jAtesvayam
lsquoherselfhimselfrsquo
23 Relative PRL PR__PRL je te jyAM
lsquowhorsquo lsquowherersquo
24 Reciprocal PRC PR__PRC aras-paras paraspar
lsquomutuallyrsquolsquoeach otherrsquo
25 Wh-word PRQ PR__PRQ koN kyAre kyAM
lsquowhorsquo lsquowhenrsquo lsquowherersquo
26 Indefinite koI kaIMK kashuM
lsquosomeonersquo lsquosomethingrsquo
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A
lsquothisrsquo
32 Relative DMR DM__DMR je jeNe
lsquowhichwhorsquo lsquowhomrsquo
33 Wh-word DMQ DM__DMQ koNshuMkem
lsquowhorsquo lsquowhatrsquo lsquowhyrsquo
34 Indefinite koI kaIMK kashuM
lsquosomeonersquo lsquosomethingrsquo
4 Verb V V
41 Main VM V__VM khAshekhAdhu
lsquowill eatrsquo
lsquoatersquo 42 Auxiliary VAUX V__VAUX chhehatuMk
aryuM
lsquoisrsquo rsquowasrsquo lsquodidrsquo
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD aneke
lsquoandrsquo lsquoorrsquo
82 Subordinator CCS CC__CCS tethI evuM kAraNke
lsquosorsquo lsquolike thatrsquo lsquobecausersquo
9 Particles RP RP
91 Default RPD RP__RPD paNajatO
lsquobutrsquo emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahughaNuM
lsquoveryrsquo lsquomuchrsquo
94 Negation NEG RP__NEG nahina
lsquonorsquo
10 Quantifiers QT QT
101 General QTF QT__QTF thoduMghaNuM
lsquolittlersquo lsquomuchrsquo
102 Cardinals QTC QT__QTC ekabe traN
lsquoonetwothreersquo
103 Ordinals QTO QT__QTO paheluMbIjI
lsquofirstrsquo(neu)
lsquosecondrsquo (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ amp
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAmpANi-bANi
lsquowork and the likersquo water and the likersquo
POS for Konkani
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Top level  Subtype (level 1)  Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Top level  Subtype (level 1)  Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Top level  Subtype (level 1)  Subtype (level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu unlike Hindi voh is used both for singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar words like koi kisi kuch do not come under Wh-word they are used for indefinite person For them another category (subtype) ietankiir (indefinitive) is used Under this category
following words are also placed chand
blsquoaaz fulaan sab bahut Can we have a category
subtype like indefinitive demonstrative (DMI)
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
9.5 Negation (حرف نہی, harf-e-nahii): label NEG; annotation RP__NEG; examples: نہ (na), نہيں (nahiiM)
10 Quantifiers (کميت نما, kamiiyat numaa): label QT; annotation QT; examples: چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General (عام, ‘aam): label QTF; annotation QT__QTF; examples: تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (اعداد مطلق, a‘adaad-e-mutlaq): label QTC; annotation QT__QTC; examples: ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (ترتيبی اعداد, tartiibii a‘adaad): label QTO; annotation QT__QTO; examples: اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals (باقی مانده, baaqi-maandaa): label RD; annotation RD
11.1 Foreign word (بديسی لفظ, bidesii lafz): label RDF; annotation RD__RDF; remark: A word written in a script other than the script of the original text
11.2 Symbol (علامت, ‘alaamat): label SYM; annotation RD__SYM; examples: $, &, (, ); remark: Such symbols are not used in Urdu. They are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (اوقاف, auqaaf): label PUNC; annotation RD__PUNC; remark: Only for punctuations
11.4 Unknown (نامعلوم, naa-m‘aaloom): label UNK; annotation RD__UNK
11.5 Echowords (گونج دار الفاظ, goonjdar lafz): label ECH; annotation RD__ECH; examples: (دل-) ول, (dil-) vil; (پيار-) ويار, (pyaar-) vyaar; (چائے-) وائے, (caa‘e-) vaa‘e
Annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
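The rule above can be illustrated with a minimal Python sketch. It assumes that the annotation convention joins hierarchy levels with a double underscore, as in V__VM__VF; the function names expand_tag and ancestors are hypothetical and are not part of this standard.

```python
from typing import List


def expand_tag(annotation: str) -> List[str]:
    """Split an annotation such as 'V__VM__VF' into its hierarchy levels."""
    return [level for level in annotation.split("__") if level]


def ancestors(annotation: str) -> List[str]:
    """Return the higher-level annotations implied by a lowest-level one."""
    levels = expand_tag(annotation)
    # Every proper prefix of the level path is a higher-level tag.
    return ["__".join(levels[:i]) for i in range(1, len(levels))]


if __name__ == "__main__":
    print(expand_tag("V__VM__VF"))
    print(ancestors("V__VM__VF"))
```

An annotator would then record only V__VM__VF for a finite main verb, and a tool could derive V and V__VM from it when storing the annotation.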
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
- Defining mark-up for natural language labelling (xml:lang, defined for the root element of the document and for any element where a change of language may occur).
- Defining mark-up to specify text direction (its:dir, defined for the root element of the document and for any element that has text content).
- Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).
- Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
- Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier).
- Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
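As a hedged illustration of how the language and direction attributes listed above behave, the following Python sketch parses a small sample document and resolves the inherited values for each element. The element names corpus, sentence and tag are hypothetical and are used only for this example; the namespace URIs are those of the xml prefix and of ITS.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: a Hindi corpus containing one Urdu sentence.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hi" its:dir="ltr">
  <sentence xml:id="s1">first sentence</sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl">second sentence</sentence>
  <tag its:translate="no">N_NN</tag>
</corpus>"""


def walk(elem, lang=None, direction=None):
    """Yield (element name, language, direction), inheriting from the parent."""
    lang = elem.get(f"{{{XML_NS}}}lang", lang)
    direction = elem.get(f"{{{ITS_NS}}}dir", direction)
    yield elem.tag, lang, direction
    for child in elem:
        yield from walk(child, lang, direction)


if __name__ == "__main__":
    for row in walk(ET.fromstring(SAMPLE)):
        print(row)
```

Declaring xml:lang and its:dir on the root and overriding them only where the language changes keeps the mark-up small while still giving every element a well-defined language and direction.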
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only up to a point: a tag conveys only its name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
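One possible way to hold such a mapping table in software is sketched below in Python. This is an illustration only, not part of the standard: the dictionary is a small excerpt (Urdu labels for Noun, Verb and Adjective from Table 1, and the Malayalam and Konkani labels for Noun from Table 3), the language keys are assumed to be ISO 639-3 codes, and the function names label_for and tag_for are hypothetical.

```python
from typing import Optional

# Excerpt of the mapping: POS tag -> {ISO 639-3 language code -> label}.
LABELS = {
    "N": {"eng": "Noun", "urd": "اسم", "mal": "നാമം", "kok": "नाम"},
    "V": {"eng": "Verb", "urd": "فعل"},
    "JJ": {"eng": "Adjective", "urd": "صفت"},
}


def label_for(tag: str, lang: str, fallback: str = "eng") -> str:
    """Return the label of a tag in a language, or the English label if absent."""
    entry = LABELS[tag]
    return entry.get(lang, entry[fallback])


def tag_for(label: str, lang: str) -> Optional[str]:
    """Reverse lookup: find the tag whose label in `lang` equals `label`."""
    for tag, entry in LABELS.items():
        if entry.get(lang) == label:
            return tag
    return None


if __name__ == "__main__":
    print(label_for("N", "mal"))
    print(tag_for("صفت", "urd"))
```

Because the tag is the key and each language contributes exactly one label per tag, the same structure can drive both display (tag to label) and input (label to tag).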
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
………………
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
………………
End if
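The Display/Hide steps of the selection algorithm can be sketched in Python. This is only an illustration, assuming each schema node has already been loaded into a dictionary of attribute names and values. The function name `visible_attributes`, the `ALWAYS_VISIBLE` set and the sample node are hypothetical and are not defined by this draft.

```python
# Attributes that are always shown: the English category names and the tag.
ALWAYS_VISIBLE = {"cat", "subcat", "tag"}


def visible_attributes(node, lang_code):
    """Keep the English, Hindi and selected-language labels; hide the rest.

    node      -- dict of attribute name -> value for one schema node (assumed layout)
    lang_code -- ISO 639-3 code of the selected language, e.g. "guj"
    """
    keep = ALWAYS_VISIBLE | {"hin-cat", f"{lang_code}-cat"}
    return {name: value for name, value in node.items() if name in keep}


# Hypothetical node carrying labels for several languages (placeholder values).
node = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "guj-cat": "<Gujarati label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}
shown = visible_attributes(node, "guj")
print(shown)
```

For Gujarati the Malayalam label is dropped, while the English, Hindi and Gujarati labels and the tag are kept, which mirrors "Display (English Hindi and Gujarati Nodes)" followed by "Hide (remaining nodes)".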
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
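In the tagged samples as they appear in this text, each word is followed directly by its tag, with the levels joined by underscores (for example `N_NN` or `V_VM_VF`). The sketch below separates word from tag under two assumptions that do not come from the standard: the word is written in a native script and the tag consists only of ASCII capitals, underscores and hyphens. The names `split_token` and `split_sentence` are hypothetical.

```python
import re

# A token is a (possibly empty) word followed directly by an ASCII tag.
# Hyphens are accepted because some samples write N-NNP instead of N_NNP.
TOKEN = re.compile(r"(.*?)([A-Z][A-Z_-]*)")


def split_token(token):
    """Return (word, [tag levels]); the levels are None if no tag is found."""
    match = TOKEN.fullmatch(token)
    if match is None:
        return token, None
    word, tag = match.groups()
    levels = [part for part in re.split(r"[_-]+", tag) if part]
    return word, levels


def split_sentence(line):
    """Split a whitespace-separated tagged sentence into (word, levels) pairs."""
    return [split_token(token) for token in line.split()]


pairs = split_sentence("दशरनN_NN सPSP RD_PUNC")
print(pairs)
```

A bare tag such as `RD_PUNC` yields an empty word, and a sentence number such as `1` yields no tag, so both cases stay visible to later processing instead of being dropped.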
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
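The language tags of this annexure can be kept in a lookup table. The sketch below pairs each language name with its ISO 639-3 code. The helper `attribute_name` is hypothetical; it assumes the `<code>-cat` attribute naming seen in the schema examples (for instance `mal-cat`, `kas-cat`).

```python
# Language name -> ISO 639-3 code, for the 22 languages of this annexure.
LANGUAGE_CODES = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def attribute_name(language):
    """Return the per-language label attribute, e.g. 'mal-cat' (assumed naming)."""
    return f"{LANGUAGE_CODES[language]}-cat"


print(attribute_name("Malayalam"))
```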
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD Vaha yaha
32 Relative DMR DM__DMR jo jis
33 Wh-word DMQ DM__DMQ kis kaun
Indefinite DMI DM__DMI KoI kis
4 Verb V V giraa gayaa sonaa haMstaa hai rahaa
41 Main VM V__VM giraa gayaa sonaa haMstaa
42 Auxiliary VAUX V__VAUX hai rahaa huaa
5 Adjective JJ JJ sundara acchaa baRaa
6 Adverb RB RB jaldii teza
7 Postposition PSP PSP ne ko se mein
8 Conjunction CC CC aur agar tathaa kyonki
81 Co-ordinator CCD CC__CCD aur balki parantu
82 Subordinator CCS CC__CCS Agar kyonki to ki
9 Particles RP RP to bhii hii
91 Default RPD RP__RPD to bhii hii
93 Interjection INJ RP__INJ are he o
94 Intensifier INTF RP__INTF bahuta behada
95 Negation NEG RP__NEG nahiin mata binaa
10 Quantifiers QT QT thoRaa bahuta kucha eka pahalaa
101 General QTF QT__QTF thoRaa bahuta kucha
102 Cardinals QTC QT__QTC eka do tiina
103 Ordinals QTO QT__QTO pahalaa duusaraa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (Paanii-) vaanii (khaanaa-) vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
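The rule that higher-level tags are stored automatically can be illustrated with a parent lookup. The sketch below builds the full annotation convention (for example `V__VM__VNF`) from the lowest-level label alone. The `PARENT` table is assembled from the labels listed in the tag-set tables of this document, and the function name `full_annotation` is hypothetical. As a simplification, the verb subtypes are attached to `VM` only, although one language table also lists them under `VAUX`.

```python
# Child label -> parent label, following the Label and Annotation Convention columns.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into the full top-down annotation string."""
    chain = [label]
    # Walk upwards until a top-level category (one with no parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


print(full_annotation("VNF"))
```

An annotator therefore only has to choose the most specific label; top-level categories without subtypes, such as `JJ`, are returned unchanged.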
POS for Punjabi
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ ਕਹਾਣੀ ਸਡਕ Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ ਕਹਾਣੀ ਸਡਕ Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ ਿਦਲੀ ਤਾਜਮਿਹਲ haraviMxara xiYlI wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ ਿਪਛ uYwe WaYle aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ ਜ mEz wUM uha iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ ਖਦ ApaNA Apa Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ ਿਜਹਡਾ ਜਦ jo jisa jihadZA jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz kiYWe
26 Indefinite PRI PR__PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 Indefinite DMI DM__DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ ਕਰਦਾ ਮਾਰਗਾ ਰਿਹਦਾ AiA jA karaxA mArAzgA rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ ਕਰਦਾ ਮਾਰਗਾ ਰਿਹਦਾ AiA jA karaxA mArAzgA rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ ਆਿਦਆ ਕਰਿਦਆ ਖਾਕ ਜਾਕ jAzxiAz AuzxiAz karaxiAz KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ ਆਇਆ ਕਿਰਆ giAz AiAz kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ ਮਰਨ jANoz KANoz pINoz maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ ਹਇਆ hE sI sakiA hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ ਮਾਡਾ ਕਾਾਾ sohaNA caMgA mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ ਅਗਰ ਿਕ ਸਗ awe kiuzki agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ ਤ kiuzki ki jo wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ ਨੀ ਜਨਾਬ ue adZiA nI janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ ਵਗਰ nahIz nA binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ ਕਾਫੀ ਕਝ ਇਕ ਪਿਹਲਾ WodZA bahuwA kAPI kuJa iYka pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ ਕਾਫੀ ਕਝ WodZA bahuwA kAPI kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ (ਚਾਹ-) ਚਹ (pANI-) XANI (cAha-) cUha
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No  Category  Label  Annotation Convention  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal Noun Verbal noun NNV N_NNV Verbal Noun
43 Auxiliary VAUX V__VAUX
431 Non-finite VNF V_VM_VNF
432 Infinite VINF V_VM_VNF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Tamil
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No  Category  Label  Annotation Convention  Examples  Examples in Malayalam
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam
തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annuciri
ോ കഴി ആണി(Copula) ചിരി
41 Main VM V__VM pO kazhi cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST upare niche bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
82 Subordinator CCS CC__CCS ye kintu noile tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu alpa aneka
102 Cardinals QTC QT__QTC eka dui tina
103 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Marathi
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N मलगा (mulagaa-boy) राजा (raajaa-king) पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles)
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not Required
14 Nloc NST N__NST वर (var-up) खाल (khaalee-down) पढ (pudhe-ahead) माग (maage-back) Where it is separate it is NST
2 Pronoun PR PR यथ (yethe-here) थ (tethe-there) जो (jo-who) ो (to-he)
21 Personal PRP PR__PRP ो (to-he) मी (mee-I) (tu-you) (te-they) मह (tumhi-you)
22 Reflexive PRF PR__PRF सवत (swatha-myself) आपण (aapana-ourselves)
23 Relative PRL PR__PRL जो (jo-who) जयान (jyaane-who) जवहा (jevhaa-while) िजथ (jeethe-where)
24 Reciprocal PRC PR__PRC परसपर (Paraspara-reciprocally) एतमत (ekmek-mutually)
25 Wh-word PRQ PR__PRQ तोण (kona-who) तवहा (kevha-when) तठ (kuthe-where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो (to-he) हा (haa-this) जो (jo-who)
31 Deictic DMD DM__DMD इथ (ithe-here) थ (tithe-there)
32 Relative DMR DM__DMR जो (jo-who) जयान (jyane-who)
33 Wh-word DMQ DM__DMQ तोणा (konta-which) तोणी (kona-who)
4 Verb V V (padalaa-fell down) गला (gelaa-went) झोपला (jhopalaa-slept) आह (aahe-is)
41 Main VM V__VM पडला (padalaa-fell down) गला (gelaa-went) झोपला (jhopalaa-slept) आह (aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर (sundara-beautiful) चागला (chaangalaa-good) मोठा (moThaa-big)
6 Adverb RB लवतर (lavakar-fast) हळहळ (haLuuhaLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण (aaNi-and) तारण (kaaraN-because)
81 Co-ordinator CCD CC__CCD आण (aaNi-and) पण (paNa-but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-because of) ता त (kaaraN kii-because of) जर-र (jara-tara-if-then)
821 Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र (tara)
91 Default RPD RP__RPD र (tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर (arere) ओहो (oho-oh)
94 Intensifier INTF RP__INTF खप (khoop-lot very) बराच (baraach-too much) अशय (atishaya-too much very)
95 Negation NEG RP__NEG नतो (nako-not) न (na-Na)
10 Quantifiers QT QT थोड (thode-few) जास (jaasta-lot) ताह (kaahi-few) एत (eka-one) पहला (pahilaa-first)
101 General QTF QT__QTF थोड (thoDe-few) जास (jaasta-lot) ताह (kaahi-few)
102 Cardinals QTC QT__QTC एत (eka-one) दोन (dona-two)
103 Ordinals QTO QT__QTO पहला (pahilaa-first) दसरा (dusaraa-second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण (jevanbivaNa-mealdinner) डोतबत (Dokebike-head) (Paanii-) vaanii (khaanaa-) vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A ‘this’
32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO ‘but’ emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
94 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
102 Cardinals QTC QT__QTC eka be traN ‘one, two, three’
103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम पड खवास
12 Proper NNP N__NNP अरण दनश अल
13 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ अहा
22 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप पढइ खाइ स हस
42 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No  Category  Label  Annotation Convention  Examples  Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdood(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
-غيرمحدود(ghair mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(f‘el-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mut‘alliq-e-f‘el(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)‘atf-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taab‘e kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaasii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaa’iyaa(
INJ RP__INJ اے)e(
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier
-حرف تاکيد(harf-e-taakiid(
INTF RP__INTF بہت)bahut(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)muta’addad(
)qaliil(قليل
)kasiir(کثير
101 General
)‘aam-عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(a‘adaad-e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii a‘adaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہلا
)duusaraa(دوسرا
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
-بديسی لفظ(bidesii lafz(
RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol
-علامت(‘alaamat(
SYM RD__SYM $ & ( ) & $
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(m‘aaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caa‘e-) vaa‘e
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
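The rule above can be illustrated with a short sketch. It assumes, as the Annotation Convention column suggests, that the levels of a tag are joined by a double underscore (for example V__VM__VF). The function names expand_tag and annotate are hypothetical and are not part of the standard.

```python
def expand_tag(annotation):
    """Split an annotation such as 'V__VM__VF' into its levels, top level first."""
    levels = annotation.split("__")
    if not all(levels):
        # An empty level (e.g. 'N__') means the annotation is malformed.
        raise ValueError("malformed annotation: %r" % annotation)
    return levels


def annotate(token, annotation):
    """Record the lowest-level tag and derive the higher-level tags from it."""
    levels = expand_tag(annotation)
    return {"token": token, "tag": levels[-1], "higher_levels": levels[:-1]}


record = annotate("kitaab", "N__NN")
print(record)
```

Storing only the full annotation string and deriving the parents on demand is one way to keep the higher-level tags consistent with the selected tag; an implementation could equally store each level in its own field.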
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.
Text direction (its:dir): defined for the root element of the document and for any element that has text content.
Translatability (its:translateRule): elements that indicate which elements and attributes should be translated and which have non-translatable content.
Text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.
Unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
Notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
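As an illustration of how xml:lang, its:dir and xml:id might be read back from a label file, the following sketch parses a small sample with the Python standard library. The sample document and its element names (labels, label) are hypothetical; the namespace URI http://www.w3.org/2005/11/its is the one defined by ITS 1.0. The inheritance shown here is simplified to one level (child falls back to the root).

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: an English root with one Urdu (right-to-left) label.
SAMPLE = """<labels xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="noun-en">Noun</label>
  <label xml:id="noun-ur" xml:lang="ur" its:dir="rtl">اسم</label>
</labels>"""


def language_and_direction(xml_text):
    """Return (id, language, direction) per label, falling back to the root values."""
    root = ET.fromstring(xml_text)
    root_lang = root.get("{%s}lang" % XML_NS)
    root_dir = root.get("{%s}dir" % ITS_NS)
    rows = []
    for label in root.findall("label"):
        rows.append((
            label.get("{%s}id" % XML_NS),
            label.get("{%s}lang" % XML_NS, root_lang),
            label.get("{%s}dir" % ITS_NS, root_dir),
        ))
    return rows


for row in language_and_direction(SAMPLE):
    print(row)
```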
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag tells you only its name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
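A metadata block of this shape could be generated programmatically. The sketch below is only an illustration: it reuses a few element names from the fragment above (languageSection, language, registrationStatus, origin, creation, creationDate), the function name build_language_section is hypothetical, and the result is not claimed to be a complete or valid ISO 12620 record.

```python
import xml.etree.ElementTree as ET


def build_language_section(language="en", status="standard",
                           origin="ISO 12620:1999", creation_date="1999-01-01"):
    """Build a minimal languageSection element using names from the fragment above."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date
    return section


section = build_language_section()
print(ET.tostring(section, encoding="unicode"))
```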
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
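The flow above (select script, select language, display the desired nodes, hide the rest) can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name select_nodes and the two sample nodes are hypothetical, the attribute names (cat, hin-cat, kok-cat, mal-cat, tag) follow the draft schema, and the label strings are written out in full here as an assumption.

```python
# Script-to-language grouping for the languages used in the draft's examples.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
}

# Two illustrative nodes of the common schema (sample data only).
NODES = [
    {"tag": "N", "cat": "noun", "hin-cat": "संज्ञा", "kok-cat": "नाम", "mal-cat": "നാമം"},
    {"tag": "V", "cat": "verb", "hin-cat": "क्रिया", "kok-cat": "क्रियापद", "mal-cat": "ക്രിയ"},
]


def select_nodes(script, language, nodes=NODES):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError("%s is not listed under script %s" % (language, script))
    visible = {"tag", "cat", "hin-cat", language + "-cat"}
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]


for node in select_nodes("Malayalam", "mal"):
    print(node)
```

Filtering a copy of each node, rather than deleting keys in place, leaves the common schema untouched so that it can be reused for the next language.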
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>.....</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi
SNo English Hindi Assamese Bodo Kashmiri (Urdu script) Kashmiri (Hindi script) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Table 3. Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
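A mapping table of this kind lends itself to a simple completeness check. The sketch below is an illustration under stated assumptions: the dictionary layout, the function names, the use of the ISO 639-3 codes hin and urd as keys, and the four sample rows are all hypothetical, and the label strings are written out in full as an assumption. An "NA" cell is represented by None.

```python
NA = None  # stands for an "NA" cell in the mapping tables

# A few illustrative rows (sample data only, not the full tables).
MAPPING = {
    "Noun": {"hin": "संज्ञा", "urd": "اسم"},
    "Verb": {"hin": "क्रिया", "urd": "فعل"},
    "Adjective": {"hin": "विशेषण", "urd": "صفت"},
    "Indefinite": {"hin": "अनिश्चयवाचक", "urd": NA},
}


def label_for(category, language, mapping=MAPPING):
    """Return the label of a category in a language, or fail if it is NA."""
    label = mapping[category].get(language)
    if label is None:
        raise LookupError("no %s label for %s" % (language, category))
    return label


def missing_labels(languages, mapping=MAPPING):
    """List the (category, language) pairs that have no label."""
    return [(category, language)
            for category, labels in mapping.items()
            for language in languages
            if labels.get(language) is None]


print(label_for("Noun", "urd"))
print(missing_labels(["hin", "urd"]))
```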
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
...
End if
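The selection steps above (call the schema, display the English, Hindi and target-language nodes, hide the rest) can be sketched in Python. This is only an illustration under stated assumptions: the function name select_nodes, the dictionary representation of a node, and the placeholder label strings are hypothetical and are not defined by this draft standard. The only thing taken from the examples above is the attribute naming pattern (cat, hin-cat, <language code>-cat, tag).

```python
# Hypothetical sketch of the node-selection step; not part of the draft standard.
# A schema node is modelled as a plain dict of attribute name -> label.

COMMON_KEYS = ("cat", "subcat", "tag")  # English labels and the POS tag
HINDI_KEY = "hin-cat"                   # Hindi label is always displayed


def select_nodes(node, lang_code):
    """Keep the English, Hindi and target-language labels; hide the rest."""
    wanted = set(COMMON_KEYS) | {HINDI_KEY, lang_code + "-cat"}
    return {key: value for key, value in node.items() if key in wanted}


# Placeholder labels stand in for the real Indic strings (assumption).
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "kok-cat": "<Konkani label>",
    "mal-cat": "<Malayalam label>",
    "guj-cat": "<Gujarati label>",
    "tag": "N",
}

konkani_view = select_nodes(noun, "kok")
print(konkani_view)
```

Filtering by attribute name keeps a single schema for all languages, which appears to be the intent of the Display and Hide steps; the language code passed in is expected to be one of the ISO 639-3 tags listed in the annexure.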
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
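The tagged samples above write hierarchical tags with varying separators (for example N_NN, N-NNP and N__NN), whereas the tag tables give the annotation convention with a double underscore. Assuming the double-underscore form is the intended canonical one, a small normalisation step could look like the following sketch; the function name normalize_tag is hypothetical and not part of this standard.

```python
import re


def normalize_tag(tag):
    """Rewrite N_NN, N-NNP or N__NN style tags with the '__' separator.

    Assumption: hyphens and single or repeated underscores all separate
    hierarchy levels, as in the samples of this section.
    """
    levels = [level for level in re.split(r"[_-]+", tag.strip()) if level]
    return "__".join(levels)


for raw in ("N_NNP", "N-NNP", "V_VM_VNF", "CC__CCS__UT", "JJ"):
    print(raw, "->", normalize_tag(raw))
```

Normalising before comparison lets samples from different language groups be checked against one tag table, at the cost of treating every hyphen as a level separator.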
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No. Language Name Language Tag (ISO 639-3)
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
10.1 General QTF QT__QTF thoRaa bahuta kucha
10.2 Cardinals QTC QT__QTC eka do tiina
10.3 Ordinals QTO QT__QTO pahalaa duusaraa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH (Paanii-) vaanii (khaanaa-) vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
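The rule above can be illustrated with a short Python sketch: given the annotation convention of the lowest level tag (for example V__VM__VNF), the higher level tags are derived by cutting the string at each double underscore. The function name expand_tag is hypothetical, and the sketch assumes that every level is separated by exactly two underscores, as in the Annotation Convention column.

```python
from typing import List


def expand_tag(annotation: str) -> List[str]:
    """Return every level of a hierarchical tag, top level first.

    Assumes levels are separated by a double underscore.
    """
    parts = annotation.split("__")
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


print(expand_tag("V__VM__VNF"))  # ['V', 'V__VM', 'V__VM__VNF']
print(expand_tag("RD__PUNC"))    # ['RD', 'RD__PUNC']
print(expand_tag("JJ"))          # ['JJ']
```

Storing only the lowest level tag and deriving the rest avoids annotators entering inconsistent combinations of top-level and subtype labels.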
POS for Punjabi
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
11 Common NN N__NN ਘਰ ਿਕਤਾਬ
ਕਹਾਣੀ ਸਡਕ
Gara kiwAba kahANI sadZaka
12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara
xiYlI
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No Category Label Annotation Convention Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
1.1 Common NN N__NN
1.2 Proper NNP N__NNP
1.3 Nloc NST N__NST
2 Pronoun PR PR
2.1 Personal PRP PR__PRP
2.2 Reflexive PRF PR__PRF
2.3 Relative PRL PR__PRL
2.4 Reciprocal PRC PR__PRC
2.5 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD
3.2 Relative DMR DM__DMR
3.3 Wh-word DMQ DM__DMQ
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF
4.1.2 Non-finite VNF V__VM__VNF
4.1.3 Infinitive VINF V__VM__VINF
4.1.4 Gerund VNG V__VM__VNG
4.2 Verbal Noun NNV N_NNV Verbal Noun
4.3 Auxiliary VAUX V__VAUX
4.3.1 Non-finite VNF V_VM_VNF
4.3.2 Infinite VINF V_VM_VNF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD
8.2 Subordinator CCS CC__CCS
8.2.1 Quotative UT CC__CCS__UT
9 Particles RP RP
9.1 Default RPD RP__RPD
9.2 Classifier CL RP__CL
9.3 Interjection INJ RP__INJ
9.4 Intensifier INTF RP__INTF
9.5 Negation NEG RP__NEG
10 Quantifiers QT QT
10.1 General QTF QT__QTF
10.2 Cardinals QTC QT__QTC
10.3 Ordinals QTO QT__QTO
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM For symbols such as $ & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Tamil
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N paiyan raajaa puttakam
1.1 Common NN N__NN puttakam kaNNaaTi paTam
1.2 Proper NNP N__NNP moohan ravi maalati
1.3 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR itu atu avan
2.1 Personal PRP PR__PRP naan nii avaL avarkaL
2.2 Reflexive PRF PR__PRF taan
2.3 Relative PRL PR__PRL yaar etu eppootu enkee
2.4 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
2.5 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
3.1 Deictic DMD DM__DMD anta inta enta
3.2 Relative DMR DM__DMR enta
3.3 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
4.1 Main VM V__VM vizu poo tuunku ciri
4.1.1 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
4.1.2 Non-finite VNF V__VM__VNF vizunta poonaal
4.1.3 Infinitive VINF V__VM__VINF viza pooka cirikka
4.1.4 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
4.2 Verbal VN V_VN paTippu naTai naTattai ceykai
4.3 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
8.1 Co-ordinator CCD CC__CCD -um (raamanum) maRRum aanaal allatu (-um is a co-ordinator which can be added to noun and verb)
8.2 Subordinator CCS CC__CCS enRu ena enpatu enRaal
8.2.1 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
9.1 Default RPD RP__RPD maTTUm kuuTa
9.2 Classifier CL RP__CL Not required
9.3 Interjection INJ RP__INJ ayyoo teey aamaam
9.4 Intensifier INTF RP__INTF ati veku mika
9.5 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
10.1 General QTF QT__QTF koncam niRaiya
10.2 Cardinals QTC QT__QTC onRu iraNTu
10.3 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $ & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi
l parasparam
തമിിിതമിി
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത
ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത
എങെന 4 Verb V V pO kazhi
Annuciri ോ കഴി ആണി(Cop
ula) ചിരി 41 Main VM V__VM pO kazhi
cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )
ruu $ amp ( ) ര
113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
1.1 Common NN N__NN kalama cashmaa
1.2 Proper NNP N__NNP Mohan ravi rashmi
1.4 Nloc NST N__NST upare niche bhitara
2 Pronoun PR PR
2.1 Personal PRP PR__PRP se tumi AmAra
2.2 Reflexive PRF PR__PRF nijera
2.3 Relative PRL PR__PRL ye yakhana yena yAra
2.4 Reciprocal PRC PR__PRC paraspara
2.5 Wh-word PRQ PR__PRQ ke kakhana kena kAra
2.6 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
3.1 Deictic DMD DM__DMD sei oi o se
3.2 Relative DMR DM__DMR ye yei
3.3 Wh-word DMQ DM__DMQ kono
3.4 Indefinite DMI DM__DMI keu
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF karachhilAma yAba khAYa
4.1.2 Non-finite VNF V__VM__VNF kare kheYe karale khete
4.1.3 Infinitive VINF V__VM__VINF karate khete yete
4.1.4 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
4.2 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
8.2 Subordinator CCS CC__CCS ye kintu noile tAhale
8.2.1 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
9.1 Default RPD RP__RPD to ye
9.2 Classifier CL RP__CL jana khAnA
9.3 Interjection INJ RP__INJ Are ei hAya
9.4 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
9.5 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
10.1 General QTF QT__QTF kichhu alpa aneka
10.2 Cardinals QTC QT__QTC eka dui tina
10.3 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
1.1 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
1.2 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
1.3 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
2.1 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
2.2 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
2.3 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
2.4 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
2.5 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
2.6 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD A ‘this’
3.2 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
3.3 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
3.4 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
4.1 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
4.2 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
8.2 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
9.1 Default RPD RP__RPD paN ja tO ‘but’ emph topic
9.2 Interjection INJ RP__INJ hE arrrE O
9.3 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
9.4 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
10.2 Cardinals QTC QT__QTC eka be traN ‘one’ ‘two’ ‘three’
10.3 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ()
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Sl
No Category Label Annotation
Convention Examples Remark
s
Top level Subtype
(level 1) Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu
Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level.
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
-بديسی لفظ(bidesii lafz(
RDF RD__RDF A word written in script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
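The rule above can be illustrated with a short Python sketch. It assumes that the hierarchy is held as a child-to-parent table built from the Label column of the tag set, and that the stored annotation joins the levels with a double underscore, as in the Annotation Convention column. The names PARENT and full_annotation are hypothetical and are not part of the standard.

```python
# Child -> parent table taken from the Label column of the tag set.
# Top-level tags (N, PR, DM, V, JJ, RB, PSP, CC, RP, QT, RD) have no parent.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(lowest_tag: str) -> str:
    """Return the stored annotation for a tag chosen by the annotator.

    The annotator supplies only the lowest-level tag; the higher-level
    tags are looked up and prefixed automatically.
    """
    chain = [lowest_tag]
    while chain[-1] in PARENT:          # climb until a top-level tag is reached
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))   # top level first, e.g. V__VM__VF


print(full_annotation("VF"))   # V__VM__VF
print(full_annotation("UT"))   # CC__CCS__UT
print(full_annotation("JJ"))   # JJ
```

Keeping only the child-to-parent links means a new subtype needs a single table entry, and the full annotation string never has to be typed by hand.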
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Users will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This leads to easier recognition of the concepts represented, by both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
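As an illustration of three of the attributes listed above, the following Python sketch builds one label element carrying xml:lang, its:dir and xml:id. The element name label, the identifier noun-urd and the helper make_label are hypothetical. The language code urd follows the ISO 639-3 convention adopted by this document, although xml:lang values in general follow BCP 47, where "ur" would be the usual form.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # reserved "xml" prefix
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text: str, lang: str, direction: str, ident: str) -> ET.Element:
    """Build a label element with language, direction and identifier mark-up."""
    label = ET.Element("label")                 # hypothetical element name
    label.set(f"{{{XML_NS}}}lang", lang)        # natural language of the content
    label.set(f"{{{ITS_NS}}}dir", direction)    # "ltr" or "rtl"
    label.set(f"{{{XML_NS}}}id", ident)         # unique identifier
    label.text = text
    return label


urdu_noun = make_label("اسم", "urd", "rtl", "noun-urd")
serialized = ET.tostring(urdu_noun, encoding="unicode")
print(serialized)
```

Marking direction explicitly matters most for the Perso-Arabic labels (Urdu, Kashmiri), which sit next to left-to-right labels in the same schema.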
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag supplies just a name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
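A metadata record of this kind can be read back programmatically. The Python sketch below parses a cut-down languageSection and collects a few fields into a dictionary. The sample record is a simplified, hypothetical fragment modelled on the listing above, and read_metadata is an illustrative helper rather than part of any ISO 12620 tooling.

```python
import xml.etree.ElementTree as ET

DCIF_NS = "http://www.isocat.org/ns/dcif"

# Simplified sample record (an assumption, modelled on the listing above).
RECORD = f"""<languageSection xmlns="{DCIF_NS}">
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
</languageSection>"""


def read_metadata(xml_text: str) -> dict:
    """Extract language, status, origin and creation date from a record."""
    root = ET.fromstring(xml_text)
    ns = {"d": DCIF_NS}

    def text_of(path: str) -> str:
        node = root.find(path, ns)
        # Missing or empty elements are reported as an empty string.
        return node.text.strip() if node is not None and node.text else ""

    return {
        "language": text_of("d:language"),
        "status": text_of("d:registrationStatus"),
        "origin": text_of("d:origin"),
        "created": text_of("d:creation/d:creationDate"),
    }


meta = read_metadata(RECORD)
print(meta)
```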
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS schema for English need a one-to-one mapping for the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...), n=12
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...), n=22
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that particular language. The language-specific POS schema could be enabled by switching the other branches of the tree structure 'off'. It is schematically represented under the next heading, i.e. the POS schema block diagram.
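The selection step in the flow above can be sketched in Python as a filter over the per-language attributes of a node. This is only an illustration under stated assumptions: the node is a simplified, well-formed stand-in for one element of the draft schema, the script-to-language table covers only the scripts and language codes named in this document, the Hindi label is written as संज्ञा, and the names SCRIPT_LANGUAGES and select_nodes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Script -> language codes (partial; only cases named in this document).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
}

# Simplified stand-in for one node of the common schema (an assumption).
NODE = '<element cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" kok-cat="नाम" tag="N"/>'


def select_nodes(element: ET.Element, script: str, lang: str) -> ET.Element:
    """Return a copy showing English, Hindi and the selected language only."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not listed for script {script!r}")
    keep = {"hin-cat", f"{lang}-cat"}           # Hindi plus the chosen language
    visible = ET.Element(element.tag)
    for name, value in element.attrib.items():
        # Non-language attributes (cat, tag) are always displayed;
        # other "-cat" labels are hidden, i.e. switched off.
        if not name.endswith("-cat") or name in keep:
            visible.set(name, value)
    return visible


shown = select_nodes(ET.fromstring(NODE), "Malayalam", "mal")
print(ET.tostring(shown, encoding="unicode"))
```

Filtering a copy rather than editing the common schema keeps the full tree intact, so a different language can be selected later without reloading it.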
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo   English   Hindi   Punjabi   Urdu   Gujarati   Oriya   Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo   English   Hindi   Assamese   Bodo   Kashmiri   Kashmiri (Hindi)   Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo   English   Hindi   Telugu   Malayalam   Tamil   Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
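A mapping table of this kind lends itself to a simple lookup structure. The Python sketch below holds a few rows keyed by the English label and treats the "NA" cells as labels with no counterpart in that language. Only two rows and two languages are included, for illustration; the names LABELS, lookup and missing_labels are hypothetical, and urd is the ISO 639-3 code for Urdu.

```python
from typing import Optional

# English label -> {ISO 639-3 code: label}. "NA" marks a cell with no label.
LABELS = {
    "Noun": {"urd": "اسم", "mal": "നാമം"},
    "Verbal": {"urd": "حاصل مصدر", "mal": "NA"},
}


def lookup(english: str, lang: str) -> Optional[str]:
    """Return the label for one language, or None when the cell is NA."""
    entry = LABELS[english].get(lang, "NA")
    return None if entry == "NA" else entry


def missing_labels(lang: str) -> list:
    """List the English labels that have no counterpart in the language."""
    return sorted(name for name in LABELS if lookup(name, lang) is None)


print(lookup("Noun", "mal"))      # നാമം
print(missing_labels("mal"))      # ['Verbal']
print(missing_labels("urd"))      # []
```

A check like missing_labels makes gaps in the one-to-one mapping visible before a language-specific schema is generated from it.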
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
66
CopyrightTDIL
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malyalam (Orthographic variation) then
If language is Malyalam then
Call (POS Schema)
Display (English Hindi and Malyalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
67
CopyrightTDIL
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
68
CopyrightTDIL
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
69
CopyrightTDIL
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
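The selection steps above can be sketched in Python. This is only an illustration under stated assumptions: the `category` element, the `select_nodes` function, the `SCRIPT_LANGUAGES` table and the ASCII placeholder labels are hypothetical and not part of the draft schema; the attribute naming (`cat`, `tag`, and `<language code>-cat`) follows the examples in the algorithm.

```python
import xml.etree.ElementTree as ET

# Languages handled under each script branch of the algorithm (ISO 639-3 codes).
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}


def select_nodes(node, script, lang):
    """Return the attributes to display for `lang`; other language labels are hidden."""
    if lang not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    keep = {"hin", lang}  # English labels carry no language prefix and are always kept
    shown = {}
    for name, value in node.attrib.items():
        if name.endswith("-cat") and name[: -len("-cat")] not in keep:
            continue  # corresponds to "Hide (remaining nodes)"
        shown[name] = value
    return shown


# Hypothetical, well-formed node; the label values are ASCII placeholders.
SAMPLE = (
    '<category cat="noun" hin-cat="HIN-LABEL" brx-cat="BRX-LABEL" '
    'kok-cat="KOK-LABEL" tag="N"/>'
)

if __name__ == "__main__":
    print(select_nodes(ET.fromstring(SAMPLE), "Devanagari", "brx"))
```

For Bodo the sketch keeps the English, Hindi and Bodo labels and drops the Konkani one; for Hindi only the English and Hindi labels remain.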
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
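The tagged sentences in this section write tags both with a single underscore (N_NN, V_VM_VNF) and with the double underscore of the tagset tables (N__NN, V__VM__VNF). A minimal Python sketch of how a consumer might normalise the two forms and count top-level categories is given here. The function names are hypothetical, the input is assumed to be already split into (word, tag) pairs because the word/tag separator is not visible in the sentences above, and the words in the usage example are placeholders.

```python
import re


def split_tag(tag):
    """Split 'V_VM_VNF' or 'V__VM__VNF' into its levels: ['V', 'VM', 'VNF']."""
    return [level for level in re.split(r"_+", tag.strip()) if level]


def canonical(tag):
    """Rewrite a tag with the double-underscore convention of the tagset tables."""
    return "__".join(split_tag(tag))


def top_level_counts(tagged_tokens):
    """Count tokens per top-level category (N, V, PSP, RD, ...)."""
    counts = {}
    for _word, tag in tagged_tokens:
        top = split_tag(tag)[0]
        counts[top] = counts.get(top, 0) + 1
    return counts


if __name__ == "__main__":
    # Placeholder words; the tags follow the first Hindi sentence above.
    tokens = [("w1", "N_NNP"), ("w2", "PSP"), ("w3", "N_NN"),
              ("w4", "V_VM"), ("w5", "RD_PUNC")]
    print([canonical(tag) for _w, tag in tokens])
    print(top_level_counts(tokens))
```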
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
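The language tags of this annexure can be held in a simple lookup table. The Python sketch below is illustrative only: the names `LANGUAGE_TAGS` and `language_tag` are hypothetical, and the pairs are the ISO 639-3 codes for the 22 languages listed.

```python
# ISO 639-3 codes for the 22 languages of the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the language tag for a language name, ignoring case and outer spaces."""
    return LANGUAGE_TAGS[name.strip().title()]


if __name__ == "__main__":
    print(language_tag("hindi"), language_tag(" Bodo "))
```

Such a table yields the prefix of the per-language label attributes (for example `hin-cat`, `brx-cat`) used in the node selection algorithm.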
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
ਿਦਲੀ
ਤਾਜਮਿਹਲ
wAjamahila
14 Nloc NST N__NST ਤ ਥਲ ਅਗ
ਿਪਛ
uYwe WaYle
aYge piYCe
2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ
ਜ
mEz wUM uha
iha jo
21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha
22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ
ਖਦ
ApaNA Apa
Kuxa
23 Relative PRL PR__PRL ਜ ਿਜਸ
ਿਜਹਡਾ ਜਦ
jo jisa jihadZA
jaxoz
24 Reciprocal PRC PR__PRC ਆਪਸ Apasa
25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz
kiYWe
26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa
3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha
31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha
32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa
33 Wh-word DMQ DM__DMQ ਕਣ kONa
34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa
4 Verb V V ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
41 Main VM V__VM ਆਇਆ ਜਾ
ਕਰਦਾ
ਮਾਰਗਾ
ਰਿਹਦਾ
AiA jA karaxA
mArAzgA
rahiMxA
412 Non-finite VNF V__VM__VNF ਜਿਦਆ
ਆਿਦਆ
jAzxiAz
AuzxiAz
karaxiAz
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ
(ਚਾਹ-) ਚਹ
(pANI-) XANI
(cAha-) cUha
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
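Because the annotation convention spells out the full path (for example V__VM__VNF), the higher level tags can be derived from the lowest level tag without any extra lookup. A minimal Python sketch, assuming the double-underscore separator used in the Annotation Convention column; the function name `expand` is hypothetical:

```python
def expand(annotation):
    """Return every level of the hierarchy, from the top level down to the given tag."""
    levels = annotation.split("__")
    return ["__".join(levels[: i + 1]) for i in range(len(levels))]


if __name__ == "__main__":
    print(expand("V__VM__VNF"))  # ['V', 'V__VM', 'V__VM__VNF']
    print(expand("JJ"))          # ['JJ']
```

A tool could store the returned list alongside the token, so that a query for all verbs (V) also finds tokens annotated as V__VM__VNF.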
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Remarks
1       Noun             N      N
1.1     Common           NN     N__NN
1.2     Proper           NNP    N__NNP
1.3     Nloc             NST    N__NST
2       Pronoun          PR     PR
2.1     Personal         PRP    PR__PRP
2.2     Reflexive        PRF    PR__PRF
2.3     Relative         PRL    PR__PRL
2.4     Reciprocal       PRC    PR__PRC
2.5     Wh-word          PRQ    PR__PRQ
3       Demonstrative    DM     DM
3.1     Deictic          DMD    DM__DMD
3.2     Relative         DMR    DM__DMR
3.3     Wh-word          DMQ    DM__DMQ
4       Verb             V      V
4.1     Main             VM     V__VM
4.1.1   Finite           VF     V__VM__VF
4.1.2   Non-finite       VNF    V__VM__VNF
4.1.3   Infinitive       VINF   V__VM__VINF
4.1.4   Gerund           VNG    V__VM__VNG
4.2     Verbal Noun      NNV    N_NNV           Verbal Noun
4.3     Auxiliary        VAUX   V__VAUX
4.3.1   Non-finite       VNF    V_VM_VNF
4.3.2   Infinite         VINF   V_VM_VNF
5       Adjective        JJ
6       Adverb           RB                     Only manner adverbs
7       Postposition     PSP
8       Conjunction      CC     CC
8.1     Co-ordinator     CCD    CC__CCD
8.2     Subordinator     CCS    CC__CCS
8.2.1   Quotative        UT     CC__CCS__UT
9       Particles        RP     RP
9.1     Default          RPD    RP__RPD
9.2     Classifier       CL     RP__CL
9.3     Interjection     INJ    RP__INJ
9.4     Intensifier      INTF   RP__INTF
9.5     Negation         NEG    RP__NEG
10      Quantifiers      QT     QT
10.1    General          QTF    QT__QTF
10.2    Cardinals        QTC    QT__QTC
10.3    Ordinals         QTO    QT__QTO
11      Residuals        RD     RD
11.1    Foreign word     RDF    RD__RDF         A word written in script other than the script of the original text
11.2    Symbol           SYM    RD__SYM         For symbols such as $, & etc.
11.3    Punctuation      PUNC   RD__PUNC        Only for punctuations
11.4    Unknown          UNK    RD__UNK
11.5    Echowords        ECH    RD__ECH
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
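An annotator typically works with the short label (NN, VF, UT), while the stored form is the full annotation convention. The Python sketch below shows one possible lookup from label to convention, plus a check that each convention actually ends in its own label. The names `DRAVIDIAN_ROWS`, `annotation_for` and `inconsistent_rows` are hypothetical, and only a handful of rows from the Dravidian tagset above are copied in.

```python
# (label, annotation convention) pairs copied from a few rows of the Dravidian tagset.
DRAVIDIAN_ROWS = [
    ("N", "N"),
    ("NN", "N__NN"),
    ("NNP", "N__NNP"),
    ("VM", "V__VM"),
    ("VF", "V__VM__VF"),
    ("NNV", "N_NNV"),  # written with a single underscore in the table
    ("UT", "CC__CCS__UT"),
    ("PUNC", "RD__PUNC"),
]


def annotation_for(label, rows=DRAVIDIAN_ROWS):
    """Return the full annotation convention for a short label."""
    return dict(rows)[label]


def inconsistent_rows(rows=DRAVIDIAN_ROWS):
    """List rows whose convention does not end in '__<label>' (or equal the label)."""
    return [(label, conv) for label, conv in rows if conv.split("__")[-1] != label]


if __name__ == "__main__":
    print(annotation_for("UT"))
    print(inconsistent_rows())
```

Run on these rows, the check reports only the NNV row, whose convention uses a single underscore where the other rows use two.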
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu (copula) ciri ോ കഴി ആണി (copula) ചിരി
41 Main VM V__VM pO kazhi cirri Annu (copula) ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnu rantu ഒന രണ
103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Marathi
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks
1       Noun             N      N
1.1     Common           NN     N__NN       kalam, chashmA 'pen', 'spectacles'
1.2     Proper           NNP    N__NNP      mohan, ravI 'Mohan', 'Ravi'
1.3     Nloc             NST    N__NST      upar, nIche, ahIM 'up', 'down', 'in front'
2       Pronoun          PR     PR
2.1     Personal         PRP    PR__PRP     huM, tuM, te 'me', 'you', 'he/she'
2.2     Reflexive        PRF    PR__PRF     pote, jAte, svayam 'herself/himself'
2.3     Relative         PRL    PR__PRL     je, te, jyAM 'who', 'where'
2.4     Reciprocal       PRC    PR__PRC     aras-paras, paraspar 'mutually', 'each other'
2.5     Wh-word          PRQ    PR__PRQ     koN, kyAre, kyAM 'who', 'when', 'where'
2.6     Indefinite                          koI, kaIMK, kashuM 'someone', 'something'
3       Demonstrative    DM     DM
3.1     Deictic          DMD    DM__DMD     A 'this'
3.2     Relative         DMR    DM__DMR     je, jeNe 'which/who', 'whom'
3.3     Wh-word          DMQ    DM__DMQ     koN, shuM, kem 'who', 'what', 'why'
3.4     Indefinite                          koI, kaIMK, kashuM 'someone', 'something'
4       Verb             V      V
4.1     Main             VM     V__VM       khAshe, khAdhu 'will eat', 'ate'
4.2     Auxiliary        VAUX   V__VAUX     chhe, hatuM, karyuM 'is', 'was', 'did'
5       Adjective        JJ
6       Adverb           RB
7       Postposition     PSP
8       Conjunction      CC     CC
8.1     Co-ordinator     CCD    CC__CCD     ane, ke 'and', 'or'
8.2     Subordinator     CCS    CC__CCS     tethI, evuM, kAraNke 'so', 'like that', 'because'
9       Particles        RP     RP
9.1     Default          RPD    RP__RPD     paNa, ja, tO 'but', emph, topic
9.2     Interjection     INJ    RP__INJ     hE, arrrE, O
9.3     Intensifier      INTF   RP__INTF    bahu, ghaNuM 'very', 'much'
9.4     Negation         NEG    RP__NEG     nahi, na 'no'
10      Quantifiers      QT     QT
10.1    General          QTF    QT__QTF     thoduM, ghaNuM 'little', 'much'
10.2    Cardinals        QTC    QT__QTC     eka, be, traN 'one', 'two', 'three'
10.3    Ordinals         QTO    QT__QTO     paheluM, bIjI 'first' (neu), 'second' (fem)
11      Residuals        RD     RD
11.1    Foreign word     RDF    RD__RDF     tv, perasitemol
11.2    Symbol           SYM    RD__SYM     $ &
11.3    Punctuation      PUNC   RD__PUNC    ()
11.4    Unknown          UNK    RD__UNK
11.5    Echowords        ECH    RD__ECH     kAm-bAm, pANi-bANi 'work and the like', 'water and the like'
POS for Konkani
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun (ضمير, zamiir) PR PR يہ (yih), وه (voh), جو (jo)
2.1 Personal (ضمير شخصی, zamiir-e-shakhsii) PRP PR__PRP وه (voh), تم (tum), ميں (maim) Remarks: In Urdu, unlike Hindi, voh is used for both singular and plural
2.2 Reflexive (ضمير معکوسی, zamiir-e-m‘aakoosii) PRF PR__PRF اپنا (apnaa), خود (khud), اپنے آپ (apne aap)
2.3 Relative (ضمير موصولہ, zamiir-e-mausoolaa) PRL PR__PRL جو (jo), جب (jab), جس (jis), جہاں (jahaM)
2.4 Reciprocal (ضمير راجع, zamiir-e-raaje‘) PRC PR__PRC باہم (baaham), درميان (darmiyaan), آپس (aapas)
2.5 Wh-word (ضمير استفہاميہ, zamiir-e-istafhaamiyaa) PRQ PR__PRQ کون (kaun), کب (kab), کہاں (kahaaM)
3 Demonstrative (ضمير اشاره, zamiir-e-ishaaraa) DM DM يہ (yih), وه (voh), ان (inn), ان (unn)
3.1 Deictic (اشارے, ishaare) DMD DM__DMD يہ (yih), وه (voh)
3.2 Relative (ضمير اشاره موصولہ, zamiir-e-ishaaraa mausoolaa) DMR DM__DMR جو (jo), جس (jis)
3.3 Wh-word (ضمير اشاره استفہاميہ, zamiir-e-ishaaraa istafhaamiyaa) DMQ DM__DMQ کون (kaun), کس (kis), کتنا (kitnaa) Remarks: According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb (فعل, f‘el) V V گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1 Main VM V__VM گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1.1 Finite (محدود, mahdood) VF V__VM__VF Remarks: This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 Nonfinite (غيرمحدود, ghair mahdood) VNF V__VM__VNF Remarks: -- do --
4.1.3 Infinitive (مصدر, masdar) VINF V__VM__VINF Remarks: -- do --
4.1.4 Gerund (حاصل مصدر, haasil-e-masdar) VNG V__VM__VNG Remarks: -- do --
4.2 Auxiliary (فعل امدادی, f‘el-e-imdaadi) VAUX V__VAUX ہے (hai), رہا (rahaa), ہوا (huaa)
5 Adjective (صفت, sifat) JJ دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa)
6 Adverb (متعلق فعل, mut‘alliq-e-f‘el) RB تيز (tez), جلد (jald)
7 Postposition (جارموخر, jaar-e-moakkhar) PSP سے (se), نے (ne), کو (ko), ميں (meiM)
8 Conjunction (عطف, ‘atf) CC CC اور (aur), اگر (agar), کيوں کہ (kyoMki)
8.1 Co-ordinator (حرف وصل, harf-e-vasl) CCD CC__CCD اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki)
8.2 Subordinator (تابع کننده, taab‘e kunindaa) CCS CC__CCS اگر (agar), کيوں کہ (kyoMki), تو (to)
8.2.1 Quotative (اقتباسی, iqtabaasii) UT CC__CCS__UT Remarks: Not required
9 Particles (حاليہ, haaliyaa) RP RP تو (to), ہی (hii), بهی (bhii)
9.1 Default (ڈيفالٹ, Default) RPD RP__RPD تو (to), ہی (hii), بهی (bhii)
9.2 Classifier (درجہ بند, darja band) CL RP__CL Remarks: Not required
9.3 Interjection (فجائيہ, fajaa’iyaa) INJ RP__INJ اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah)
9.4 Intensifier (حرف تاکيد, harf-e-taakiid) INTF RP__INTF بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)
9.5 Negation (حرف نہی, harf-e-nahii) NEG RP__NEG نہ (na), نہيں (nahiiM)
10 Quantifiers (کميت نما, kamiiyat numaa) QT QT چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General (عام, ‘aam) QTF QT__QTF تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) QTC QT__QTC ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (ترتيبی اعداد, tartiibii a‘adaad) QTO QT__QTO اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals (باقی مانده, baaqi-maandaa) RD RD
11.1 Foreign word (بديسی لفظ, bidesii lafz) RDF RD__RDF Remarks: A word written in script other than the script of the original text
11.2 Symbol (علامت, ‘alaamat) SYM RD__SYM $, &, ( ) Remarks: Such symbols are not used in Urdu. They are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (اوقاف, auqaaf) PUNC RD__PUNC Remarks: Only for punctuations
11.4 Unknown (نامعلوم, naa-m‘aaloom) UNK RD__UNK
11.5 Echowords (گونج دار الفاظ, goonjdar lafz) ECH RD__ECH دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
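The rule above, that higher-level tags follow automatically from the lowest-level tag, can be illustrated with a minimal Python sketch. It assumes the double-underscore annotation conventions listed in the tables (for example V__VM__VF); the function names and the small CONVENTIONS lookup are hypothetical and are not part of the draft standard.

```python
# Minimal sketch: derive the higher-level tags from an annotation
# convention such as "V__VM__VF". The helper names and the CONVENTIONS
# lookup are illustrative assumptions, not part of the standard.

# A few label -> annotation-convention pairs copied from the tables.
CONVENTIONS = {
    "NN": "N__NN",
    "PRP": "PR__PRP",
    "VAUX": "V__VAUX",
    "VF": "V__VM__VF",
    "UT": "CC__CCS__UT",
}


def expand_annotation(convention):
    """Split a convention into its tag chain, top level first."""
    parts = convention.split("__")
    if not all(parts):
        raise ValueError("malformed annotation convention: %r" % convention)
    return parts


def higher_level_tags(label):
    """Return the tags above `label` in the hierarchy, top level first."""
    chain = expand_annotation(CONVENTIONS[label])
    return chain[:-1]


if __name__ == "__main__":
    # The annotator picks only the lowest-level tag; the rest is derived.
    print(higher_level_tags("VF"))
    print(higher_level_tags("NN"))
```

Storing the full chain in the annotation convention keeps the derivation a simple string split, so no separate parent table has to be maintained.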
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of the document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of the document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements that indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements that indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
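As a rough illustration of the attributes listed above, the following Python sketch builds one element carrying xml:lang, its:dir and its:translate and serializes it with the standard library. The element name token and the helper make_token are hypothetical. The language code ur is a plain language tag chosen for the example, whereas this standard itself refers to ISO 639-3 codes. The namespace URI is the one defined by ITS 1.0.

```python
# Minimal sketch, assuming a hypothetical host-vocabulary element <token>.
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang

# Make the serializer use the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_token(text, lang, direction, translate="no"):
    """Build a <token> element labelled with language, direction and
    a translate flag (illustrative helper, not part of ITS)."""
    tok = ET.Element("token")
    tok.set("{%s}lang" % XML_NS, lang)            # natural language label
    tok.set("{%s}dir" % ITS_NS, direction)        # "ltr" or "rtl"
    tok.set("{%s}translate" % ITS_NS, translate)  # "yes" or "no"
    tok.text = text
    return tok


urdu = make_token("کتاب", "ur", "rtl")
serialized = ET.tostring(urdu, encoding="unicode")

if __name__ == "__main__":
    print(serialized)
```

Marking the example word as non-translatable reflects the fact that corpus tokens are data to be tagged, not interface text to be localized.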
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag conveys only its name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by turning particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
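The flow above (select script, select language, display the desired nodes, hide the rest) can be sketched in Python as follows. This is only an illustration under stated assumptions: the schema nodes are modelled as plain dictionaries, the label values are placeholders and not real labels, the script-to-language table covers only a few of the languages, and the English and Hindi labels are always kept visible, which is how the examples in section 14 treat them. The names SCRIPT_LANGUAGES, NODES and select_nodes are hypothetical.

```python
# Minimal sketch of language-specific node selection. All names and the
# placeholder label values are assumptions made for illustration only.

# Which languages may be chosen once a script has been selected.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
}

# Two schema nodes with one label attribute per language.
NODES = [
    {"cat": "noun", "tag": "N", "hin-cat": "<hin noun>",
     "brx-cat": "<brx noun>", "mal-cat": "<mal noun>", "kas-cat": "<kas noun>"},
    {"cat": "Pronoun", "tag": "PR", "hin-cat": "<hin pronoun>",
     "brx-cat": "<brx pronoun>", "mal-cat": "<mal pronoun>", "kas-cat": "<kas pronoun>"},
]


def select_nodes(nodes, script, language):
    """Keep the English label, the tag, the Hindi label and the label of
    the selected language; every other branch is switched 'off'."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError("language %r is not listed for script %r" % (language, script))
    visible = {"cat", "tag", "hin-cat", language + "-cat"}
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]


if __name__ == "__main__":
    for node in select_nodes(NODES, "Devanagari", "brx"):
        print(node)
```

Filtering a copy of each node, instead of deleting attributes from the common schema, leaves the common schema intact so that it can be transformed again for another language.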
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
    kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
  <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം"
      kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
  <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം"
      kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
  <xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
      kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
  <xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം"
      kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം"
    kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
  <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം"
      kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
  <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം"
      kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
  <xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം"
      kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
  <xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം"
      kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
  <xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം"
      kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം"
    kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
  <xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം"
      kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
  <xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം"
      kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
  <xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം"
      kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
    kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
  <xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ"
      kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
  <xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ"
      kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
    <xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ"
        kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
    <xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം"
        kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
    <xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
        kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
    <xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ"
        kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം"
    kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം"
    kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം"
    kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം"
    kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
  <xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം"
      kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
  <xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം"
      kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
    <xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ"
        kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം"
    kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
  <xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം"
      kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
  <xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം"
      kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
  <xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം"
      kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
  <xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം"
      kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
  <xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം"
      kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി"
    kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
  <xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി"
      kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
  <xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി"
      kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
  <xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി"
      kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം"
    kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
  <xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം"
      kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
  <xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം"
      kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
  <xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം"
      kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
  <xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം"
      kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
  <xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക"
      kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
  </xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Table 1 Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2 Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
      <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
      <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
      <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
      <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    ……
  End if
  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
…
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
…
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
…
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
…
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
…
End if
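The selection logic above can be sketched in Python. This is only an illustration, not part of the draft standard: the function name `visible_attributes` and the `SCRIPT_LANGUAGES` table are hypothetical, and only the script/language pairs that appear in the examples of this section are included. It assumes that the English label is held in `cat` and that each language's label is held in an attribute named `<code>-cat`, as in the examples.

```python
# Script -> {language name: ISO 639-3 code}, limited to the pairs used in
# the examples of this section (an illustrative subset, not the full list).
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def visible_attributes(script, language):
    """Return the label attributes to display; all others stay hidden."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported script/language: {script}/{language}") from None
    shown = ["cat", "hin-cat"]       # English and Hindi nodes are always shown
    if code != "hin":
        shown.append(f"{code}-cat")  # plus the nodes of the selected language
    return shown


print(visible_attributes("Devanagari", "Bodo"))  # ['cat', 'hin-cat', 'brx-cat']
```

Keeping the script and language tables as data means a new language is added by extending the table, without touching the selection code.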
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
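In most of the tagged sentences above the tag follows the word directly, with no separator. The sketch below shows one way such tokens could be split back into word and tag. It assumes the word is written in a native script (no capital Latin letters) and the tag is a trailing run of capital letters and underscores. The names `split_token` and `parse_sentence` are hypothetical, and lines where the tag precedes the word, as in the Urdu sample, are not handled.

```python
import re

# A tag is a run of capital letters, optionally extended with further
# levels joined by one or more underscores (e.g. N_NN, V__VM__VF).
TAG_AT_END = re.compile(r"^(.*?)([A-Z]+(?:_+[A-Z]+)*)$")


def split_token(token):
    """Split 'wordTAG' into (word, tag); return (token, None) if no tag."""
    match = TAG_AT_END.match(token)
    if match is None:
        return token, None
    return match.group(1), match.group(2)


def parse_sentence(line):
    """Parse a whitespace-separated tagged sentence into (word, tag) pairs."""
    return [split_token(tok) for tok in line.split()]
```

A bare tag such as `RD_PUNC` is returned with an empty word, which matches the samples where the punctuation mark itself is missing from the token.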
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No. Language Name Language Tag (ISO 639-3)
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
ਕਰਿਦਆ ਖਾਕ
ਜਾਕ
KAke jAke
413 Infinitive VINF V__VM__VINF ਿਗਆ
ਆਇਆ
ਕਿਰਆ
giAz AiAz
kariAz
414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ
ਮਰਨ
jANoz KANoz
pINoz
maranoz
42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ
ਹਇਆ
hE sI sakiA
hoiA
5 Adjective JJ ਸਹਣਾ ਚਗਾ
ਮਾਡਾ ਕਾਾਾ
sohaNA
caMgA
mAdZA kAA
6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI
7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz
nAla
8 Conjunction CC CC ਅਤ ਿਕਿਕ
ਅਗਰ ਿਕ ਸਗ
awe kiuzki
agara ki sagoz
81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz
82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ
ਤ
kiuzki ki jo
wAz
9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI
91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ
ਨੀ ਜਨਾਬ
ue adZiA nI
janAba
94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa
badZA
95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ
ਵਗਰ
nahIz nA
binAz vagEra
10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ ਇਕ
WodZA
bahuwA kAPI
kuJa iYka
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ
ਕਾਫੀ ਕਝ
WodZA
bahuwA kAPI
kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ (ਚਾਹ-) ਚਹ (pANI-) XANI (cAha-) cUha
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
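The rule above, that higher-level tags follow automatically from the lowest-level one, can be sketched as follows. The function name `expand_tag` is hypothetical. The sketch assumes the levels of an annotation are joined by underscores and accepts both the `__` form used in the tables and the single `_` that appears in some rows.

```python
import re


def expand_tag(annotation):
    """Return the annotation for every level, from top level to lowest."""
    labels = re.split(r"_+", annotation)  # accept both '_' and '__'
    return ["__".join(labels[: i + 1]) for i in range(len(labels))]


print(expand_tag("V__VM__VF"))  # ['V', 'V__VM', 'V__VM__VF']
```

An annotator therefore only needs to choose `V__VM__VF`; the entries for `V` and `V__VM` can be derived and stored without further input.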
Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)
Sl No Category Label Annotation Convention Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN
12 Proper NNP N__NNP
13 Nloc NST N__NST
2 Pronoun PR PR
21 Personal PRP PR__PRP
22 Reflexive PRF PR__PRF
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal Noun Verbal noun NNV N_NNV Verbal Noun
43 Auxiliary VAUX V__VAUX
431 Non-finite VNF V_VM_VNF
432 Infinite VINF V_VM_VNF
5 Adjective JJ
6 Adverb RB Only manner
adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
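The Dravidian tagset above forms a tree, so an annotation can be checked by walking it level by level. The sketch below is an illustration under stated assumptions: the names `TAGSET` and `is_valid` are hypothetical, levels are assumed to be joined by `__`, the verbal noun label `NNV` is placed under `N` because the table writes it as `N_NNV`, and the subtypes listed under Auxiliary (rows 431 and 432) are left out because their annotation convention in the table is ambiguous.

```python
# Tagset tree: label -> dict of subtype labels (empty dict = lowest level).
TAGSET = {
    "N": {"NN": {}, "NNP": {}, "NST": {}, "NNV": {}},
    "PR": {"PRP": {}, "PRF": {}, "PRL": {}, "PRC": {}, "PRQ": {}},
    "DM": {"DMD": {}, "DMR": {}, "DMQ": {}},
    "V": {"VM": {"VF": {}, "VNF": {}, "VINF": {}, "VNG": {}}, "VAUX": {}},
    "JJ": {},
    "RB": {},
    "PSP": {},
    "CC": {"CCD": {}, "CCS": {"UT": {}}},
    "RP": {"RPD": {}, "CL": {}, "INJ": {}, "INTF": {}, "NEG": {}},
    "QT": {"QTF": {}, "QTC": {}, "QTO": {}},
    "RD": {"RDF": {}, "SYM": {}, "PUNC": {}, "UNK": {}, "ECH": {}},
}


def is_valid(annotation):
    """Check that each label in 'A__B__C' is a subtype of the one before."""
    node = TAGSET
    for label in annotation.split("__"):
        if label not in node:
            return False
        node = node[label]
    return True
```

With this tree, `is_valid("CC__CCS__UT")` holds, while `is_valid("V__VF")` does not, because `VF` is a subtype of `VM` and not of `V` directly.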
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No Category Label Annotation Convention Examples Examples in Malayalam
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi
l parasparam
തമിിിതമിി
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത
ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത
എങെന 4 Verb V V pO kazhi
Annuciri ോ കഴി ആണി(Cop
ula) ചിരി 41 Main VM V__VM pO kazhi
cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Marathi
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A ‘this’
32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO ‘but’ emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
94 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
102 Cardinals QTC QT__QTC eka be traN ‘one’ ‘two’ ‘three’
103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level.
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written out in words: ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
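The rule above can be sketched in Python as follows. This is only an illustration, assuming that the Annotation Convention strings in the tables (for example V__VM__VF) encode the hierarchy with a double underscore. The names CONVENTIONS, LOWEST_TO_PATH and annotate are hypothetical and not part of the standard.

```python
# Illustrative sketch only: helper names are hypothetical, not part of the standard.
# A few Annotation Convention strings taken from the tables above.
CONVENTIONS = ["N__NN", "N__NNP", "PR__PRP", "V__VM__VF", "V__VAUX", "RD__PUNC"]

# Index each convention by its lowest-level label, e.g. "VF" -> ["V", "VM", "VF"].
LOWEST_TO_PATH = {c.split("__")[-1]: c.split("__") for c in CONVENTIONS}


def annotate(token, lowest_tag):
    """Store the lowest-level tag chosen by the annotator plus its higher levels."""
    path = LOWEST_TO_PATH[lowest_tag]  # raises KeyError for an unknown label
    return {"token": token, "tag": lowest_tag, "higher_levels": path[:-1]}


print(annotate("hai", "VAUX"))
print(annotate("giraa", "VF"))
```

The annotator supplies only the lowest label; the higher levels are looked up rather than typed, which is one way to keep the stored hierarchy consistent.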
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang, defined for the root element of the document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir, defined for the root element of the document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule elements, which indicate the elements that have non-translatable content).
Providing information related to text segmentation (its:withinTextRule elements, which indicate the elements that should be treated either as part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
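As an illustration of the attributes listed above, the following Python sketch builds a small tagged sentence carrying xml:lang, its:dir, xml:id and its:locNote. The element names sentence and w, the pos attribute and the function build_sentence are hypothetical and are not defined by this standard. The language value "ur" is an assumption chosen for the example.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def build_sentence(lang, direction, tagged_tokens, note):
    """Build a hypothetical <sentence> element carrying ITS and xml: attributes."""
    root = ET.Element("sentence", {
        f"{{{XML_NS}}}lang": lang,        # natural language labelling
        f"{{{ITS_NS}}}dir": direction,    # text direction, e.g. "ltr" or "rtl"
        f"{{{ITS_NS}}}locNote": note,     # note to localizers
    })
    for index, (word, tag) in enumerate(tagged_tokens, start=1):
        # xml:id gives every token a unique identifier within the document.
        w = ET.SubElement(root, "w", {f"{{{XML_NS}}}id": f"w{index}", "pos": tag})
        w.text = word
    return root


sentence = build_sentence("ur", "rtl", [("کتاب", "NN"), ("ہے", "VAUX")],
                          "POS labels are not to be translated")
print(ET.tostring(sentence, encoding="unicode"))
```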
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag gives just a name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
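The idea of switching branches off for a chosen language can be sketched in Python as follows. This is only an illustration: it assumes each node carries one "<code>-cat" attribute per language, as in the draft schema, and that Hindi labels are always shown next to English, as in the node-selection algorithm later in this document. The node layout, the placeholder label values and the function select_nodes are hypothetical.

```python
# Illustrative sketch only: node layout and label values are hypothetical placeholders.
NODES = [
    {"cat": "noun", "tag": "N",
     "hin-cat": "<Hindi label>", "brx-cat": "<Bodo label>", "mal-cat": "<Malayalam label>"},
    {"cat": "pronoun", "tag": "PR",
     "hin-cat": "<Hindi label>", "brx-cat": "<Bodo label>", "mal-cat": "<Malayalam label>"},
]


def select_nodes(nodes, language_code, always_shown=("hin",)):
    """Keep the English category, the tag and the labels of the chosen languages."""
    shown = {f"{code}-cat" for code in (*always_shown, language_code)}
    keep = {"cat", "tag"} | shown
    # Every attribute outside `keep` is hidden (its branch is switched off).
    return [{key: value for key, value in node.items() if key in keep} for node in nodes]


for node in select_nodes(NODES, "brx"):
    print(node)
```

Filtering a copy of the nodes, rather than deleting attributes in place, keeps the common schema intact so that another language can be selected later.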
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
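A mapping table of this kind can be checked mechanically for gaps and for labels that are used twice. The Python sketch below is only an illustration under stated assumptions: it uses a small excerpt of the English to Urdu column of Table 1, treats "NA" as a missing entry, and the names URDU_LABELS and check_one_to_one are hypothetical.

```python
# Illustrative sketch only: a small excerpt of the English -> Urdu labels of Table 1.
URDU_LABELS = {
    "Noun": "اسم",
    "Pronoun": "ضمير",
    "Verb": "فعل",
    "Adjective": "صفت",
    "Indefinite": "NA",
}


def check_one_to_one(labels):
    """Return (missing, clashes) for one language column of the mapping table.

    missing: English labels whose entry is empty or "NA".
    clashes: target labels that are shared by more than one English label.
    """
    missing = [english for english, target in labels.items() if target in ("", "NA")]
    seen = {}
    for english, target in labels.items():
        if target not in ("", "NA"):
            seen.setdefault(target, []).append(english)
    clashes = {target: names for target, names in seen.items() if len(names) > 1}
    return missing, clashes


missing, clashes = check_one_to_one(URDU_LABELS)
print("missing:", missing)
print("clashes:", clashes)
```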
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
…
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
…
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
…
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
…
End if
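The selection steps of this algorithm (call the schema, display the English, Hindi and chosen-language nodes, hide the remaining nodes) could be sketched in Python as follows. This is an illustration under stated assumptions, not part of the standard: a schema node is modelled as a plain dictionary of attributes, the function name select_nodes and the placeholder label strings are hypothetical, and only the "<language tag>-cat" attribute pattern is taken from the examples.

```python
# Illustrative sketch only. Attribute names follow the "<language tag>-cat"
# pattern seen in the schema examples; everything else here is hypothetical.

# Attributes that are always displayed: English category names, the tag,
# and the Hindi label.
ALWAYS_SHOWN = {"name", "cat", "subcat", "tag", "hin-cat"}


def select_nodes(node, lang):
    """Keep the English, Hindi and chosen-language labels; hide the rest."""
    wanted = ALWAYS_SHOWN | {lang + "-cat"}
    return {key: value for key, value in node.items() if key in wanted}


# A node that carries labels for several languages at once (assumed layout,
# with placeholder strings instead of the real labels).
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "kok-cat": "<Konkani label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}

visible = select_nodes(noun, "kok")
print(visible)
```

For Konkani (tag kok) the Bodo and Malayalam labels are dropped, while the English category, the Hindi label and the tag remain.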
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No  Language Name  Language Tag according to ISO 639-3
1     Hindi          hin
2     Assamese       asm
3     Bangla         ben
4     Bodo           brx
5     Dogri          doi
6     Gujarati       guj
7     Kannada        kan
8     Kashmiri       kas
9     Konkani        kok
10    Maithili       mai
11    Malayalam      mal
12    Manipuri       mni
13    Marathi        mar
14    Nepali         nep
15    Oriya          ori
16    Punjabi        pan
17    Sanskrit       san
18    Santhali       sat
19    Sindhi         snd
20    Tamil          tam
21    Telugu         tel
22    Urdu           urd
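As an illustration of how these language tags could be used, the sketch below stores them in a Python dictionary and derives the attribute name that carries a language's category label (for example hin-cat or brx-cat). The dictionary and function names are hypothetical; the tags are the ISO 639-3 codes of the 22 languages in this annexure.

```python
# ISO 639-3 tags for the languages listed in this annexure.
# LANGUAGE_TAGS and attribute_name are illustrative names, not part of the standard.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def attribute_name(language):
    """Return the '<tag>-cat' attribute name for a language in the table."""
    return LANGUAGE_TAGS[language] + "-cat"


print(attribute_name("Bodo"))
print(attribute_name("Kashmiri"))
```

A language that is not in the table raises a KeyError, which keeps unsupported languages from silently receiving an attribute name.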
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
ਪਿਹਲਾ pahilA
101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ ਕਾਫੀ ਕਝ WodZA bahuwA kAPI kuJa
102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna
103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ (ਚਾਹ-) ਚਹ (pANI-) XANI (cAha-) cUha
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)
Category names are indented by level: top level, subtype (level 1), subtype (level 2).
Sl No     Category            Label   Annotation Convention   Remarks
1         Noun                N       N
1.1         Common            NN      N__NN
1.2         Proper            NNP     N__NNP
1.3         Nloc              NST     N__NST
2         Pronoun             PR      PR
2.1         Personal          PRP     PR__PRP
2.2         Reflexive         PRF     PR__PRF
2.3         Relative          PRL     PR__PRL
2.4         Reciprocal        PRC     PR__PRC
2.5         Wh-word           PRQ     PR__PRQ
3         Demonstrative       DM      DM
3.1         Deictic           DMD     DM__DMD
3.2         Relative          DMR     DM__DMR
3.3         Wh-word           DMQ     DM__DMQ
4         Verb                V       V
4.1         Main              VM      V__VM
4.1.1         Finite          VF      V__VM__VF
4.1.2         Non-finite      VNF     V__VM__VNF
4.1.3         Infinitive      VINF    V__VM__VINF
4.1.4         Gerund          VNG     V__VM__VNG
4.2         Verbal Noun       NNV     N_NNV                   Verbal Noun
4.3         Auxiliary         VAUX    V__VAUX
4.3.1         Non-finite      VNF     V_VM_VNF
4.3.2         Infinite        VINF    V_VM_VNF
5         Adjective           JJ
6         Adverb              RB                              Only manner adverbs
7         Postposition        PSP
8         Conjunction         CC      CC
8.1         Co-ordinator      CCD     CC__CCD
8.2         Subordinator      CCS     CC__CCS
8.2.1         Quotative       UT      CC__CCS__UT
9         Particles           RP      RP
9.1         Default           RPD     RP__RPD
9.2         Classifier        CL      RP__CL
9.3         Interjection      INJ     RP__INJ
9.4         Intensifier       INTF    RP__INTF
9.5         Negation          NEG     RP__NEG
10        Quantifiers         QT      QT
10.1        General           QTF     QT__QTF
10.2        Cardinals         QTC     QT__QTC
10.3        Ordinals          QTO     QT__QTO
11        Residuals           RD      RD
11.1        Foreign word      RDF     RD__RDF                 A word written in script other than the script of the original text
11.2        Symbol            SYM     RD__SYM                 For symbols such as $, & etc.
11.3        Punctuation       PUNC    RD__PUNC                Only for punctuations
11.4        Unknown           UNK     RD__UNK
11.5        Echowords         ECH     RD__ECH
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
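The rule that higher level tags are stored automatically can be illustrated with a small Python sketch: given the lowest level tag chosen by the annotator, it walks up the type hierarchy and joins the levels with a double underscore, as in the Annotation Convention column. The PARENT table and the function name full_label are hypothetical, and the table covers only labels whose place in the hierarchy is unambiguous in the tagset.

```python
# Parent of each label in the type hierarchy (illustrative subset taken from
# the tagset; PARENT and full_label are hypothetical names).
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_label(tag):
    """Expand a lowest level tag into its full annotation label."""
    levels = [tag]
    # Climb the hierarchy until a top level category is reached.
    while levels[0] in PARENT:
        levels.insert(0, PARENT[levels[0]])
    return "__".join(levels)


print(full_label("VF"))   # V__VM__VF
print(full_label("UT"))   # CC__CCS__UT
print(full_label("JJ"))   # JJ
```

Top level categories without subtypes, such as JJ, are returned unchanged.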
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan
raajaa
puttakam
11 Common NN N__NN puttakam
kaNNaaTi
paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ amp ( ) ruu
For symbols such as $ amp etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No
Category Label Annotation Convention
Examples Examples in Malayalam
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi
l parasparam
തമിിിതമിി
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത
ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത
എങെന 4 Verb V V pO kazhi
Annuciri ോ കഴി ആണി(Cop
ula) ചിരി 41 Main VM V__VM pO kazhi
cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi
rashmi
14 Nloc NST N__NST upare
niche
bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi
AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana
yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana
kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo
yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
41
1
Finite VF V__VM__VF karachhilAm
a yAba
khAYa
41
2
Non-finite VNF V__VM__VNF kare
kheYe
karale
khete
41
3
Infinitive VINF V__VM__VINF karate
khete yete
41
4
Gerund VNG V__VM__VNG yAoYa
AsA khelA
karA
42 Auxiliary VAUX V__VAUX chhila
habe chAi
5 Adjective JJ sundara
bhAla lAla
6 Adverb RB tADAtADi
Aste
haThAt
7 Postposition PSP theke
abadhI
madhye
diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban
athabA
kimbA
82 Subordinator CCS CC__CCS ye kintu
noile
tAhale
82
1
Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei
hAya
94 Intensifier INTF RP__INTF bhiShaNa
khuba
sA~NghAtik
a
95 Negation NEG RP__NEG nA naYa
chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu
alpa aneka
102 Cardinals QTC QT__QTC eka dui
tina
103 Ordinals QTO QT__QTO prathama
paYalA
dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala
khAbAra
dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
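The Bangla table adds two indefinite subtypes, PR__PRI and DM__DMI, that do not appear in the Dravidian tagset. One way to handle such language-specific additions is to keep a shared label set plus per-language extras, as in the hedged Python sketch below. The set and function names are hypothetical, the shared set is abbreviated to the pronoun and demonstrative labels, and "ben" and "tam" are the ISO 639-3 tags for Bangla and Tamil.

```python
# Illustrative sketch: an abbreviated shared label set plus per-language extras.
# COMMON_LABELS, EXTRA_LABELS, is_valid and split_levels are hypothetical names.
COMMON_LABELS = {
    "PR", "PR__PRP", "PR__PRF", "PR__PRL", "PR__PRC", "PR__PRQ",
    "DM", "DM__DMD", "DM__DMR", "DM__DMQ",
}

# Bangla lists the indefinite subtypes in addition to the shared labels.
EXTRA_LABELS = {"ben": {"PR__PRI", "DM__DMI"}}


def is_valid(label, lang):
    """Check a full annotation label against the labels allowed for a language."""
    return label in COMMON_LABELS | EXTRA_LABELS.get(lang, set())


def split_levels(label):
    """Split a full annotation label into its hierarchy levels."""
    return label.split("__")


print(is_valid("PR__PRI", "ben"))
print(is_valid("PR__PRI", "tam"))
print(split_levels("V__VM__VF"))
```

Keeping the extras separate leaves the shared hierarchy unchanged while still recording which languages use the additional subtypes.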
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalamchashmA
lsquopenrsquo lsquospectaclesrsquo
12 Proper NNP N__NNP mohanravI
lsquoMohanrsquo lsquoRavirsquo
13 Nloc NST N__NST upar nIche ahIM
lsquouprsquo lsquodownrsquo lsquoin frontrsquo
2 Pronoun PR PR
21 Personal PRP PR__PRP huMtuMte
lsquomersquo lsquoyoursquo
lsquoheshersquo 22 Reflexive PRF PR__PRF pote
jAtesvayam
lsquoherselfhimselfrsquo
23 Relative PRL PR__PRL je te jyAM
lsquowhorsquo lsquowherersquo
24 Reciprocal PRC PR__PRC aras-paras paraspar
lsquomutuallyrsquolsquoeach otherrsquo
25 Wh-word PRQ PR__PRQ koN kyAre kyAM
lsquowhorsquo lsquowhenrsquo lsquowherersquo
26 Indefinite koI kaIMK kashuM
lsquosomeonersquo lsquosomethingrsquo
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A
lsquothisrsquo
32 Relative DMR DM__DMR je jeNe
lsquowhichwhorsquo lsquowhomrsquo
33 Wh-word DMQ DM__DMQ koNshuMkem
lsquowhorsquo lsquowhatrsquo lsquowhyrsquo
34 Indefinite koI kaIMK kashuM
lsquosomeonersquo lsquosomethingrsquo
4 Verb V V
41 Main VM V__VM khAshekhAdhu
lsquowill eatrsquo
lsquoatersquo 42 Auxiliary VAUX V__VAUX chhehatuMk
aryuM
lsquoisrsquo rsquowasrsquo lsquodidrsquo
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD aneke
lsquoandrsquo lsquoorrsquo
82 Subordinator CCS CC__CCS tethI evuM kAraNke
lsquosorsquo lsquolike thatrsquo lsquobecausersquo
9 Particles RP RP
91 Default RPD RP__RPD paNajatO
lsquobutrsquo emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahughaNuM
lsquoveryrsquo lsquomuchrsquo
94 Negation NEG RP__NEG nahina
lsquonorsquo
10 Quantifiers QT QT
101 General QTF QT__QTF thoduMghaNuM
lsquolittlersquo lsquomuchrsquo
102 Cardinals QTC QT__QTC ekabe traN
lsquoonetwothreersquo
103 Ordinals QTO QT__QTO paheluMbIjI
lsquofirstrsquo(neu)
lsquosecondrsquo (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ amp
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAmpANi-bANi
lsquowork and the likersquo water and the likersquo
POS for Konkani
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used both for singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
-بديسی لفظ(bidesii lafz(
RDF RD__RDF A word written in script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
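The rule above can be sketched in a few lines. This is a minimal illustration, not part of the standard: the `PARENT` table and the `expand_tag` function are hypothetical names, and only a handful of tags from the tables above are listed. The `__` separator follows the hierarchical labels shown in the tables (for example `V__VM__VF`).

```python
# Parent of each tag in the type hierarchy; top-level tags map to None.
# Only a small subset of the tag set is listed, for illustration.
PARENT = {
    "V": None, "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CC": None, "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "PR": None, "PRF": "PR",
}


def expand_tag(tag: str) -> str:
    """Return the full hierarchical label for a lowest-level tag."""
    if tag not in PARENT:
        raise KeyError(f"unknown tag: {tag}")
    chain = []
    current = tag
    # Walk upwards until a top-level tag (parent None) is reached.
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return "__".join(reversed(chain))


print(expand_tag("VF"))   # V__VM__VF
print(expand_tag("UT"))   # CC__CCS__UT
```

With such a table the annotator only ever picks the lowest-level tag; the stored label is derived from it.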
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
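As an illustration of how a processor might read these attributes, the sketch below parses a small, made-up XML fragment with Python's standard `xml.etree.ElementTree`. The element names `doc`, `label` and `code` are hypothetical; `xml:lang`, `xml:id`, `its:dir` and `its:translate` are the local attributes from the XML and ITS 1.0 namespaces. Inheritance is handled only one level deep here, which is a simplification.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: two labels and one non-translatable tag code.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="n1">Noun</label>
  <label xml:id="n2" xml:lang="ur" its:dir="rtl">اسم</label>
  <code its:translate="no">PR__PRF</code>
</doc>"""


def describe(root):
    """Yield (tag, language, direction, translatable) for each child element.

    xml:lang and its:dir fall back to the root's values when not set
    locally; its:translate defaults to "yes" for elements.
    """
    root_lang = root.get(f"{{{XML_NS}}}lang")
    root_dir = root.get(f"{{{ITS_NS}}}dir")
    for child in root:
        lang = child.get(f"{{{XML_NS}}}lang", root_lang)
        direction = child.get(f"{{{ITS_NS}}}dir", root_dir)
        translatable = child.get(f"{{{ITS_NS}}}translate", "yes") == "yes"
        yield child.tag, lang, direction, translatable


rows = list(describe(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```

Marking tag codes such as `PR__PRF` as non-translatable keeps them stable while the human-readable labels are localized.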
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document: every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives a name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
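The selection flow above (pick a script, pick a language, display the desired nodes, hide the rest) might be sketched as follows. This is an assumption-laden illustration rather than the reference implementation: `SCRIPT_LANGUAGES`, `NODES` and `select_nodes` are hypothetical names, the label values are placeholders rather than the real schema labels, and only the script and language pairs named in Section 14 of this document are listed. Following that section, the English and Hindi labels are always kept alongside the selected language.

```python
# Script -> language -> attribute prefix, limited to the cases named in this
# document; the full flow covers 12 scripts and 22 languages.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Hypothetical in-memory form of two schema nodes. The label values are
# placeholders, not the real labels from the schema.
NODES = [
    {"cat": "noun", "tag": "N",
     "hin-cat": "hin:noun", "brx-cat": "brx:noun", "mal-cat": "mal:noun"},
    {"cat": "Pronoun", "tag": "PR",
     "hin-cat": "hin:pronoun", "brx-cat": "brx:pronoun", "mal-cat": "mal:pronoun"},
]


def select_nodes(script, language, nodes=NODES):
    """Keep English, Hindi and selected-language attributes; hide the rest."""
    prefix = SCRIPT_LANGUAGES[script][language]  # KeyError if unsupported
    visible = {"cat", "tag", "hin-cat", f"{prefix}-cat"}
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]


for node in select_nodes("Devanagari", "Bodo"):
    print(node)
```

Filtering attributes on a copy of each node, instead of deleting them from the common schema, keeps the common tree intact so that another language can be selected later.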
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...............</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
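Several cells in the mapping tables are marked NA, meaning the category has no label in that language. A tool consuming the tables may want to know which categories to switch off for a given language. The sketch below records only presence or absence for three rows of the first table (Participle Noun, Quotative, Classifier). The names `AVAILABLE` and `unsupported` are hypothetical, and the three-letter language codes are assumed here for illustration; the Language Code Table in the annexure is the authority.

```python
# True = the table gives a label for that language; False = the cell is "NA".
# Three rows from the Hindi/Punjabi/Urdu/Gujarati/Oriya/Bengali table.
AVAILABLE = {
    "Participle Noun": {"hin": True, "pan": False, "urd": False,
                        "guj": False, "ori": False, "ben": True},
    "Quotative":       {"hin": True, "pan": True, "urd": True,
                        "guj": False, "ori": True, "ben": True},
    "Classifier":      {"hin": True, "pan": True, "urd": True,
                        "guj": False, "ori": True, "ben": True},
}


def unsupported(lang: str) -> list:
    """Return the categories marked NA for the given language, sorted."""
    return sorted(cat for cat, row in AVAILABLE.items() if not row[lang])


print(unsupported("guj"))  # ['Classifier', 'Participle Noun', 'Quotative']
print(unsupported("ben"))  # []
```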
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English, Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English, Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
................
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
................
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
................
End if
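The selection steps above can be sketched in code. The following Python sketch is illustrative only: the SCRIPT_LANGUAGES table covers just the script and language pairs listed in this part of the algorithm, and the node dictionaries, the select_nodes function, and the placeholder label strings are assumptions made for the example, not part of the draft schema.

```python
# Script -> language codes, limited to the pairs listed in the algorithm above.
SCRIPT_LANGUAGES = {
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# English and Hindi labels are always displayed next to the selected language.
ALWAYS_SHOWN = ("eng", "hin")


def select_nodes(nodes, script, language):
    """Return copies of the nodes that keep only the labels to display.

    Labels for every other language are dropped ("Hide (remaining nodes)").
    """
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed for script {script!r}")
    visible = set(ALWAYS_SHOWN) | {language}
    return [
        {key: value for key, value in node.items() if key == "tag" or key in visible}
        for node in nodes
    ]


# Hypothetical schema rows; the label strings are placeholders, not real labels.
nodes = [
    {"tag": "N", "eng": "Noun", "hin": "hin-noun", "mal": "mal-noun", "guj": "guj-noun"},
    {"tag": "PR", "eng": "Pronoun", "hin": "hin-pronoun", "mal": "mal-pronoun", "guj": "guj-pronoun"},
]

shown = select_nodes(nodes, "Malayalam", "mal")
```

Each returned node keeps its tag plus the English, Hindi, and Malayalam labels; a real implementation would presumably read the labels from the POS schema rather than from an in-memory list.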
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
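Tagged sentences like those in this section can be read programmatically. The sketch below is a minimal Python illustration under a stated assumption: each token is written as word, separator, tag, with a backslash as the separator. The separator is an assumption (it is not visible in the sentences as reproduced here), and parse_tagged and the sample words w1 to w3 are hypothetical placeholders, not corpus data.

```python
def parse_tagged(sentence, sep="\\"):
    """Split a tagged sentence into (word, tag) pairs.

    Each whitespace-separated token is split at its last separator.
    A token with no separator is treated as a bare tag with an empty word.
    """
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition(sep)
        pairs.append((word, tag))
    return pairs


# Placeholder tokens using tags from the tag set in this document.
pairs = parse_tagged(r"w1\N_NNP w2\PSP w3\V_VM .\RD_PUNC")
```

Splitting at the last separator keeps any separator character inside the word itself intact, which seems the safer choice for running text.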
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tag according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
23 Relative PRL PR__PRL
24 Reciprocal PRC PR__PRC
25 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
31 Deictic DMD DM__DMD
32 Relative DMR DM__DMR
33 Wh-word DMQ DM__DMQ
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF
412 Non-finite VNF V__VM__VNF
413 Infinitive VINF V__VM__VINF
414 Gerund VNG V__VM__VNG
42 Verbal Noun Verbal noun NNV N_NNV Verbal Noun
43 Auxiliary VAUX V__VAUX
431 Non-finite VNF V_VM_VNF
432 Infinite VINF V_VM_VNF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD
82 Subordinator CCS CC__CCS
821 Quotative UT CC__CCS__UT
9 Particles RP RP
91 Default RPD RP__RPD
92 Classifier CL RP__CL
93 Interjection INJ RP__INJ
94 Intensifier INTF RP__INTF
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
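The rule above (annotate with the lowest-level tag, store the higher levels automatically) can be sketched as follows. This is a minimal Python sketch, assuming the higher-level tags can be recovered from the annotation convention itself, where levels are joined by a double underscore as in V__VM__VF. The function name expand_tag is hypothetical.

```python
def expand_tag(tag, sep="__"):
    """Return every level of the hierarchy implied by a lowest-level tag.

    The result runs from the top-level category down to the tag itself.
    """
    parts = tag.split(sep)
    return [sep.join(parts[:depth]) for depth in range(1, len(parts) + 1)]


levels = expand_tag("V__VM__VF")  # top level, subtype level 1, subtype level 2
```

An annotator would then record only V__VM__VF, and a tool could store V and V__VM alongside it. Passing sep="_" handles the single-underscore form used in the tagged sentences.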
POS for Tamil
Sl No Category Label Annotation Convention
Examples Remarks
Top level Subtype (level 1)
Subtype (level 2)
1 Noun N N paiyan raajaa puttakam
11 Common NN N__NN puttakam kaNNaaTi paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No Category Label Annotation Convention Examples Examples in Malayalam
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N avan mOhan vItu
11 Common NN N__NN vItu vellam pattam
12 Proper NNP N__NNP mOhan ravi sIta േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avan aval atu itu അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu ciri ോ കഴി ആണി(Copula) ചിരി
41 Main VM V__VM pO kazhi cirri Annu(copula) ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula) ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn ോക ചിരിക കയാി കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAl ennalum enkilum െക എനിനം എനാി എനാലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kute mAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnu rantu ഒന രണ
103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST upare niche bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
82 Subordinator CCS CC__CCS ye kintu noile tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu alpa aneka
102 Cardinals QTC QT__QTC eka dui tina
103 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Marathi
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A ‘this’
32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO ‘but’ emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
94 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
102 Cardinals QTC QT__QTC eka be traN ‘one/two/three’
103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’
POS for Konkani
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No Category Label Annotation Convention Examples Remarks
Top level Subtype (level 1) Subtype (level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible, and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Users will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This leads to easier recognition of the concepts represented, by both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling: xml:lang, defined for the root element of the document and for any element where a change of language may occur.
Defining mark-up to specify text direction: its:dir, defined for the root element of the document and for any element that has text content.
Indicating which elements and attributes should be translated: its:translateRule elements indicate which elements have non-translatable content.
Providing information related to text segmentation: its:withinTextRule elements indicate which elements should be treated either as part of their parents or as a nested but independent run of text.
Defining mark-up for unique identifiers: xml:id, so that elements with translatable content can be associated with a unique identifier.
Defining mark-up for notes to localizers: its:locNote allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
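As an illustration of the first two attributes, the following Python sketch uses the standard library's xml.etree.ElementTree to attach xml:lang and its:dir to an element. The element name label, the helper make_label, and the choice of the two-letter code "ur" for Urdu are assumptions made for the example; the draft schema does not prescribe them.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the xml: and its: prefixes.
XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Ask ElementTree to serialize the ITS namespace with the "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_label(text: str, lang: str, direction: str) -> ET.Element:
    """Build a hypothetical <label> element carrying language and direction."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)      # serialized as xml:lang
    label.set(f"{{{ITS_NS}}}dir", direction)  # serialized as its:dir
    label.text = text
    return label


# An Urdu label is written right to left.
urdu_label = make_label("اسم", "ur", "rtl")
print(ET.tostring(urdu_label, encoding="unicode"))
```

A processor that understands ITS can then pick the correct rendering direction for each label without inspecting the text itself.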
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means: a tag conveys only its name, so it has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure "off". This is represented schematically under the next heading, the POS schema block diagram.
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No. English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
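Because the mapping tables above mark absent labels with NA (and the draft schema leaves some attributes empty), it can be useful to check mechanically where the one-to-one mapping is incomplete. The following Python sketch shows one way to do so. The dictionary layout, the function name find_missing_labels and the three sample rows are assumptions made for illustration; they are not an extract of the tables.

```python
MISSING = "NA"  # marker used in the mapping tables for an absent label


def find_missing_labels(mapping, languages):
    """Return (tag, language) pairs that have no usable label.

    `mapping` maps a POS tag to a dict of {language code: label}.
    A label counts as missing if it is absent, empty, or "NA".
    """
    gaps = []
    for tag, labels in mapping.items():
        for lang in languages:
            if labels.get(lang, MISSING) in ("", MISSING):
                gaps.append((tag, lang))
    return gaps


# Hypothetical sample rows keyed by tag, with ISO 639-3 language codes.
sample = {
    "N": {"eng": "Noun", "hin": "संज्ञा", "kok": "नाम"},
    "PR": {"eng": "Pronoun", "hin": "सर्वनाम", "kok": "सर्वनाम"},
    "CC__CCS": {"eng": "Subordinator", "hin": "", "kok": "NA"},
}

gaps = find_missing_labels(sample, ["eng", "hin", "kok"])
print(gaps)
```

Running such a check over the full tables would list every tag and language pair that still needs a label before the mapping can be called one-to-one.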
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
…
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
…
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
…
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
…
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
…
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
…
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
…
End if
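The selection steps above (check the script, check the language, then display the English, Hindi and selected-language nodes and hide the rest) can be sketched in Python. This is only an illustration under stated assumptions: the `POS_SCHEMA` list, the `SCRIPT_LANGUAGES` table, the `eng` key for the English label and the function name `select_nodes` are hypothetical, and native-script labels are shown as placeholders rather than real terms.

```python
# Hypothetical, trimmed-down stand-in for the common POS schema.
# Each node has a tag and one label per language code (placeholders only).
POS_SCHEMA = [
    {"tag": "N", "labels": {"eng": "noun", "hin": "<hin>", "asm": "<asm>", "guj": "<guj>"}},
    {"tag": "NN", "labels": {"eng": "common", "hin": "<hin>", "asm": "<asm>", "guj": "<guj>"}},
    {"tag": "PR", "labels": {"eng": "pronoun", "hin": "<hin>", "asm": "<asm>", "guj": "<guj>"}},
]

# Assumed script-to-language grouping, limited to the two branches shown above.
SCRIPT_LANGUAGES = {"Bangla": {"asm"}, "Gujarati": {"guj"}}


def select_nodes(script, language, schema=POS_SCHEMA):
    """Return the schema with only English, Hindi and `language` labels kept."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = ("eng", "hin", language)  # Display(...); everything else is hidden
    return [
        {
            "tag": node["tag"],
            "labels": {code: node["labels"][code] for code in visible if code in node["labels"]},
        }
        for node in schema
    ]


for node in select_nodes("Bangla", "asm"):
    print(node["tag"], sorted(node["labels"]))
```

Keeping the full schema in one structure and filtering it per request mirrors the "Call (POS Schema)" then "Hide (remaining nodes)" order of the pseudocode; the original nodes are never modified.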
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
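In the tagged sentences above, each token is a word immediately followed by its tag, and hierarchical tags are joined with underscores (for example V_VM_VF). The following Python sketch splits such tokens back into a word and a list of tag levels. It is a rough illustration, not part of the standard: the function names are hypothetical, and the pattern assumes the word itself does not end in upper-case Latin letters, so it suits native-script text rather than transliterations.

```python
import re

# A token is a word followed by a tag built from upper-case ASCII parts
# joined by one or more underscores, e.g. "N_NN" or "V__VM__VF".
TOKEN_RE = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:_+[A-Z]+)*)$")


def split_token(token):
    """Split 'wordTAG' into (word, [tag levels])."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"no trailing POS tag in {token!r}")
    return match.group("word"), re.split(r"_+", match.group("tag"))


def parse_sentence(line):
    """Split a whitespace-separated tagged sentence into (word, levels) pairs."""
    return [split_token(token) for token in line.split()]


print(parse_sentence("दशरनN_NN RD_PUNC"))
```

A bare tag such as RD_PUNC yields an empty word, which reflects cases where the punctuation mark itself is missing from the extracted text.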
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tag (ISO 639-3)
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
95 Negation NEG RP__NEG
10 Quantifiers QT QT
101 General QTF QT__QTF
102 Cardinals QTC QT__QTC
103 Ordinals QTO QT__QTO
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
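The rule that higher level tags follow automatically from the lowest level tag can be sketched as a parent lookup. The Python below is an illustration only: the `PARENT` table is transcribed from the Label and Annotation Convention columns of the tables in this document, and the function name `annotation` is hypothetical.

```python
# Parent of each label in the type hierarchy; top-level labels have no entry.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V", "VN": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation(label):
    """Build the full annotation string from the lowest level label."""
    chain = [label]
    while chain[-1] in PARENT:      # climb until a top-level label is reached
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


print(annotation("VF"))   # V__VM__VF
print(annotation("UT"))   # CC__CCS__UT
print(annotation("JJ"))   # JJ
```

An annotator then only needs to pick the most specific label; a label that is not in the table is returned unchanged, so a stricter version would validate it against the tag set first.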
POS for Tamil
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Remarks
1 Noun N N paiyan raajaa puttakam
11 Common NN N__NN puttakam kaNNaaTi paTam
12 Proper NNP N__NNP moohan ravi maalati
13 Nloc NST N__NST meel kiiz mun pin
2 Pronoun PR PR itu atu avan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V_VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Examples in Malayalam
1 Noun N N avan mOhan vItu
11 Common NN N__NN vItu vellam pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avan aval atu itu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam
തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu ciri
ോ കഴി ആണി (copula) ചിരി
41 Main VM V__VM pO kazhi cirri Annu (copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST upare niche bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
82 Subordinator CCS CC__CCS ye kintu noile tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu alpa aneka
102 Cardinals QTC QT__QTC eka dui tina
103 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Marathi
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Remarks
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN kalam chashmA 'pen' 'spectacles'
12 Proper NNP N__NNP mohan ravI 'Mohan' 'Ravi'
13 Nloc NST N__NST upar nIche ahIM 'up' 'down' 'in front'
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te 'me' 'you' 'he/she'
22 Reflexive PRF PR__PRF pote jAte svayam 'herself/himself'
23 Relative PRL PR__PRL je te jyAM 'who' 'where'
24 Reciprocal PRC PR__PRC aras-paras paraspar 'mutually' 'each other'
25 Wh-word PRQ PR__PRQ koN kyAre kyAM 'who' 'when' 'where'
26 Indefinite koI kaIMK kashuM 'someone' 'something'
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A 'this'
32 Relative DMR DM__DMR je jeNe 'which/who' 'whom'
33 Wh-word DMQ DM__DMQ koN shuM kem 'who' 'what' 'why'
34 Indefinite koI kaIMK kashuM 'someone' 'something'
4 Verb V V
41 Main VM V__VM khAshe khAdhu 'will eat' 'ate'
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM 'is' 'was' 'did'
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke 'and' 'or'
82 Subordinator CCS CC__CCS tethI evuM kAraNke 'so' 'like that' 'because'
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO 'but' emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM 'very' 'much'
94 Negation NEG RP__NEG nahi na 'no'
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM 'little' 'much'
102 Cardinals QTC QT__QTC eka be traN 'one' 'two' 'three'
103 Ordinals QTO QT__QTO paheluM bIjI 'first' (neu) 'second' (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi 'work and the like' 'water and the like'
POS for Konkani
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu
Sl No Category (Top level / Subtype (level 1) / Subtype (level 2)) Label Annotation Convention Examples Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu unlike Hindi voh is used both for singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
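As a rough illustration of how some of these attributes look on an element, the Python sketch below builds one element with the standard library and sets xml:lang, xml:id and the ITS local attributes its:dir and its:translate. The element name `label`, the helper name `make_label` and the sample values are assumptions made for the example; the ITS namespace URI is the one defined for ITS 1.0.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # predefined "xml" prefix
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes", ident=None):
    """Build a hypothetical <label> element carrying i18n attributes."""
    element = ET.Element("label")
    element.set(f"{{{XML_NS}}}lang", lang)            # language of the content
    element.set(f"{{{ITS_NS}}}dir", direction)        # text direction
    element.set(f"{{{ITS_NS}}}translate", translate)  # translatable or not
    if ident is not None:
        element.set(f"{{{XML_NS}}}id", ident)         # unique identifier
    element.text = text
    return element


label = make_label("noun", "en", ident="lbl-N")
print(ET.tostring(label, encoding="unicode"))
```

For a right-to-left language such as Urdu, the same helper would be called with `lang="ur"` and `direction="rtl"`.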
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only partly: a tag gives just the element name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
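A program that consumes such a metadata record would typically read a few fields out of it. The Python sketch below does this for a simplified fragment; the `SAMPLE` string reuses element names from the listing above but omits the namespace and the nested author and domain elements, and the function name `read_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment modelled on the listing above.
SAMPLE = """<languageSection>
  <language>en</language>
  <version>1.0.0</version>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
</languageSection>"""


def read_metadata(xml_text):
    """Collect selected metadata fields into a flat dictionary."""
    root = ET.fromstring(xml_text)
    fields = ("language", "version", "registrationStatus", "origin")
    record = {name: (root.findtext(name) or "").strip() for name in fields}
    # creationDate is nested one level deeper, under <creation>.
    record["creationDate"] = (root.findtext("creation/creationDate") or "").strip()
    return record


print(read_metadata(SAMPLE))
```

Missing elements come back as empty strings rather than raising an error, which suits draft records where fields such as the identifier are still blank.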
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by turning particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
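The idea of turning branches 'off' can be sketched in Python as follows. This is a minimal illustration, not the implementation prescribed by the standard: the node is a hypothetical well-formed rendering of one schema entry, its labels are ASCII placeholders, and the function name switch_off_branches is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of one multilingual node.
# The *-cat values are ASCII placeholders, not real labels.
NODE_XML = (
    '<node cat="noun" hin-cat="HIN_LABEL" brx-cat="BRX_LABEL" '
    'mal-cat="MAL_LABEL" kas-cat="KAS_LABEL" tag="N"/>'
)


def switch_off_branches(node: ET.Element, keep_languages: set) -> ET.Element:
    """Remove every language label except those listed in keep_languages.

    The English label ("cat") and the tag itself are always kept.
    """
    keep = {f"{code}-cat" for code in keep_languages}
    for name in list(node.attrib):
        if name.endswith("-cat") and name not in keep:
            del node.attrib[name]
    return node


node = switch_off_branches(ET.fromstring(NODE_XML), {"mal"})
print(sorted(node.attrib))
```

Passing a set of language codes, rather than a single code, leaves room for the case described later in the selection algorithm, where the Hindi labels stay visible alongside the selected language.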
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...............</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
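The tag values in the draft schema form a small hierarchy: a category tag, an optional type tag and, for some types, a subtype tag. The Python sketch below records that hierarchy and checks composite tags of the kind used in the reference examples of section 15, such as N_NNP or V_VM_VF. The tag values are taken from the schema above; the joining of levels with an underscore is taken from those examples. Attaching the verb subtypes to VM and the Quotative subtype UT to CCS follows the order of entries in the schema and is an interpretation, and the names TAG_HIERARCHY and is_valid_tag are assumptions.

```python
# category -> {type -> [subtypes]}; tag values copied from the draft schema.
TAG_HIERARCHY = {
    "N": {"NN": [], "NNP": [], "NNV": [], "NST": []},
    "PR": {"PRP": [], "PRF": [], "PRC": [], "PRL": [], "PRQ": []},
    "DM": {"DMD": [], "DMR": [], "DMQ": []},
    # Assumption: the subtype entries that follow Main Verb belong to VM.
    "V": {"VAUX": [], "VM": ["VF", "VINF", "VNG", "VNF"]},
    "JJ": {},
    "RB": {},
    "PSP": {},
    # Assumption: the Quotative subtype follows, and belongs to, CCS.
    "CC": {"CCD": [], "CCS": ["UT"]},
    "RP": {"RPD": [], "CL": [], "INJ": [], "NEG": [], "INTF": []},
    "QT": {"QTF": [], "QTC": [], "QTO": []},
    "RD": {"RDF": [], "SYM": [], "UNK": [], "PUNC": [], "ECH": []},
}


def is_valid_tag(tag: str) -> bool:
    """Return True if a composite tag such as 'V_VM_VF' fits the hierarchy."""
    parts = tag.split("_")
    if parts[0] not in TAG_HIERARCHY:
        return False
    if len(parts) == 1:
        return True  # bare category, e.g. JJ or PSP
    types = TAG_HIERARCHY[parts[0]]
    if parts[1] not in types:
        return False
    if len(parts) == 2:
        return True
    return len(parts) == 3 and parts[2] in types[parts[1]]


for candidate in ["N_NNP", "V_VM_VF", "CC_CCS_UT", "PSP", "N_VM"]:
    print(candidate, is_valid_tag(candidate))
```

A check of this kind could be run over a tagged corpus to flag tags that do not correspond to any path in the schema tree.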
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
Columns: S.No, English, Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
Columns: S.No, English, Hindi, Assamese, Bodo, Kashmiri, Kashmiri (Hindi), Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Table 3. Languages: Telugu, Malayalam, Tamil, Konkani
Columns: S.No, English, Hindi, Telugu, Malayalam, Tamil, Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
...
End if
If language is Bodo then
Call (POS Schema)
Display (English, Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
...
End if
If language is Konkani then
Call (POS Schema)
Display (English, Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo شخصيٲتی rdquo
tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo ماکوسی rdquo
tag=rdquoPRFgt
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
...
End if
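The selection algorithm above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not a normative implementation. The script and language pairs are only those that appear in the pseudocode, the language codes are the attribute prefixes used in the draft schema, and the node is a plain dictionary with ASCII placeholder labels. The names SCRIPT_LANGUAGES, LANGUAGE_CODES and select_nodes are assumptions.

```python
# Script -> languages, limited to the pairs shown in the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}

# Attribute prefixes as they appear in the draft schema (hin-cat, brx-cat, ...).
LANGUAGE_CODES = {
    "Hindi": "hin", "Bodo": "brx", "Konkani": "kok", "Malayalam": "mal",
    "Kashmiri": "kas", "Assamese": "asm", "Gujarati": "guj",
}


def select_nodes(script: str, language: str, node: dict) -> dict:
    """Return the labels to display for one node; everything else is hidden.

    English and Hindi labels are always shown, followed by the label of the
    selected language, as in the Display(...) steps of the pseudocode.
    """
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language} is not listed under script {script}")
    visible = ["cat", "hin-cat"]
    code = LANGUAGE_CODES[language]
    if code != "hin":
        visible.append(f"{code}-cat")
    visible.append("tag")
    return {name: node[name] for name in visible if name in node}


# Placeholder labels stand in for the native-script labels of the schema.
NOUN = {"cat": "noun", "hin-cat": "HIN", "brx-cat": "BRX", "kok-cat": "KOK",
        "mal-cat": "MAL", "kas-cat": "KAS", "tag": "N"}

print(select_nodes("Devanagari", "Bodo", NOUN))
```

Rejecting a language that is not listed under the chosen script mirrors the nesting of the pseudocode, where the language test is only reached inside the matching script branch.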
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC
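Tagged sentences of this kind can be processed mechanically. The Python sketch below separates each token into its word and its POS tag and counts the tags in a sentence. It assumes that the tag is the trailing run of upper-case Latin letters and underscores, optionally preceded by a backslash, because the separator between word and tag is not visible in the examples as reproduced here. The sample sentence uses lower-case ASCII placeholder words, not real corpus data, and the names split_token and tag_counts are assumptions.

```python
import re
from collections import Counter

# word (possibly empty), optional backslash, then a tag such as N_NN or V_VM_VF.
TOKEN_PATTERN = re.compile(r"^(?P<word>.*?)\\?(?P<tag>[A-Z]+(?:_[A-Z]+)*)$")


def split_token(token: str) -> tuple:
    """Split one tagged token into (word, tag)."""
    match = TOKEN_PATTERN.match(token)
    if match is None:
        raise ValueError(f"no POS tag found in {token!r}")
    return match.group("word"), match.group("tag")


def tag_counts(sentence: str) -> Counter:
    """Count how often each tag occurs in a whitespace-separated sentence."""
    return Counter(split_token(token)[1] for token in sentence.split())


# Placeholder sentence: lower-case ASCII words with tags from the tag set.
sample = "darshanN_NN sePSP miltaV_VM haiV_VAUX RD_PUNC"

print(split_token("darshanN_NN"))
print(tag_counts(sample))
```

The pattern would misread a word that itself ends in upper-case Latin letters, so a real corpus reader should rely on the explicit separator of the corpus format wherever it is preserved.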
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
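As a minimal sketch of how these language tags might be used in software, the snippet below stores the 22 ISO 639-3 codes in a lookup table and derives the per-language label attribute name used in the draft schema (for example hin-cat). The names LANGUAGE_TAGS and label_attribute are illustrative assumptions, not part of the standard.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
# LANGUAGE_TAGS and label_attribute are hypothetical names for illustration.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language: str) -> str:
    """Return the schema attribute that carries labels for a language, e.g. 'hin-cat'."""
    return f"{LANGUAGE_TAGS[language]}-cat"


print(label_attribute("Malayalam"))
```

Keeping the codes in one table means that the language selection step and the schema attribute names cannot drift apart.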
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarat University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
2 Pronoun PR PR ituatuavan
21 Personal PRP PR__PRP naan nii avaL avarkaL
22 Reflexive PRF PR__PRF taan
23 Relative PRL PR__PRL yaar etu eppootu enkee
24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam
25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum
3 Demonstrative DM DM a- i- e-
31 Deictic DMD DM__DMD anta inta enta
32 Relative DMR DM__DMR enta
33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu
4 Verb V V vizu poo tuunku aaku
41 Main VM V__VM vizu poo tuunku ciri
411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL
412 Non-finite VNF V__VM__VNF vizunta poonaal
413 Infinitive VINF V__VM__VINF viza pooka cirikka
414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal
42 Verbal VN V__VN paTippu naTai naTattai ceykai
43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum
5 Adjective JJ iniya periya azakaana
6 Adverb RB veekamaaka viraivaaka
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu
For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Examples in Malayalam
1 Noun N N avan
mOhan
vItu
11 Common NN N__NN vItu
vellam
pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam
തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu ciri
ോ കഴി ആണി (copula) ചിരി
41 Main VM V__VM pO kazhi cirri Annu (copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnu rantu ഒന രണ
103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST upare niche bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
82 Subordinator CCS CC__CCS ye kintu noile tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu alpa aneka
102 Cardinals QTC QT__QTC eka dui tina
103 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
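One way to read this rule is that the annotation convention itself (for example V__VM__VNG) already encodes the full path through the hierarchy, so the higher-level tags can be derived rather than entered by hand. The sketch below illustrates that reading; the function names expand_tag and annotate are hypothetical and not defined by the standard.

```python
def expand_tag(annotation: str) -> list[str]:
    """Split an annotation such as 'V__VM__VF' into its levels, top level first."""
    levels = [part for part in annotation.split("__") if part]
    if not levels:
        raise ValueError("empty annotation")
    return levels


def annotate(word: str, annotation: str) -> dict:
    """Store the lowest-level tag together with its automatically derived ancestors."""
    levels = expand_tag(annotation)
    return {"word": word, "tag": levels[-1], "ancestors": levels[:-1]}


# 'karA' is listed as a gerund in the Bangla table.
print(annotate("karA", "V__VM__VNG"))
```

An annotator then only chooses the lowest-level tag (VNG), and the tool records V and VM alongside it.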
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN kalam chashmA 'pen' 'spectacles'
12 Proper NNP N__NNP mohan ravI 'Mohan' 'Ravi'
13 Nloc NST N__NST upar nIche ahIM 'up' 'down' 'in front'
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te 'me' 'you' 'he/she'
22 Reflexive PRF PR__PRF pote jAte svayam 'herself/himself'
23 Relative PRL PR__PRL je te jyAM 'who' 'where'
24 Reciprocal PRC PR__PRC aras-paras paraspar 'mutually' 'each other'
25 Wh-word PRQ PR__PRQ koN kyAre kyAM 'who' 'when' 'where'
26 Indefinite koI kaIMK kashuM 'someone' 'something'
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A 'this'
32 Relative DMR DM__DMR je jeNe 'which/who' 'whom'
33 Wh-word DMQ DM__DMQ koN shuM kem 'who' 'what' 'why'
34 Indefinite koI kaIMK kashuM 'someone' 'something'
4 Verb V V
41 Main VM V__VM khAshe khAdhu 'will eat' 'ate'
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM 'is' 'was' 'did'
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke 'and' 'or'
82 Subordinator CCS CC__CCS tethI evuM kAraNke 'so' 'like that' 'because'
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO 'but' emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM 'very' 'much'
94 Negation NEG RP__NEG nahi na 'no'
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM 'little' 'much'
102 Cardinals QTC QT__QTC eka be traN 'one' 'two' 'three'
103 Ordinals QTO QT__QTO paheluM bIjI 'first' (neu) 'second' (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi 'work and the like' 'water and the like'
POS for Konkani
Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
-بديسی لفظ(bidesii lafz(
RDF RD__RDF A word written in script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Users will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur)
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content)
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content)
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier)
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)
[For more details: http://www.w3.org/TR/xml-i18n-bp]
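To make the attributes above concrete, the following is a small sketch that parses a sample document carrying xml:lang, xml:id, its:dir and its:translate and lists what each element declares. The sample document, its element names (doc, label, gloss) and the helper its_summary are assumptions made for illustration; only the attribute names and the ITS 1.0 namespace come from the W3C specifications.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang and xml:id

# Hypothetical host vocabulary: a tag label that must not be translated
# and a right-to-left Urdu example word.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its"
     xml:lang="en" its:dir="ltr" xml:id="d1">
  <label xml:id="l1" its:translate="no">N__NN</label>
  <gloss xml:id="g1" xml:lang="ur" its:dir="rtl">کتاب</gloss>
</doc>"""


def its_summary(xml_text: str) -> list[dict]:
    """Collect the internationalization attributes declared on each element."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        rows.append({
            "tag": elem.tag,
            "id": elem.get(f"{{{XML_NS}}}id"),
            "lang": elem.get(f"{{{XML_NS}}}lang"),
            "dir": elem.get(f"{{{ITS_NS}}}dir"),
            "translate": elem.get(f"{{{ITS_NS}}}translate"),
        })
    return rows


for row in its_summary(SAMPLE):
    print(row)
```

The sketch reports only attributes set directly on an element; a real ITS processor would also apply inheritance and global rules such as its:translateRule.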
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives a name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
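A record of this shape could also be generated programmatically. The sketch below builds a languageSection with the element names that appear in the listing above, using Python's standard xml.etree.ElementTree. The function build_language_section and its parameters are hypothetical, and namespace handling is left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_language_section(language, version, status, origin, created, source):
    """Build a languageSection element mirroring the metadata listing (no namespace)."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "version").text = version
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = created
    description = ET.SubElement(section, "descriptionSection")
    definition_class = ET.SubElement(description, "definitionClass")
    ET.SubElement(definition_class, "source").text = source
    return section


# Values copied from the listing; the version string is kept exactly as printed there.
section = build_language_section(
    "en", "100", "standard", "ISO 12620:1999", "1999-01-01", "ISO 12620:1999"
)
print(ET.tostring(section, encoding="unicode"))
```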
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, the POS schema block diagram.
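The selection steps in the flow above (select a language, display the desired nodes, hide the rest) can be sketched as a walk over the tag tree. The sketch assumes that a node is switched 'off' for a language when it carries no label for that language, as with the Verbal noun subtype, which has no mal-cat label in the draft schema. The Node class, the select_nodes function and the placeholder label strings are illustrative assumptions, not part of the standard.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    labels: dict                      # ISO 639-3 code -> label in that language
    children: list = field(default_factory=list)


def select_nodes(root: Node, lang: str):
    """Return (displayed, hidden) tag lists for the selected language."""
    displayed, hidden = [], []

    def visit(node: Node, parent_visible: bool) -> None:
        # A node is shown only if its parent is shown and it has a label for lang.
        visible = parent_visible and lang in node.labels
        (displayed if visible else hidden).append(node.tag)
        for child in node.children:
            visit(child, visible)

    visit(root, True)
    return displayed, hidden


# Placeholder labels stand in for the native-script labels of the schema.
noun = Node("N", {"hin": "noun-hin", "mal": "noun-mal"}, [
    Node("NN", {"hin": "common-hin", "mal": "common-mal"}),
    Node("NNP", {"hin": "proper-hin", "mal": "proper-mal"}),
    Node("NNV", {"hin": "verbal-hin"}),          # no Malayalam label
    Node("NST", {"hin": "nloc-hin", "mal": "nloc-mal"}),
])

print(select_nodes(noun, "mal"))
```

With Malayalam selected, the NNV branch ends up in the hidden list while the other noun subtypes are displayed; with Hindi selected, nothing is hidden.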
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...............</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
  kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा"
  mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক"
  kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा"
  mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক"
  kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
  kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम"
  guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा"
  mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক"
  kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ"
  mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम"
  guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा"
  mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক"
  kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा"
  mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক"
  kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत"
  brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی"
  asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत"
  brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ"
  asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा"
  mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা"
  kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा"
  mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক"
  kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा"
  mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat=""
  guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत"
  brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ"
  asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा"
  mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়"
  kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To support this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
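The one-to-one mapping can be pictured as a lookup keyed by tag and language code. The following is a minimal sketch, not part of the standard: the names `LABELS`, `label_for` and `tag_for` are hypothetical, the key "eng" for English is an assumption, and only four top-level categories are shown, with the Hindi and Marathi labels written in their normalized dictionary spelling.

```python
# Illustrative subset of a one-to-one mapping table: tag -> label per language.
# Language keys follow ISO 639-3; "eng" for English is an assumption of this sketch.
LABELS = {
    "N":  {"eng": "Noun",      "hin": "संज्ञा",   "mar": "नाम"},
    "PR": {"eng": "Pronoun",   "hin": "सर्वनाम", "mar": "सर्वनाम"},
    "V":  {"eng": "Verb",      "hin": "क्रिया",   "mar": "क्रियापद"},
    "JJ": {"eng": "Adjective", "hin": "विशेषण",  "mar": "विशेषण"},
}


def label_for(tag, lang):
    """Return the label of `tag` in `lang`, or "NA" when the table has no entry."""
    return LABELS.get(tag, {}).get(lang, "NA")


def tag_for(label, lang):
    """Reverse lookup; well defined only if labels are unique within a language."""
    for tag, names in LABELS.items():
        if names.get(lang) == label:
            return tag
    return None
```

Cells marked NA in the tables correspond to missing entries here, which is why `label_for` falls back to "NA" rather than raising an error.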
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
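The selection steps above can be sketched in Python as a filter over the schema nodes. This is an illustration under stated assumptions, not the reference implementation: `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical names, each node is modelled as a plain dictionary of its attributes, and the label values are placeholders rather than real labels.

```python
# Script -> languages handled in the algorithm (ISO 639-3 codes).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(nodes, script, lang):
    """Keep the English, Hindi and `lang` labels plus the tag; hide the rest."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not listed for script {script!r}")
    visible = {"cat", "subcat", "tag", "hin-cat", f"{lang}-cat"}
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]


# Placeholder nodes: the label strings stand in for the real schema values.
nodes = [
    {"cat": "Pronoun", "hin-cat": "<Hindi label>", "brx-cat": "<Bodo label>",
     "mal-cat": "<Malayalam label>", "tag": "PR"},
    {"subcat": "Personal", "hin-cat": "<Hindi label>", "brx-cat": "<Bodo label>",
     "mal-cat": "<Malayalam label>", "tag": "PRP"},
]
shown = select_nodes(nodes, "Devanagari", "brx")
```

For Bodo, each entry of `shown` keeps the English category, the Hindi label, the Bodo label and the tag, while the Malayalam label is hidden.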
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
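The tagged sentences above attach a hierarchical label such as N_NNP or V__VM__VNF to each word. A minimal sketch of reading such annotations follows. It rests on two assumptions: the word and its label are separated by a backslash (no separator is visible in the text as reproduced here), and levels are joined by one or more underscores. The function names and the romanized sample tokens are hypothetical.

```python
import re


def parse_token(token, sep="\\"):
    """Split 'word<sep>LABEL' into the word and its list of tag levels."""
    word, found, label = token.rpartition(sep)
    if not found or not word or not label:
        raise ValueError(f"not a tagged token: {token!r}")
    # Accept both single and double underscores between levels.
    levels = [part for part in re.split(r"_+", label) if part]
    return word, levels


def parse_sentence(line, sep="\\"):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(tok, sep) for tok in line.split()]
```

Splitting on runs of underscores lets the same parser read both the single-underscore and the double-underscore labels that appear in the samples.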
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarat University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
7 Postposition PSP paRRi kuRittu viTa
8 Conjunction CC CC maRRum eenenRaal aanaal
81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu
-um is a co-ordinator which can be added to noun and verb
82 Subordinator CCS CC__CCS enRu ena enpatu enRaal
821 Quotative UT CC__CCS__UT enRu ena
9 Particles RP RP maTTUm kuuTa
91 Default RPD RP__RPD maTTUm kuuTa
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ ayyoo teey aamaam
94 Intensifier INTF RP__INTF ati veku mika
95 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam niRaiya oru mutal
101 General QTF QT__QTF koncam niRaiya
102 Cardinals QTC QT__QTC onRu iraNTu
103 Ordinals QTO QT__QTO mutal iraNTaam
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil
POS for Malayalam
Sl No Category Label Annotation Convention Examples Examples in Malayalam
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N avan mOhan vItu
11 Common NN N__NN vItu vellam pattam
12 Proper NNP N__NNP mOhan ravi sIta
േമാഹ൯ രവി സീത
13 Nloc NST N__NST mEle tAze munpil pinnil
േമെല താെഴ മനിി ിനിി
2 Pronoun PR PR avanavalatuitu
അവ൯ അവള അത ഇത
21 Personal PRP PR__PRP naan nii avaL avar
ഞാ൯നീ അവള അവ൪
22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯
23 Relative PRL PR__PRL aaro ആേരാ
24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ
31 Deictic DMD DM__DMD atu itu അത ഇത
32 Relative DMR DM__DMR eetu ഏത
33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന
4 Verb V V pO kazhi Annu(copula) ciri ോ കഴി ആണി(Copula) ചിരി
41 Main VM V__VM pO kazhi cirri Annu(copula) ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAl ennalum enkilum
െക എനിനം എനാി എനാലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം
92 Classifier CL RP__CL peer േ൪
93 Interjection INJ RP__INJ ayyoo അേയാ
94 Intensifier INTF RP__INTF pala valare ല വളെര
95 Negation NEG RP__NEG illa alla ഇല അല
10 Quantifiers QT QT kuracchu niraccu oru dharalam കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnu rantu ഒന രണ
103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര
113 Punctuation PUNC RD__PUNC
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH
POS for Bangla
Sl No Category Label Annotation Convention Examples Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalama cashmaa
12 Proper NNP N__NNP Mohan ravi rashmi
14 Nloc NST N__NST upare niche bhitara
2 Pronoun PR PR
21 Personal PRP PR__PRP se tumi AmAra
22 Reflexive PRF PR__PRF nijera
23 Relative PRL PR__PRL ye yakhana yena yAra
24 Reciprocal PRC PR__PRC paraspara
25 Wh-word PRQ PR__PRQ ke kakhana kena kAra
26 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
31 Deictic DMD DM__DMD sei oi o se
32 Relative DMR DM__DMR ye yei
33 Wh-word DMQ DM__DMQ kono
34 Indefinite DMI DM__DMI keu
4 Verb V V
41 Main VM V__VM
411 Finite VF V__VM__VF karachhilAma yAba khAYa
412 Non-finite VNF V__VM__VNF kare kheYe karale khete
413 Infinitive VINF V__VM__VINF karate khete yete
414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
42 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
82 Subordinator CCS CC__CCS ye kintu noile tAhale
821 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
91 Default RPD RP__RPD to ye
92 Classifier CL RP__CL jana khAnA
93 Interjection INJ RP__INJ Are ei hAya
94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
95 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
101 General QTF QT__QTF kichhu alpa aneka
102 Cardinals QTC QT__QTC eka dui tina
103 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
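The rule above can be illustrated with a minimal Python sketch. It assumes only what the annotation convention column shows, namely that levels are joined by a double underscore (for example V__VM__VF). The function names expand_tag and annotate are hypothetical and are not part of the standard.

```python
def expand_tag(label: str) -> list[str]:
    """Return the label together with all of its ancestors, top level first."""
    parts = label.split("__")
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


def annotate(tokens_with_tags):
    """Attach the full tag hierarchy to each (token, lowest-level tag) pair."""
    return [(token, expand_tag(tag)) for token, tag in tokens_with_tags]


if __name__ == "__main__":
    # Transliterated Bangla examples from the Bangla table.
    sample = [("khAYa", "V__VM__VF"), ("kalama", "N__NN"), ("sundara", "JJ")]
    for token, hierarchy in annotate(sample):
        print(token, hierarchy)
```

The annotator supplies only the lowest level label; the higher level tags are derived from it rather than typed by hand, so the stored hierarchy cannot disagree with the label.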
POS for Marathi
Sl No Category Label Annotation Convention Examples Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N मुलगा (mulagaa-boy) राजा (raajaa-king) पुस्तक (pustaka-book)
11 Common NN N__NN पुस्तक (pustaka-book) लेखणी (lekhaNi-pen) चष्मा (chashmaa-goggles)
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रश्मी (Rashmi)
13 Verbal NNV N__NNV NA Not Required
14 Nloc NST N__NST वर (var-up) खाली (khaalee-down) पुढे (pudhe-ahead) मागे (maage-back) Where it is separate it is NST
2 Pronoun PR PR येथे (yethe-here) तेथे (tethe-there) जो (jo-who) तो (to-he)
21 Personal PRP PR__PRP तो (to-he) मी (mee-I) तू (tu-you) ते (te-they) तुम्ही (tumhi-you)
22 Reflexive PRF PR__PRF स्वतः (swatha-myself) आपण (aapana-ourselves)
23 Relative PRL PR__PRL जो (jo-who) ज्याने (jyaane-who) जेव्हा (jevhaa-while) जिथे (jeethe-where)
24 Reciprocal PRC PR__PRC परस्पर (Paraspara-reciprocally) एकमेक (ekmek-mutually)
25 Wh-word PRQ PR__PRQ कोण (kona-who) केव्हा (kevha-when) कुठे (kuthe-where)
26 Indefinite कोणी (kona
3 Demonstrative DM DM तो (to-he) हा (haa-this) जो (jo-who)
31 Deictic DMD DM__DMD इथे (ithe-here) तिथे (tithe-there)
32 Relative DMR DM__DMR जो (jo-who) ज्याने (jyane-who)
33 Wh-word DMQ DM__DMQ कोणता (konta-which) कोणी (kona-who)
4 Verb V V पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)
41 Main VM V__VM पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आहे (is) लागला (started)
5 Adjective JJ सुंदर (sundara-beautiful) चांगला (chaangalaa-good) मोठा (moThaa-big)
6 Adverb RB लवकर (lavakar-fast) हळूहळू (haLuuhaLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आणि (aaNi-and) कारण (kaaraN-because)
81 Co-ordinator CCD CC__CCD आणि (aaNi-and) पण (paNa-but) परंतु (parantu-but)
82 Subordinator CCS CC__CCS कारण (kaaraN-because of) कारण की (kaaraN kii-because of) जर-तर (jara-tara-if-then)
821 Quotative UT CC__CCS__UT असा म्हणून
9 Particles RP RP तर (tara)
91 Default RPD RP__RPD तर (tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरेरे (arere) ओहो (oho-oh)
94 Intensifier INTF RP__INTF खूप (khoop-lot, very) बराच (baraach-too much) अतिशय (atishaya-too much, very)
95 Negation NEG RP__NEG नको (nako-not) न (na-Na)
10 Quantifiers QT QT थोडे (thode-few) जास्त (jaasta-lot) काही (kaahi-few) एक (eka-one) पहिला (pahilaa-first)
101 General QTF QT__QTF थोडे (thoDe-few) जास्त (jaasta-lot) काही (kaahi-few)
102 Cardinals QTC QT__QTC एक (eka-one) दोन (dona-two)
103 Ordinals QTO QT__QTO पहिला (pahilaa-first) दुसरा (dusaraa-second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जेवणबिवण (jevanbivaNa-meal/dinner) डोकेबिके (Dokebike-head) (Paanii-) vaanii (khaanaa-) vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No Category Label Annotation Convention Examples Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN kalam chashmA 'pen' 'spectacles'
12 Proper NNP N__NNP mohan ravI 'Mohan' 'Ravi'
13 Nloc NST N__NST upar nIche ahIM 'up' 'down' 'in front'
2 Pronoun PR PR
21 Personal PRP PR__PRP huM tuM te 'me' 'you' 'he/she'
22 Reflexive PRF PR__PRF pote jAte svayam 'herself/himself'
23 Relative PRL PR__PRL je te jyAM 'who' 'where'
24 Reciprocal PRC PR__PRC aras-paras paraspar 'mutually' 'each other'
25 Wh-word PRQ PR__PRQ koN kyAre kyAM 'who' 'when' 'where'
26 Indefinite koI kaIMK kashuM 'someone' 'something'
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A 'this'
32 Relative DMR DM__DMR je jeNe 'which/who' 'whom'
33 Wh-word DMQ DM__DMQ koN shuM kem 'who' 'what' 'why'
34 Indefinite koI kaIMK kashuM 'someone' 'something'
4 Verb V V
41 Main VM V__VM khAshe khAdhu 'will eat' 'ate'
42 Auxiliary VAUX V__VAUX chhe hatuM karyuM 'is' 'was' 'did'
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD ane ke 'and' 'or'
82 Subordinator CCS CC__CCS tethI evuM kAraNke 'so' 'like that' 'because'
9 Particles RP RP
91 Default RPD RP__RPD paNa ja tO 'but' emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahu ghaNuM 'very' 'much'
94 Negation NEG RP__NEG nahi na 'no'
10 Quantifiers QT QT
101 General QTF QT__QTF thoduM ghaNuM 'little' 'much'
102 Cardinals QTC QT__QTC eka be traN 'one' 'two' 'three'
103 Ordinals QTO QT__QTO paheluM bIjI 'first' (neu) 'second' (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi 'work and the like' 'water and the like'
POS for Konkani
Sl No Category Label Annotation Convention Examples Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category Label Annotation Convention Examples Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम पड खवास
12 Proper NNP N__NNP अरण दनश अल
13 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ अहा
22 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप पढइ खाइ स हस
42 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu
Sl No Category Label Annotation Convention Examples Remarks
Category levels: Top level, Subtype (level 1), Subtype (level 2)
1 Noun (اسم - ism) N N لڑکا (laRkaa) راجا (raajaa) کتاب (kitaab)
11 Common (نکره - nakeraa) NN N__NN کتاب (kitaab) قلم (qalam) چشمہ (cashma)
12 Proper (معرفہ - m‘aarefa) NNP N__NNP موہن (Mohan) رشمی (Rashmi) روی (Ravi)
13 Verbal (حاصل مصدر - haasil-e-masdar) NNV N__NNV جلن (jalan) چلن (calan) بہاؤ (bahaao) بناوٹ (banaavat) May be considered for Urdu- Hindi too
14 Nloc (ظرف - zarf) NST N__NST اوپر (upar) نيچے (niice) آگے (aage) پيچهے (piiche)
2 Pronoun (ضمير - zamiir) PR PR يہ (yih) وه (voh) جو (jo)
21 Personal (ضمير شخصی - zamiir-e-shakhsii) PRP PR__PRP وه (voh) تم (tum) ميں (maim) In Urdu, unlike Hindi, voh is used both for singular and plural
22 Reflexive (ضمير معکوسی - zamiir-e-m‘aakoosii) PRF PR__PRF اپنا (apnaa) خود (khud) اپنے آپ (apne aap)
23 Relative (ضمير موصولہ - zamiir-e-mausoolaa) PRL PR__PRL جو (jo) جب (jab) جس (jis) جہاں (jahaM)
24 Reciprocal (ضمير راجع - zamiir-e-raaje‘) PRC PR__PRC باہم (baaham) درميان (darmiyaan) آپس (aapas)
25 Wh-word (ضمير استفہاميہ - zamiir-e-istafhaamiyaa) PRQ PR__PRQ کون (kaun) کب (kab) کہاں (kahaaM)
3 Demonstrative (ضمير اشاره - zamiir-e-ishaaraa) DM DM يہ (yih) وه (voh) ان (inn) ان (unn)
31 Deictic (اشارے - ishaare) DMD DM__DMD يہ (yih) وه (voh)
32 Relative (ضمير اشاره موصولہ - zamiir-e-ishaaraa mausoolaa) DMR DM__DMR جو (jo) جس (jis)
33 Wh-word (ضمير اشاره استفہاميہ - zamiir-e-ishaaraa istafhaamiyaa) DMQ DM__DMQ کون (kaun) کس (kis) کتنا (kitnaa) According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb (فعل - f‘el) V V گرا (giraa) گيا (gayaa) سونا (sonaa) ہنستا (haMstaa)
41 Main VM V__VM گرا (giraa) گيا (gayaa) سونا (sonaa) ہنستا (haMstaa)
411 Finite (محدود - mahdood) VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite (غيرمحدود - ghair mahdood) VNF V__VM__VNF -- do--
413 Infinitive (مصدر - masdar) VINF V__VM__VINF -- do--
414 Gerund (حاصل مصدر - haasil-e-masdar) VNG V__VM__VNG -- do--
42 Auxiliary (فعل امدادی - f‘el-e-imdaadi) VAUX V__VAUX ہے (hai) رہا (rahaa) ہوا (huaa)
5 Adjective (صفت - sifat) JJ دلکش (dilkash) سفيد (safed) سياه (siyaah) چوڑا (cauRaa) اونچا (uuMcaa)
6 Adverb (متعلق فعل - mut‘alliq-e-f‘el) RB تيز (tez) جلد (jald)
7 Postposition (جارموخر - jaar-e-moakkhar) PSP سے (se) نے (ne) کو (ko) ميں (meiM)
8 Conjunction (عطف - ‘atf) CC CC اور (aur) اگر (agar) کيوں کہ (kyoMki)
81 Co-ordinator (حرف وصل - harf-e-vasl) CCD CC__CCD اور (aur) وه (voh) يا (yaa) کہ (ki) بلکہ (balki)
82 Subordinator (تابع کننده - taab‘e kunindaa) CCS CC__CCS اگر (agar) کيوں کہ (kyoMki) تو (to)
821 Quotative (اقتباسی - iqtabaasii) UT CC__CCS__UT Not required
9 Particles (حاليہ - haaliyaa) RP RP تو (to) ہی (hii) بهی (bhii)
91 Default (ڈيفالٹ - Default) RPD RP__RPD تو (to) ہی (hii) بهی (bhii)
92 Classifier (درجہ بند - darja band) CL RP__CL Not required
93 Interjection (فجائيہ - fajaa’iyaa) INJ RP__INJ اے (e) او (o) ارے (are) جی (jii) اہا (ahaa) واه (vaah)
94 Intensifier (حرف تاکيد - harf-e-taakiid) INTF RP__INTF بہت (bahut) بے حد (behad) البتہ (albattaa) ضرور (zaroor) خبردار (khabardaar)
95 Negation (حرف نہی - harf-e-nahii) NEG RP__NEG نہ (na) نہيں (nahiiM)
10 Quantifiers (کميت نما - kamiiyat numaa) QT QT چند (cand) متعدد (muta’addad) قليل (qaliil) کثير (kasiir)
101 General (عام - ‘aam) QTF QT__QTF تهوڑا (thoRaa) بہت (bahut) کچه (kuch)
102 Cardinals (اعداد مطلق - a‘adaad-e-mutlaq) QTC QT__QTC ايک (Ek) دو (do) تين (tiin)
103 Ordinals (ترتيبی اعداد - tartiibii a‘adaad) QTO QT__QTO اول (avval) دوم (doam) پہلا (pahalaa) دوسرا (duusaraa)
11 Residuals (باقی مانده - baaqi-maandaa) RD RD
111 Foreign word (بديسی لفظ - bidesii lafz) RDF RD__RDF A word written in script other than the script of the original text
112 Symbol (علامت - ‘alaamat) SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written ڈالر (dollar) پاونڈ (pound) etc
113 Punctuation (اوقاف - auqaaf) PUNC RD__PUNC Only for Punctuations
114 Unknown (نامعلوم - naa-m‘aaloom) UNK RD__UNK
115 Echowords (گونج دار الفاظ - goonjdar lafz) ECH RD__ECH (دل-) ول (dil-) vil (پيار-) ويار (pyaar-) vyaar (چائے-) وائے (caa‘e-) vaa‘e
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology to easily create XML which is internationalized and can be localized effectively.
ITS for Schema developers
Users will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
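As a hedged illustration of three of these attributes (xml:lang, its:dir and xml:id), the following Python sketch reads a small sample document with the standard library and resolves the language and direction that each element inherits. The sample document, its element names (labels, label) and the helper names are assumptions made for this example, not part of the draft schema. The language values follow the ISO 639-3 codes this document lists in its annexure.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS namespace

# Hypothetical sample; the element names are illustrative only.
SAMPLE = """<labels xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="n-en">noun</label>
  <label xml:id="n-mal" xml:lang="mal">നാമം</label>
  <label xml:id="n-urd" xml:lang="urd" its:dir="rtl">اسم</label>
</labels>"""


def inherited(elem, parents, attr):
    """Walk up from elem until a node carries attr; return its value or None."""
    while elem is not None:
        if attr in elem.attrib:
            return elem.attrib[attr]
        elem = parents.get(elem)
    return None


def describe(xml_text):
    """Return (id, language, direction, text) for every label element."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    lang_attr = f"{{{XML_NS}}}lang"
    dir_attr = f"{{{ITS_NS}}}dir"
    id_attr = f"{{{XML_NS}}}id"
    return [
        (el.get(id_attr), inherited(el, parents, lang_attr),
         inherited(el, parents, dir_attr), el.text)
        for el in root.iter("label")
    ]


if __name__ == "__main__":
    for row in describe(SAMPLE):
        print(row)
```

Declaring xml:lang and its:dir on the root and overriding them only where the language or direction changes keeps the mark-up short while every element still resolves to a definite value.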
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag to tell you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only its name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML based POS schema in all 22 Indian Languages, the labels defined in the POS Schema for English need a one to one mapping to the Indian Languages. The XML schema needs to have a complete tree structure as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian Language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
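The "display desired nodes, hide remaining nodes" step can be sketched in Python as a tree walk that skips any branch switched off for the selected language. This is a minimal sketch under stated assumptions: the Node class, the select_nodes function and the HIDDEN table are hypothetical names. The branch marked off for Marathi and Urdu reflects the "Not required" remark on the Classifier row of those tables, while the Bangla table gives classifier examples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                      # POS label, e.g. "RP" or "CL"
    name: str                     # English category name
    children: list = field(default_factory=list)


def select_nodes(node, hidden, path=()):
    """Return annotation labels of all nodes kept, skipping hidden branches."""
    if node.tag in hidden:
        return []                 # the branch is 'off': drop it and everything below
    full = path + (node.tag,)
    kept = ["__".join(full)]
    for child in node.children:
        kept.extend(select_nodes(child, hidden, full))
    return kept


# Particles branch of the common tree, as listed in the POS tables.
PARTICLES = Node("RP", "Particles", [
    Node("RPD", "Default"),
    Node("CL", "Classifier"),
    Node("INJ", "Interjection"),
    Node("INTF", "Intensifier"),
    Node("NEG", "Negation"),
])

# Branches switched off per ISO 639-3 language code (illustrative subset).
HIDDEN = {"mar": {"CL"}, "urd": {"CL"}, "ben": set()}

if __name__ == "__main__":
    for lang in ("ben", "mar"):
        print(lang, select_nodes(PARTICLES, HIDDEN[lang]))
```

Hiding by bare tag is a simplification. A label that occurs under more than one parent, such as VF under both VM and VAUX in the Konkani table, would need to be keyed by its full annotation convention instead.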
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To support this in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2, and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ……………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ……………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ……………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ……………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ……………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ……………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ……………
    End if
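The node-selection logic of this section can be sketched in Python as follows. This is only an illustration and not part of the standard: the `SCRIPT_LANGUAGES` table, the `select_nodes` function, and the placeholder label strings are assumptions made for the example. Only the attribute names (`cat`, `hin-cat`, `brx-cat`, `mal-cat`, `tag`) and the script/language pairings are taken from the algorithm.

```python
# Script -> languages handled under that script, as paired in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(nodes, script, lang):
    """Keep the English and Hindi labels, the label for `lang`, and the tag.

    Every other language-specific attribute is dropped ("Hide remaining nodes").
    """
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    keep = ["cat", "hin-cat"]
    if lang != "hin":
        keep.append(f"{lang}-cat")
    keep.append("tag")
    return [{key: node[key] for key in keep if key in node} for node in nodes]


# Hypothetical sample data: placeholder strings stand in for native-script labels.
SAMPLE_NODES = [
    {"cat": "noun", "hin-cat": "<hin noun>", "brx-cat": "<brx noun>",
     "mal-cat": "<mal noun>", "tag": "N"},
    {"cat": "Pronoun", "hin-cat": "<hin pronoun>", "brx-cat": "<brx pronoun>",
     "mal-cat": "<mal pronoun>", "tag": "PR"},
]

if __name__ == "__main__":
    for node in select_nodes(SAMPLE_NODES, "Devanagari", "brx"):
        print(node)
```

Keeping the script check separate from the attribute filter mirrors the two nested conditions of the pseudocode; a real implementation would read the nodes from the POS schema rather than from a hard-coded list.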
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
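The language tags are what tie this annexure to the schema: each language-specific label attribute in the POS schema is named `<tag>-cat` (for example `hin-cat`, `mal-cat`). A minimal Python sketch of that lookup is given below. The `ISO_639_3` dictionary and the `label_attribute` function are illustrative names, not part of the standard, and the codes are the standard ISO 639-3 identifiers for these 22 languages.

```python
# ISO 639-3 identifiers for the 22 languages listed in this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute that holds the label for `language`."""
    return f"{ISO_639_3[language]}-cat"


if __name__ == "__main__":
    print(label_attribute("Malayalam"))
```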
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
POS for Malayalam
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Examples in Malayalam
1 | Noun | N | N | avan, mOhan, vItu |
1.1 | Common | NN | N__NN | vItu, vellam, pattam |
1.2 | Proper | NNP | N__NNP | mOhan, ravi, sIta | േമാഹ൯ രവി സീത
1.3 | Nloc | NST | N__NST | mEle, tAze, munpil, pinnil | േമെല താെഴ മനിി ിനിി
2 | Pronoun | PR | PR | avan/aval/atu/itu | അവ൯ അവള അത ഇത
2.1 | Personal | PRP | PR__PRP | naan, nii, avaL, avar | ഞാ൯ നീ അവള അവ൪
2.2 | Reflexive | PRF | PR__PRF | tanne-taan | തെനതാ൯
2.3 | Relative | PRL | PR__PRL | aaro | ആേരാ
2.4 | Reciprocal | PRC | PR__PRC | tammiltammil, parasparam | തമിിിതമിി രസരം
2.5 | Wh-word | PRQ | PR__PRQ | aaru, evan | ആര എവ൯
3 | Demonstrative | DM | DM | aa-, ii- | ആ ഈ
3.1 | Deictic | DMD | DM__DMD | atu, itu | അത ഇത
3.2 | Relative | DMR | DM__DMR | eetu | ഏത
3.3 | Wh-word | DMQ | DM__DMQ | eetu, ennane | ഏത എങെന
4 | Verb | V | V | pO, kazhi, Annu (copula), ciri | ോ കഴി ആണി (copula) ചിരി
4.1 | Main | VM | V__VM | pO, kazhi, cirri, Annu (copula) | ോ കഴി ആണി (copula) ചിരി
4.1.1 | Finite | VF | V__VM__VF | pOyi, cirikkum, kazhikkunnu, Akunnu (copula) | ോയി ചിരികം കഴികന ആകന (copula)
4.1.2 | Non-finite | VNF | V__VM__VNF | pOya, ciricca, kazhicca | ോയ ചിരിച കഴിച
4.1.3 | Infinitive | VINF | V__VM__VINF | pOkku, cirikkukayAl, kazhikkee, varAn/varuvAn | ോക ചിരിക കയാി കഴിക വരാ൯വരവാ൯
4.2 | Verbal | VN | V__VN | paTittam, naTattam, naTanam | ഠിതം നടതം നടനം
4.3 | Auxiliary | VAUX | V__VAUX | kolluka, talluka, kAnuka, nOkkuka | െകാലക തലക കാണക േനാകക
5 | Adjective | JJ | | valiya, ceRiya, azakulla | വലിയ െചറിയ അഴകള
6 | Adverb | RB | | veegam, ativeegam, kUtutal | േവഗം അതിേവഗം കടതി
7 | Postposition | PSP | | paRRi, kUte | റി കെട
8 | Conjunction | CC | CC | pakshe, enniTTum, ennAl, ennalum, enkilum | െക എനിനം എനാി എനാലം എങിലം
8.1 | Co-ordinator | CCD | CC__CCD | -um (rAmanum), pakshe | ഉംി(രാമനം) െക
8.2 | Subordinator | CCS | CC__CCS | ennu, enna, ennAl | എന എന എനാി
8.2.1 | Quotative | UT | CC__CCS__UT | ennu, enna | എന എന
9 | Particles | RP | RP | kute, mAtram | കെട മാതം
9.1 | Default | RPD | RP__RPD | mAtram | മാതം
9.2 | Classifier | CL | RP__CL | peer | േ൪
9.3 | Interjection | INJ | RP__INJ | ayyoo | അേയാ
9.4 | Intensifier | INTF | RP__INTF | pala, valare | ല വളെര
9.5 | Negation | NEG | RP__NEG | illa, alla | ഇല അല
10 | Quantifiers | QT | QT | kuracchu, niraccu, oru, dharalam | കറച നിറച ഒര ധാരാളം
10.1 | General | QTF | QT__QTF | kuraccu, niraccu, dharalam | കറച നിറച ധാരാളം
10.2 | Cardinals | QTC | QT__QTC | onnu, rantu | ഒന രണ
10.3 | Ordinals | QTO | QT__QTO | onnAm, rantam | ഒനാം രണാം
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | $ & ( ) ruu | $ & ( ) ര
11.3 | Punctuation | PUNC | RD__PUNC | |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | |
POS for Bangla
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
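Storing the higher-level tags automatically amounts to walking up the type hierarchy from the chosen label. The Python sketch below shows one way to do this, assuming a simple child-to-parent table. The `PARENT` dictionary and the `full_annotation` function are illustrative names, not part of the standard; the labels and the `__` separator follow the Annotation Convention column of the tables.

```python
# Child label -> parent label, following the category hierarchy in the tag tables.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label, sep="__"):
    """Expand a lowest-level label into its full hierarchical annotation."""
    chain = [label]
    while chain[-1] in PARENT:          # climb until a top-level category is reached
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


if __name__ == "__main__":
    for label in ("VF", "UT", "NNP", "JJ"):
        print(label, "->", full_annotation(label))
```

With this table the annotator only picks `VF`, and the stored value becomes `V__VM__VF`; top-level labels without subtypes, such as `JJ`, are returned unchanged.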
POS for Marathi
Sl No | Category (Top level, Subtype level 1, Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | मुलगा (mulagaa-boy), राजा (raajaa-king), पुस्तक (pustaka-book)
1.1 | Common | NN | N__NN | पुस्तक (pustaka-book), लेखणी (lekhaNi-pen), चष्मा (chashmaa-goggles)
1.2 | Proper | NNP | N__NNP | मोहन (Mohan), रवी (Ravi), रश्मी (Rashmi)
1.3 | Verbal | NNV | N__NNV | NA | Not required
1.4 | Nloc | NST | N__NST | वर (var-up), खाली (khaalee-down), पुढे (pudhe-ahead), मागे (maage-back) | Where it is separate it is NST
2 | Pronoun | PR | PR | येथे (yethe-here), तेथे (tethe-there), जो (jo-who), तो (to-he)
2.1 | Personal | PRP | PR__PRP | तो (to-he), मी (mee-I), तू (tu-you), ते (te-they), तुम्ही (tumhi-you)
2.2 | Reflexive | PRF | PR__PRF | स्वतः (swatha-myself), आपण (aapana-ourselves)
2.3 | Relative | PRL | PR__PRL | जो (jo-who), ज्याने (jyaane-who), जेव्हा (jevhaa-while), जिथे (jeethe-where)
2.4 | Reciprocal | PRC | PR__PRC | परस्पर (Paraspara-reciprocally), एकमेक (ekmek-mutually)
2.5 | Wh-word | PRQ | PR__PRQ | कोण (kona-who), केव्हा (kevha-when), कुठे (kuthe-where)
2.6 | Indefinite | | | कोणी (kona
3 | Demonstrative | DM | DM | तो (to-he), हा (haa-this), जो (jo-who)
3.1 | Deictic | DMD | DM__DMD | इथे (ithe-here), तिथे (tithe-there)
3.2 | Relative | DMR | DM__DMR | जो (jo-who), ज्याने (jyane-who)
3.3 | Wh-word | DMQ | DM__DMQ | कोणता (konta-which), कोणी (kona-who)
4 | Verb | V | V | पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is)
4.1 | Main | VM | V__VM | पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is)
4.1.1 | Finite | VF | V__VM__VF | - | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Non-finite | VNF | V__VM__VNF | - | --do--
4.1.3 | Infinitive | VINF | V__VM__VINF | - | --do--
4.1.4 | Gerund | VNG | V__VM__VNG | | --do--
4.2 | Auxiliary | VAUX | V__VAUX | आहे (is), लागला (started)
5 | Adjective | JJ | | सुंदर (sundara-beautiful), चांगला (chaangalaa-good), मोठा (moThaa-big)
6 | Adverb | RB | | लवकर (lavakar-fast), हळूहळू (haLuuhaLuu-slowly)
7 | Postposition | PSP | | | Not in Marathi
8 | Conjunction | CC | CC | आणि (aaNi-and), कारण (kaaraN-because)
8.1 | Co-ordinator | CCD | CC__CCD | आणि (aaNi-and), पण (paNa-but), परंतु (parantu-but)
8.2 | Subordinator | CCS | CC__CCS | कारण (kaaraN-because of), कारण की (kaaraN kii-because of), जर-तर (jara-tara-if-then)
8.2.1 | Quotative | UT | CC__CCS__UT | असा, म्हणून
9 | Particles | RP | RP | तर (tara)
9.1 | Default | RPD | RP__RPD | तर (tara-then)
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | अरेरे (arere), ओहो (oho-oh)
9.4 | Intensifier | INTF | RP__INTF | खूप (khoop-lot, very), बराच (baraach-too much), अतिशय (atishaya-too much, very)
9.5 | Negation | NEG | RP__NEG | नको (nako-not), न (na-Na)
10 | Quantifiers | QT | QT | थोडे (thode-few), जास्त (jaasta-lot), काही (kaahi-few), एक (eka-one), पहिला (pahilaa-first)
10.1 | General | QTF | QT__QTF | थोडे (thoDe-few), जास्त (jaasta-lot), काही (kaahi-few)
10.2 | Cardinals | QTC | QT__QTC | एक (eka-one), दोन (dona-two)
10.3 | Ordinals | QTO | QT__QTO | पहिला (pahilaa-first), दुसरा (dusaraa-second)
11 | Residuals | RD | RD
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK
11.5 | Echowords | ECH | RD__ECH | जेवणबिवण (jevanbivaNa-meal/dinner), डोकेबिके (Dokebike-head), (Paanii-)vaanii, (khaanaa-)vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No | Category (Top level, Subtype level 1, Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N
1.1 | Common | NN | N__NN | kalam, chashmA (‘pen’, ‘spectacles’)
1.2 | Proper | NNP | N__NNP | mohan, ravI (‘Mohan’, ‘Ravi’)
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM (‘up’, ‘down’, ‘in front’)
2 | Pronoun | PR | PR
2.1 | Personal | PRP | PR__PRP | huM, tuM, te (‘me’, ‘you’, ‘he/she’)
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam (‘herself/himself’)
2.3 | Relative | PRL | PR__PRL | je, te, jyAM (‘who’, ‘where’)
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar (‘mutually’, ‘each other’)
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM (‘who’, ‘when’, ‘where’)
2.6 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’)
3 | Demonstrative | DM | DM
3.1 | Deictic | DMD | DM__DMD | A (‘this’)
3.2 | Relative | DMR | DM__DMR | je, jeNe (‘which/who’, ‘whom’)
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem (‘who’, ‘what’, ‘why’)
3.4 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’)
4 | Verb | V | V
4.1 | Main | VM | V__VM | khAshe, khAdhu (‘will eat’, ‘ate’)
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’)
5 | Adjective | JJ
6 | Adverb | RB
7 | Postposition | PSP
8 | Conjunction | CC | CC
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke (‘and’, ‘or’)
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’)
9 | Particles | RP | RP
9.1 | Default | RPD | RP__RPD | paNa, ja, tO (‘but’, emph, topic)
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM (‘very’, ‘much’)
9.4 | Negation | NEG | RP__NEG | nahi, na (‘no’)
10 | Quantifiers | QT | QT
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM (‘little’, ‘much’)
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN (‘one’, ‘two’, ‘three’)
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI (‘first’ (neu), ‘second’ (fem))
11 | Residuals | RD | RD
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol
11.2 | Symbol | SYM | RD__SYM | $ &
11.3 | Punctuation | PUNC | RD__PUNC | ()
11.4 | Unknown | UNK | RD__UNK
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’)
POS for Konkani
Sl No | Category (Top level, Subtype level 1, Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N
1.1 | Common | NN | N__NN | पसत रख आबो माड
1.2 | Proper | NNP | N__NNP | रामायण बायबल तराण गय ततणी तपला
1.3 | Nloc | NST | N__NST | भायर भीर वयर सतयल
2 | Pronoun | PR | PR
2.1 | Personal | PRP | PR__PRP | हाव ो तयो मच आमच ाच
2.2 | Reflexive | PRF | PR__PRF | आपण सवा
2.3 | Relative | PRL | PR__PRL | जा जो
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो
2.6 | Indefinite | | | तोणय त य खयचय
3 | Demonstrative | DM | DM
3.1 | Deictic | DMD | DM__DMD | ो हो
3.2 | Relative | DMR | DM__DMR | जो
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल
3.4 | Indefinite | | | तोणाचय तसलय
4 | Verb | V | V
4.1 | Main | VM | V__VM | यवप
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी
4.2 | Auxiliary | VAUX | V__VAUX | NA
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी
5 | Adjective | JJ | | सोबी सदर
6 | Adverb | RB | | फालया सवतास अश
7 | Postposition | PSP | | खाीर पास बगर तडन लागी
8 | Conjunction | CC | CC
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन
8.2.1 | Quotative | UT | CC__CCS__UT | अश त
9 | Particles | RP | RP
9.1 | Default | RPD | RP__RPD | बी आद इतयाद
9.2 | Classifier | CL | RP__CL | (पाच) जाण
9.3 | Interjection | INJ | RP__INJ | आर चप
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर
9.5 | Negation | NEG | RP__NEG | ना नयह
10 | Quantifiers | QT | QT
10.1 | General | QTF | QT__QTF | थोड चड ताय खब
10.2 | Cardinals | QTC | QT__QTC | एत दोन
10.3 | Ordinals | QTO | QT__QTO | पयल दसर
11 | Residuals | RD | RD
11.1 | Foreign word | RDF | RD__RDF
11.2 | Symbol | SYM | RD__SYM | & $
11.3 | Punctuation | PUNC | RD__PUNC | -
11.4 | Unknown | UNK | RD__UNK
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण
POS for Maithili
Sl No | Category (Top level, Subtype level 1, Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N
1.1 | Common | NN | N__NN | पोथी तलम पड खवास
1.2 | Proper | NNP | N__NNP | अरण दनश अल
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह
2 | Pronoun | PR | PR
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर
 | Indefinite | | | तओ तछ तउछ तोनो
3 | Demonstrative | DM | DM
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ
3.2 | Relative | DMR | DM__DMR | ज जाह
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन
 | Indefinite | | | तओ तछ तउछ तोनो
4 | Verb | V | V
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत
5 | Adjective | JJ | | नीत मोटता ललत
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर
7 | Postposition | PSP | | स त लल
8 | Conjunction | CC | CC
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा
8.2 | Subordinator | CCS | CC__CCS | ज त यद
9 | Particles | RP | RP
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ
9.2 | Classifier | CL | RP__CL | टा गोट गो
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान
9.5 | Negation | NEG | RP__NEG | न नह जन
10 | Quantifiers | QT | QT
10.1 | General | QTF | QT__QTF | तनत बह तछ
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम
11 | Residuals | RD | RD
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK
11.5 | Echowords | ECH | RD__ECH | जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No | Category (Top level, Subtype level 1, Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
xml:lang: mark-up for natural language labelling; defined for the root element of the document and for any element where a change of language may occur.
its:dir: mark-up to specify text direction; defined for the root element of the document and for any element that has text content.
its:translateRule: indicates which elements and attributes should be translated; elements that indicate which elements have non-translatable content.
its:withinTextRule: provides information related to text segmentation; elements that indicate which elements should be treated as either part of their parents or as a nested but independent run of text.
xml:id: mark-up for unique identifiers; elements with translatable content can be associated with a unique identifier.
its:locNote: mark-up for notes to localizers; allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
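As a rough illustration of the attributes listed above, the following sketch builds one element carrying xml:lang, its:dir and its:translate using Python's standard library. The element name `entry` and the helper `make_entry` are hypothetical, the language code `ur` is an assumed choice for Urdu, and the ITS namespace URI is the one used by ITS 1.0; none of this is prescribed by the draft.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace behind the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_entry(text, lang, direction, translate="yes"):
    """Build a hypothetical <entry> element with language and direction mark-up."""
    entry = ET.Element("entry")
    entry.set("{%s}lang" % XML_NS, lang)            # natural language labelling
    entry.set("{%s}dir" % ITS_NS, direction)        # text direction: ltr or rtl
    entry.set("{%s}translate" % ITS_NS, translate)  # translatable content or not
    entry.text = text
    return entry


# An Urdu example word from the tag set (kitaab), written right to left.
urdu_entry = make_entry("کتاب", "ur", "rtl")
print(ET.tostring(urdu_entry, encoding="unicode"))
```

The same helper could mark a tag label such as NN with translate set to "no", since tag labels are meant to stay identical across languages.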
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They define a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives the element name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework of XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. POS schema block diagram.
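The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps can be sketched as follows. This is only an illustration under stated assumptions: the tag list is a small excerpt of the common tag set, the `NOT_REQUIRED` entries are taken from the "Not required" and "Not in Marathi" remarks in the Marathi and Urdu tables, the ISO 639-3 codes `mar` and `urd` are used as language keys, and the names `COMMON_TAGS`, `NOT_REQUIRED` and `select_nodes` are hypothetical.

```python
# Excerpt of the common tag set, written in the annotation convention.
COMMON_TAGS = [
    "N", "N__NN", "N__NNP", "N__NNV", "N__NST",
    "PSP",
    "CC", "CC__CCD", "CC__CCS", "CC__CCS__UT",
    "RP", "RP__RPD", "RP__CL",
]

# Branches switched 'off' per language (from the Remarks columns).
NOT_REQUIRED = {
    "mar": {"N__NNV", "PSP", "RP__CL"},
    "urd": {"CC__CCS__UT", "RP__CL"},
}


def select_nodes(language):
    """Return the tags to display for a language; all other nodes are hidden."""
    hidden = NOT_REQUIRED.get(language, set())
    displayed = []
    for tag in COMMON_TAGS:
        levels = tag.split("__")
        # A node is hidden if it, or any of its ancestors, is switched off.
        path = {"__".join(levels[:i]) for i in range(1, len(levels) + 1)}
        if path & hidden:
            continue
        displayed.append(tag)
    return displayed


print(select_nodes("mar"))
print(select_nodes("urd"))
```

Checking the whole ancestor path means that switching off a top-level category also hides every subtype beneath it, which matches the idea of turning a branch of the tree off.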
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
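
The selection rule in Section 14 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name select_nodes, the in-memory NODES structure and the placeholder label strings are hypothetical and are not part of the draft schema. Only the script-to-language grouping and the "<code>-cat" attribute naming are taken from the algorithm and examples above.

```python
# Scripts and the languages handled under each, as listed in the algorithm
# above (ISO 639-3 codes).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Hypothetical in-memory form of two schema nodes. The label strings are
# placeholders, not the real translations.
NODES = [
    {"cat": "noun", "tag": "N",
     "labels": {"hin": "<hin noun>", "brx": "<brx noun>", "guj": "<guj noun>"}},
    {"cat": "Pronoun", "tag": "PR",
     "labels": {"hin": "<hin pronoun>", "brx": "<brx pronoun>",
                "guj": "<guj pronoun>"}},
]


def select_nodes(script, lang, nodes):
    """Return the nodes to display: English, Hindi and the chosen language."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{lang!r} is not listed under script {script!r}")
    # Hindi is always shown; for Hindi itself there is no third label.
    visible = ["hin"] if lang == "hin" else ["hin", lang]
    selected = []
    for node in nodes:
        entry = {"cat": node["cat"], "tag": node["tag"]}  # English category and tag
        for code in visible:
            entry[f"{code}-cat"] = node["labels"][code]
        selected.append(entry)  # labels of all other languages stay hidden
    return selected
```

Calling select_nodes("Devanagari", "brx", NODES) keeps the English category, the tag, and the hin-cat and brx-cat labels of each node, and leaves every other language label out of the result.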
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
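
For readers who want to process samples like the ones in this section, the following Python sketch separates a token into its word and its label parts. It rests on an assumption: the separator between word and label is not visible in the samples above, so the label is taken to be the trailing run of uppercase Latin letters and underscores (for example N_NN or V_VM_VF). The name split_token is hypothetical. Romanized words that end in uppercase letters, and the Urdu samples where the label precedes the word, are not handled.

```python
import re

# A label is a sequence of uppercase parts joined by one or more underscores,
# e.g. N_NN, RD_PUNC or V__VM__VF. The word is whatever precedes it.
_TOKEN = re.compile(r"(.*?)([A-Z]+(?:_+[A-Z]+)*)")


def split_token(token):
    """Split 'wordLABEL' into (word, [label parts from top level down])."""
    match = _TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"no label found in {token!r}")
    word, label = match.groups()
    # Single and double underscores both occur in the samples; ignore empties.
    parts = [part for part in label.split("_") if part]
    return word, parts
```

A token that consists of a label only, such as RD_PUNC, yields an empty word and the parts RD and PUNC.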
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
രസരം
25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯
3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത
ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത
എങെന 4 Verb V V pO kazhi
Annuciri ോ കഴി ആണി(Cop
ula) ചിരി 41 Main VM V__VM pO kazhi
cirriAnnu(copula)
ോ കഴി ആണി (copula) ചിരി
411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)
ോയി ചിരികം കഴികന ആകന(copula)
412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca
ോയ ചിരിച കഴിച
413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn
ോക ചിരിക കയാി
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )
ruu $ amp ( ) ര
113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH
POS for Bangla

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
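
One way to store the higher-level tags automatically is to keep a parent link for every label and walk up from the label the annotator picked. The Python sketch below assumes this approach; the PARENT mapping is transcribed from the Label and Annotation Convention columns of the Bangla table, while the function name annotation is hypothetical and the standard does not prescribe any particular implementation.

```python
# Parent of each label in the type hierarchy; top-level labels have no entry.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation(label):
    """Expand the lowest-level label into the full annotation string."""
    chain = [label]
    while chain[-1] in PARENT:  # climb until a top-level label is reached
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))
```

With this mapping, annotation("VF") gives V__VM__VF and annotation("UT") gives CC__CCS__UT, matching the Annotation Convention column; a label without a parent, such as JJ, is returned unchanged.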
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA ('pen', 'spectacles') |
1.2 | Proper | NNP | N__NNP | mohan, ravI ('Mohan', 'Ravi') |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM ('up', 'down', 'in front') |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te ('me', 'you', 'he/she') |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam ('herself/himself') |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM ('who', 'where') |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar ('mutually', 'each other') |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM ('who', 'when', 'where') |
2.6 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A ('this') |
3.2 | Relative | DMR | DM__DMR | je, jeNe ('which/who', 'whom') |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem ('who', 'what', 'why') |
3.4 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu ('will eat', 'ate') |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM ('is', 'was', 'did') |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke ('and', 'or') |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke ('so', 'like that', 'because') |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO ('but', emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM ('very', 'much') |
9.4 | Negation | NEG | RP__NEG | nahi, na ('no') |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM ('little', 'much') |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN ('one', 'two', 'three') |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI ('first' (neu), 'second' (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi ('work and the like', 'water and the like') |
POS for Konkani Sl
No Category Label Annotation
Convention Examples Remark
s
Top level Subtype
(level 1) Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No  Category [Top level / Subtype (level 1) / Subtype (level 2)]  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम पड खवास
12 Proper NNP N__NNP अरण दनश अल
13 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ अहा
22 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप पढइ खाइ स हस
42 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No  Category [Top level / Subtype (level 1) / Subtype (level 2)]  Label  Annotation Convention  Examples  Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
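The rule above can be illustrated with a minimal Python sketch. It assumes that the annotation convention always joins hierarchy levels with a double underscore (as in V__VM__VF); the function name higher_level_tags is hypothetical and is not defined by this standard.

```python
def higher_level_tags(annotation):
    """Return every level implied by a lowest-level annotation, top level first.

    Assumes levels are joined by a double underscore, as in the
    'Annotation Convention' column (e.g. 'V__VM__VF').
    """
    parts = annotation.split("__")
    # Rebuild the chain one level at a time: V, then V__VM, then V__VM__VF.
    return ["__".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


if __name__ == "__main__":
    print(higher_level_tags("V__VM__VF"))
```

An annotator would therefore only need to pick V__VM__VF; the entries V and V__VM can be derived and stored without further input.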
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes it easier for both schema users and processors to recognize the concepts they represent. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
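As an illustration of how a processor might read some of these attributes, here is a minimal Python sketch using only the standard library. The sample document and its element names (doc, p, tag) are hypothetical, and the handling of its:translate is simplified: it is read per element and defaults to "yes", which is an assumption of this sketch rather than a full ITS implementation.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS namespace

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <p xml:id="p1">Noun</p>
  <p xml:id="p2" xml:lang="ur" its:dir="rtl">اسم</p>
  <tag xml:id="t1" its:translate="no">N__NN</tag>
</doc>"""


def describe(xml_text):
    """List (id, text, language, direction, translate) for each element with text."""
    rows = []

    def walk(element, lang, direction):
        # xml:lang and its:dir apply to an element and its descendants.
        lang = element.get("{%s}lang" % XML_NS, lang)
        direction = element.get("{%s}dir" % ITS_NS, direction)
        # Simplification: translate is read per element and defaults to "yes".
        translate = element.get("{%s}translate" % ITS_NS, "yes")
        text = (element.text or "").strip()
        if text:
            rows.append((element.get("{%s}id" % XML_NS), text, lang, direction, translate))
        for child in element:
            walk(child, lang, direction)

    walk(ET.fromstring(xml_text), None, None)
    return rows


if __name__ == "__main__":
    for row in describe(SAMPLE):
        print(row)
```

The Urdu label inherits nothing from the root except what it does not override: its own xml:lang and its:dir replace the root values, while the tag string N__NN is marked as non-translatable.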
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives the tag name, so it only has meaning to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
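A metadata record of this shape could be read as in the following minimal Python sketch. The sample record reuses the element names and namespace shown above, but its root element name (metadata) and the function name read_metadata are hypothetical choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Prefix "d" is local to this sketch; the URI is the namespace used above.
NS = {"d": "http://www.isocat.org/ns/dcif"}

# Hypothetical sample record; the root element name is illustrative only.
SAMPLE = """<metadata xmlns="http://www.isocat.org/ns/dcif">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <creation><creationDate>1999-01-01</creationDate></creation>
    <descriptionSection>
      <definitionClass><source>ISO 12620:1999</source></definitionClass>
    </descriptionSection>
  </languageSection>
</metadata>"""


def read_metadata(xml_text):
    """Collect a few metadata fields into a dictionary (None when absent)."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        node = root.find(path, NS)
        if node is None or node.text is None:
            return None
        return node.text.strip()

    return {
        "language": text_of(".//d:language"),
        "status": text_of(".//d:registrationStatus"),
        "created": text_of(".//d:creationDate"),
        "source": text_of(".//d:source"),
        "author": text_of(".//d:author"),
    }


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```

Fields that the record leaves out, such as the author here, come back as None instead of raising an error.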
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
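The idea of switching branches 'off' can be sketched in Python as follows. This is a simplified illustration, not the schema's actual processing model: the element name category, the function name select_language and the reduced two-entry schema are assumptions of this sketch, while the language codes (brx, mal, kas, kok) and labels are taken from the draft schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the common schema: one element per
# category, with one "<code>-cat" attribute per language.
COMMON = """<pos>
  <category cat="noun" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" kok-cat="नाम" tag="N"/>
  <category cat="verb" brx-cat="थाइजा" kas-cat="کراوت" tag="V"/>
</pos>"""


def select_language(xml_text, code):
    """Keep the English label, the tag and the label for `code`; hide the rest."""
    wanted = {"cat", "tag", code + "-cat"}
    nodes = []
    for category in ET.fromstring(xml_text):
        # Attributes for all other languages are dropped (switched 'off').
        nodes.append({name: value
                      for name, value in category.attrib.items()
                      if name in wanted})
    return nodes


if __name__ == "__main__":
    for node in select_language(COMMON, "kok"):
        print(node)
```

A category that has no label for the selected language simply keeps its English label and tag.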
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം"
kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം"
kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം"
kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം"
kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം"
kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം"
kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം"
kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം"
kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം"
kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം"
kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം"
kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം"
kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം"
kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ"
kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ"
kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ"
kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം"
kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ"
kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം"
kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം"
kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം"
kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം"
kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം"
kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം"
kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख'थ"
kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം"
kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം"
kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം"
kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം"
kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം"
kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം"
kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി"
kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി"
kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി"
kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി"
kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം"
kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം"
kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം"
kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം"
kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद 'सन खािनथ" mal-cat="വിരാമ ചിഹം"
kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക"
kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
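The selection rule in this algorithm (always show the English and Hindi nodes, add the nodes of the chosen language, hide everything else) can be sketched in Python as below. This is only an illustration under stated assumptions: the function name select_nodes, the dictionary representation of a schema node, and the constant names are hypothetical and are not part of the standard. The script-to-language pairs are limited to those listed in the algorithm.

```python
# Script -> {language: ISO 639-3 code}, limited to the pairs named in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that are always displayed: English category, tag, and Hindi label.
ALWAYS_VISIBLE = {"cat", "subcat", "tag", "hin-cat"}


def select_nodes(node, script, language):
    """Return the attributes to display for one schema node (hypothetical helper).

    Raises KeyError when the script/language pair is not listed above.
    """
    code = SCRIPT_LANGUAGES[script][language]
    visible = ALWAYS_VISIBLE | {f"{code}-cat"}
    # Anything not in the visible set is treated as hidden.
    return {name: value for name, value in node.items() if name in visible}


# One schema node flattened to a dict; the label values are illustrative.
noun = {
    "cat": "noun",
    "hin-cat": "संज्ञा",   # Hindi for "noun"
    "kok-cat": "नाम",
    "mal-cat": "നാമം",
    "tag": "N",
}

shown = select_nodes(noun, "Devanagari", "Konkani")
```

For a Konkani user the Malayalam label is dropped while the English, Hindi and Konkani labels remain; selecting Hindi leaves only the English and Hindi attributes.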
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag according to ISO 639-3
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
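The schema examples name their language-specific attributes with these codes (hin-cat, brx-cat, kok-cat and so on). A minimal Python sketch of resolving such an attribute name back to its language follows; the helper name language_of and the constant ISO_639_3 are hypothetical, and the name-to-code pairs are the standard ISO 639-3 assignments for the 22 languages of this annexure.

```python
# ISO 639-3 code -> language name, for the 22 languages of Annexure-1.
ISO_639_3 = {
    "asm": "Assamese", "ben": "Bangla", "brx": "Bodo", "doi": "Dogri",
    "guj": "Gujarati", "hin": "Hindi", "kan": "Kannada", "kas": "Kashmiri",
    "kok": "Konkani", "mai": "Maithili", "mal": "Malayalam", "mni": "Manipuri",
    "mar": "Marathi", "nep": "Nepali", "ori": "Oriya", "pan": "Punjabi",
    "san": "Sanskrit", "sat": "Santhali", "snd": "Sindhi", "tam": "Tamil",
    "tel": "Telugu", "urd": "Urdu",
}


def language_of(attribute_name):
    """Map an attribute such as 'kok-cat' to its language name (hypothetical helper)."""
    code, _, suffix = attribute_name.partition("-")
    if suffix != "cat" or code not in ISO_639_3:
        raise ValueError(f"not a language-specific category attribute: {attribute_name!r}")
    return ISO_639_3[code]
```

A lookup of this kind rejects attribute names whose prefix is not one of the listed codes, which is one way to catch mistyped labels early.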
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
കഴിക വരാ൯വരവാ൯
42 Verbal VN V__VN paTittam naTattam naTanam
ഠിതം നടതം നടനം
43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka
െകാലക തലക കാണക േനാകക
5 Adjective JJ valiya ceRiya azakulla
വലിയ െചറിയ അഴകള
6 Adverb RB veegam ativeegam kUtutal
േവഗം അതിേവഗം കടതി
7 Postposition PSP paRRi kUte റി കെട
8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum
െക എനിനം എനാി എനാ
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )
ruu $ amp ( ) ര
113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH
POS for Bangla
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
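The rule stated above (the annotator picks the lowest-level tag and the higher levels are stored automatically) can be sketched as a parent lookup. The Python below is a minimal illustration, not the standard's implementation: the names HIERARCHY, PARENT and full_annotation are hypothetical, and the hierarchy is transcribed from the Label and Annotation Convention columns of the tag tables in this document.

```python
# Parent label -> child labels, transcribed from the tag tables.
HIERARCHY = {
    "N": ["NN", "NNP", "NNV", "NST"],
    "PR": ["PRP", "PRF", "PRL", "PRC", "PRQ", "PRI"],
    "DM": ["DMD", "DMR", "DMQ", "DMI"],
    "V": ["VM", "VAUX"],
    "VM": ["VF", "VNF", "VINF", "VNG"],
    "CC": ["CCD", "CCS"],
    "CCS": ["UT"],
    "RP": ["RPD", "CL", "INJ", "INTF", "NEG"],
    "QT": ["QTF", "QTC", "QTO"],
    "RD": ["RDF", "SYM", "PUNC", "UNK", "ECH"],
}

# Invert the table once: child label -> parent label.
PARENT = {child: parent for parent, children in HIERARCHY.items() for child in children}


def full_annotation(label):
    """Expand a lowest-level label into its stored form, e.g. VF -> V__VM__VF."""
    chain = [label]
    while chain[-1] in PARENT:          # climb until a top-level category is reached
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))   # top level first, joined by double underscore
```

Labels without a parent in the tables, such as JJ, RB and PSP, are returned unchanged, which matches their single-level entries.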
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA (‘pen’, ‘spectacles’) |
1.2 | Proper | NNP | N__NNP | mohan, ravI (‘Mohan’, ‘Ravi’) |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM (‘up’, ‘down’, ‘in front’) |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te (‘me’, ‘you’, ‘he/she’) |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam (‘herself/himself’) |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM (‘who’, ‘where’) |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar (‘mutually’, ‘each other’) |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM (‘who’, ‘when’, ‘where’) |
2.6 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A (‘this’) |
3.2 | Relative | DMR | DM__DMR | je, jeNe (‘which/who’, ‘whom’) |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem (‘who’, ‘what’, ‘why’) |
3.4 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu (‘will eat’, ‘ate’) |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’) |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke (‘and’, ‘or’) |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’) |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO (‘but’, emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM (‘very’, ‘much’) |
9.4 | Negation | NEG | RP__NEG | nahi, na (‘no’) |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM (‘little’, ‘much’) |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN (‘one’, ‘two’, ‘three’) |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI (‘first’ (neu), ‘second’ (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’) |
POS for Konkani
Columns: Sl No, Category (Top level, Subtype level 1, Subtype level 2), Label, Annotation Convention, Examples, Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Urdu Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu unlike Hindi voh is used both for singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar words like koi kisi kuch do not come under Wh-word they are used for indefinite person For them another category (subtype) ietankiir (indefinitive) is used Under this category
following words are also placed chand
blsquoaaz fulaan sab bahut Can we have a category
subtype like indefinitive demonstrative (DMI)
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable extensible and web enabled W3C XML Internationalization best practices guidelines and ISO Metadata standard are adopted in the above framework
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology to easily create XML which is internationalized and can be localized effectively
ITS for Schema developers
User will find proposals for attribute and element names to be included in their new schema (also called host vocabulary) It leads to easier recognition of the concepts represented by both schema users and processors [For more details httpwwww3orgTR2007REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
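As an illustration of how a processor might read some of these attributes, the sketch below parses a small sample document with Python's standard `xml.etree.ElementTree` module. The element names `doc`, `label` and `tag` and the sample content are hypothetical; only `xml:lang`, `xml:id`, `its:dir` and the local `its:translate` attribute come from the W3C specifications.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace

# Hypothetical sample document, not taken from the draft schema.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="n1">Noun</label>
  <label xml:id="n2" xml:lang="hi">संज्ञा</label>
  <tag its:translate="no">N_NN</tag>
</doc>"""


def describe(root):
    """Yield (element name, effective language, translatable?) per child."""
    default_lang = root.get(f"{{{XML_NS}}}lang")
    for child in root:
        # A child inherits the root language unless it declares its own.
        lang = child.get(f"{{{XML_NS}}}lang", default_lang)
        translatable = child.get(f"{{{ITS_NS}}}translate", "yes") != "no"
        yield child.tag, lang, translatable


rows = list(describe(ET.fromstring(SAMPLE)))
```

Here the POS tag itself is marked as non-translatable while its human-readable labels carry a language code, which is the separation the attributes above are meant to support.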
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term “instance document” is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only “sort of”: a tag conveys nothing beyond its name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
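The idea of switching branches ‘off’ can be sketched as an attribute filter over a tree. The example below is only an illustration under stated assumptions: it uses a small, well-formed stand-in document (elements `cat` and `type`, placeholder label values) rather than the draft schema itself, and the function name `prune_languages` is hypothetical. The only convention taken from the draft is that language-specific labels are stored in attributes named with a language code followed by `-cat` (for example `hin-cat`, `brx-cat`, `mal-cat`).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one block of the common schema.
SAMPLE = """<cat name="noun" tag="N" hin-cat="HIN-NOUN" brx-cat="BRX-NOUN" mal-cat="MAL-NOUN">
  <type name="common" tag="NN" hin-cat="HIN-COMMON" brx-cat="BRX-COMMON" mal-cat="MAL-COMMON"/>
</cat>"""


def prune_languages(elem, keep):
    """Remove every '<code>-cat' attribute whose code is not in keep."""
    for name in list(elem.attrib):          # copy the keys: we delete while looping
        if name.endswith("-cat") and name[:-4] not in keep:
            del elem.attrib[name]
    for child in elem:                       # apply the same rule to all sub-nodes
        prune_languages(child, keep)
    return elem


# Keep the Hindi and Malayalam branches; the Bodo branch is switched off.
pruned = prune_languages(ET.fromstring(SAMPLE), keep={"hin", "mal"})
```

Attributes that do not carry a language code, such as `name` and `tag`, are left untouched, so the tag inventory stays the same for every language.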
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി i" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No.  English  Hindi  Punjabi  Urdu  Gujarati  Oriya  Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No.  English  Hindi  Assamese  Bodo  Kashmiri  Kashmiri (Hindi)  Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No.  English  Hindi  Telugu  Malayalam  Tamil  Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
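A mapping table of this kind can be checked mechanically. The sketch below is a minimal illustration, assuming the table has been loaded into a dictionary keyed by tag, with one entry per ISO 639-3 language code; the sample rows use placeholder label strings (L1, M1, ...) rather than real labels, and the function names are hypothetical. It reports cells that are empty or marked NA, and labels that are shared by more than one tag within a language, which would break a strict one-to-one mapping.

```python
from collections import defaultdict

# Placeholder rows: tags as in the tagged examples, labels are dummy strings.
TABLE = {
    "N_NNP":  {"eng": "Proper",   "hin": "L1", "mal": "M1"},
    "PR_PRP": {"eng": "Personal", "hin": "L1", "mal": "M2"},
    "V_VNG":  {"eng": "Gerund",   "hin": "L2", "mal": "NA"},
}

EMPTY = ("", "NA")


def missing_labels(table, languages):
    """List (tag, language) pairs whose label is absent, empty or NA."""
    return sorted(
        (tag, lang)
        for tag, labels in table.items()
        for lang in languages
        if labels.get(lang, "NA") in EMPTY
    )


def colliding_labels(table, language):
    """Map each label used by more than one tag to the tags that share it."""
    seen = defaultdict(list)
    for tag, labels in table.items():
        label = labels.get(language, "NA")
        if label not in EMPTY:
            seen[label].append(tag)
    return {label: tags for label, tags in seen.items() if len(tags) > 1}


gaps = missing_labels(TABLE, ["hin", "mal"])
clashes = colliding_labels(TABLE, "hin")
```

Whether a shared label is an error or an accepted overlap is a decision for the language experts; the check only makes such cases visible.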
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
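The nested script/language branches above can be sketched as a lookup table in Python. This is only an illustration under stated assumptions: the names `SCRIPT_LANGUAGES`, `ALL_LANGUAGES` and `select_nodes` are hypothetical, and only the two branches visible above (Bangla script with Assamese, Gujarati script with Gujarati) are filled in.

```python
# Partial script -> languages table (assumption: only the branches shown in
# the algorithm text are listed; the remaining scripts would be added here).
SCRIPT_LANGUAGES = {
    "Bangla": ("Assamese",),
    "Gujarati": ("Gujarati",),
}

# Illustrative subset of the languages whose nodes exist in the common schema.
ALL_LANGUAGES = ("English", "Hindi", "Assamese", "Gujarati")


def select_nodes(script, language):
    """Return (displayed, hidden) language lists for one script/language choice.

    Mirrors "Display (English, Hindi and <language> nodes)" followed by
    "Hide (remaining nodes)".
    """
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"no branch for script={script!r}, language={language!r}")
    displayed = ["English", "Hindi"]
    if language not in displayed:
        displayed.append(language)
    hidden = [name for name in ALL_LANGUAGES if name not in displayed]
    return displayed, hidden


print(select_nodes("Bangla", "Assamese"))
# (['English', 'Hindi', 'Assamese'], ['Gujarati'])
```

A table keeps the script and language pairs in one place, so adding a language does not require another nested If/Else branch.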
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No.  Language Name  Language Tag (ISO 639-3)
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
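As a minimal sketch, the language tags can be held in a lookup table and used to build the per-language attribute names that appear in the draft schema (`hin-cat`, `asm-cat` and so on). The dictionary below assumes the standard ISO 639-3 assignment for each of the 22 languages; the names `ISO_639_3` and `attribute_name` are hypothetical.

```python
# Language name -> ISO 639-3 tag for the 22 languages of the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def attribute_name(language):
    """Build the schema attribute name for a language, e.g. 'asm-cat'."""
    return f"{ISO_639_3[language]}-cat"


print(attribute_name("Assamese"))  # asm-cat
```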
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
ലം എങിലം
81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe
ഉംി(രാമനം) െക
82 Subordinator CCS CC__CCS ennu enna ennAl
എന എന എനാി
821 Quotative UT CC__CCS__UT ennu enna എന എന
9 Particles RP RP kutemAtram കെട മാതം
91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല
വളെര 95 Negation NEG RP__NEG illa alla ഇല
അല 10 Quantifiers QT QT kuracchu
niraccu oru dharalam
കറച നിറച ഒര ധാരാളം
101 General QTF QT__QTF kuraccu niraccu dharalam
കറച നിറച ധാരാളം
102 Cardinals QTC QT__QTC onnurantu ഒന രണ
103 Ordinals QTO QT__QTO onnAmrantam
ഒനാം രണാം
11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )
ruu $ amp ( ) ര
113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH
POS for Bangla
(Category column: Top level / Subtype (level 1) / Subtype (level 2))

Sl No   Category        Label  Annotation Convention  Examples                          Remarks
1       Noun            N      N
1.1     Common          NN     N__NN                  kalama, cashmaa
1.2     Proper          NNP    N__NNP                 Mohan, ravi, rashmi
1.4     Nloc            NST    N__NST                 upare, niche, bhitara
2       Pronoun         PR     PR
2.1     Personal        PRP    PR__PRP                se, tumi, AmAra
2.2     Reflexive       PRF    PR__PRF                nijera
2.3     Relative        PRL    PR__PRL                ye, yakhana, yena, yAra
2.4     Reciprocal      PRC    PR__PRC                paraspara
2.5     Wh-word         PRQ    PR__PRQ                ke, kakhana, kena, kAra
2.6     Indefinite      PRI    PR__PRI                keu
3       Demonstrative   DM     DM                     Vaha, jo, yaha
3.1     Deictic         DMD    DM__DMD                sei, oi, o, se
3.2     Relative        DMR    DM__DMR                ye, yei
3.3     Wh-word         DMQ    DM__DMQ                kono
3.4     Indefinite      DMI    DM__DMI                keu
4       Verb            V      V
4.1     Main            VM     V__VM
4.1.1   Finite          VF     V__VM__VF              karachhilAma, yAba, khAYa
4.1.2   Non-finite      VNF    V__VM__VNF             kare, kheYe, karale, khete
4.1.3   Infinitive      VINF   V__VM__VINF            karate, khete, yete
4.1.4   Gerund          VNG    V__VM__VNG             yAoYa, AsA, khelA, karA
4.2     Auxiliary       VAUX   V__VAUX                chhila, habe, chAi
5       Adjective       JJ                            sundara, bhAla, lAla
6       Adverb          RB                            tADAtADi, Aste, haThAt
7       Postposition    PSP                           theke, abadhI, madhye, diYe
8       Conjunction     CC     CC
8.1     Co-ordinator    CCD    CC__CCD                Ara, eban, athabA, kimbA
8.2     Subordinator    CCS    CC__CCS                ye, kintu, noile, tAhale
8.2.1   Quotative       UT     CC__CCS__UT            ----                              Not required
9       Particles       RP     RP
9.1     Default         RPD    RP__RPD                to, ye
9.2     Classifier      CL     RP__CL                 jana, khAnA
9.3     Interjection    INJ    RP__INJ                Are, ei, hAya
9.4     Intensifier     INTF   RP__INTF               bhiShaNa, khuba, sA~NghAtika
9.5     Negation        NEG    RP__NEG                nA, naYa, chhADA
10      Quantifiers     QT     QT
10.1    General         QTF    QT__QTF                kichhu, alpa, aneka
10.2    Cardinals       QTC    QT__QTC                eka, dui, tina
10.3    Ordinals        QTO    QT__QTO                prathama, paYalA, dvitIYa
11      Residuals       RD     RD
11.1    Foreign word    RDF    RD__RDF                                                  A word written in script other than the script of the original text
11.2    Symbol          SYM    RD__SYM                $ & ( )                           For symbols such as $, & etc.
11.3    Punctuation     PUNC   RD__PUNC                                                 Only for punctuations
11.4    Unknown         UNK    RD__UNK
11.5    Echowords       ECH    RD__ECH                jala Tala, khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
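The rule that higher-level tags follow automatically from the lowest-level tag can be sketched in Python. The sketch assumes the double-underscore separator used in the Annotation Convention column (for example `V__VM__VF`); the function name `expand_tag` is hypothetical.

```python
def expand_tag(annotation, sep="__"):
    """Return every level of the hierarchy implied by a lowest-level tag.

    The annotator stores only the full annotation; the higher levels are
    derived by cutting the annotation at each separator, top level first.
    """
    parts = annotation.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


print(expand_tag("V__VM__VF"))  # ['V', 'V__VM', 'V__VM__VF']
print(expand_tag("JJ"))         # ['JJ']
```

Because the full annotation already encodes its ancestors, nothing beyond the lowest-level tag needs to be entered by hand.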
POS for Marathi
Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati
(Category column: Top level / Subtype (level 1) / Subtype (level 2))

Sl No   Category        Label  Annotation Convention  Examples
1       Noun            N      N
1.1     Common          NN     N__NN                  kalam, chashmA (‘pen’, ‘spectacles’)
1.2     Proper          NNP    N__NNP                 mohan, ravI (‘Mohan’, ‘Ravi’)
1.3     Nloc            NST    N__NST                 upar, nIche, ahIM (‘up’, ‘down’, ‘in front’)
2       Pronoun         PR     PR
2.1     Personal        PRP    PR__PRP                huM, tuM, te (‘me’, ‘you’, ‘he/she’)
2.2     Reflexive       PRF    PR__PRF                pote, jAte, svayam (‘herself/himself’)
2.3     Relative        PRL    PR__PRL                je, te, jyAM (‘who’, ‘where’)
2.4     Reciprocal      PRC    PR__PRC                aras-paras, paraspar (‘mutually’, ‘each other’)
2.5     Wh-word         PRQ    PR__PRQ                koN, kyAre, kyAM (‘who’, ‘when’, ‘where’)
2.6     Indefinite                                    koI, kaIMK, kashuM (‘someone’, ‘something’)
3       Demonstrative   DM     DM
3.1     Deictic         DMD    DM__DMD                A (‘this’)
3.2     Relative        DMR    DM__DMR                je, jeNe (‘which/who’, ‘whom’)
3.3     Wh-word         DMQ    DM__DMQ                koN, shuM, kem (‘who’, ‘what’, ‘why’)
3.4     Indefinite                                    koI, kaIMK, kashuM (‘someone’, ‘something’)
4       Verb            V      V
4.1     Main            VM     V__VM                  khAshe, khAdhu (‘will eat’, ‘ate’)
4.2     Auxiliary       VAUX   V__VAUX                chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’)
5       Adjective       JJ
6       Adverb          RB
7       Postposition    PSP
8       Conjunction     CC     CC
8.1     Co-ordinator    CCD    CC__CCD                ane, ke (‘and’, ‘or’)
8.2     Subordinator    CCS    CC__CCS                tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’)
9       Particles       RP     RP
9.1     Default         RPD    RP__RPD                paNa, ja, tO (‘but’, emph., topic)
9.2     Interjection    INJ    RP__INJ                hE, arrrE, O
9.3     Intensifier     INTF   RP__INTF               bahu, ghaNuM (‘very’, ‘much’)
9.4     Negation        NEG    RP__NEG                nahi, na (‘no’)
10      Quantifiers     QT     QT
10.1    General         QTF    QT__QTF                thoduM, ghaNuM (‘little’, ‘much’)
10.2    Cardinals       QTC    QT__QTC                eka, be, traN (‘one’, ‘two’, ‘three’)
10.3    Ordinals        QTO    QT__QTO                paheluM, bIjI (‘first’ (neu.), ‘second’ (fem.))
11      Residuals       RD     RD
11.1    Foreign word    RDF    RD__RDF                tv, perasitemol
11.2    Symbol          SYM    RD__SYM                $ &
11.3    Punctuation     PUNC   RD__PUNC               ( )
11.4    Unknown         UNK    RD__UNK
11.5    Echowords       ECH    RD__ECH                kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’)
POS for Konkani
(Category column: Top level / Subtype (level 1) / Subtype (level 2))
Sl No   Category   Label   Annotation Convention   Examples   Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili Sl
No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu Sl No
Category Label Annotation
Convention
Examples Remarks
Top level Subtype
(level 1)
Subtype
(level 2)
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.
Text direction (its:dir): defined for the root element of the document and for any element that has text content.
Translatability (its:translateRule): elements that indicate which elements and attributes should be translated and which have non-translatable content.
Text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.
Unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
Notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp]
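A small illustration of these attributes on an instance document, read back with Python's standard library, is given below. The document content, the element names `doc`, `p` and `tag`, and the helper `walk` are invented for illustration; the namespace URI is the one defined by ITS 1.0. The sketch only reads the attributes and does not implement ITS rule processing.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical host document using the attributes listed above.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"
     xml:lang="en" its:dir="ltr">
  <its:rules version="1.0">
    <its:translateRule selector="//tag" translate="no"/>
    <its:withinTextRule selector="//tag" withinText="yes"/>
  </its:rules>
  <p xml:id="p1" its:locNote="Do not translate the tag label.">
    The tag <tag>N_NN</tag> marks a common noun.</p>
  <p xml:id="p2" xml:lang="ur" its:dir="rtl">(Urdu example text)</p>
</doc>"""


def walk(elem, lang=None, direction=None):
    """Yield (xml:id, language, direction, locNote) for identified elements.

    xml:lang and its:dir are inherited from the nearest ancestor that sets them.
    """
    lang = elem.get(f"{{{XML_NS}}}lang", lang)
    direction = elem.get(f"{{{ITS_NS}}}dir", direction)
    xml_id = elem.get(f"{{{XML_NS}}}id")
    if xml_id:
        yield xml_id, lang, direction, elem.get(f"{{{ITS_NS}}}locNote")
    for child in elem:
        yield from walk(child, lang, direction)


for row in walk(ET.fromstring(SAMPLE)):
    print(row)
# ('p1', 'en', 'ltr', 'Do not translate the tag label.')
# ('p2', 'ur', 'rtl', None)
```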
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means: a tag conveys only its name, so it has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
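The language section of such a metadata record could be generated programmatically. The following Python sketch builds one with the standard library; the element names are taken from the listing above, while the function name `build_metadata` and its default values are assumptions made for illustration, not part of ISO 12620.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_metadata(language="en", origin="ISO 12620:1999", created="1999-01-01"):
    """Build a <languageSection> element mirroring the listing above."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "registrationStatus").text = "standard"
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = created
    description = ET.SubElement(section, "descriptionSection")
    definition_class = ET.SubElement(description, "definitionClass")
    definition = ET.SubElement(definition_class, "definition")
    definition.set(XML_LANG, language)
    ET.SubElement(definition_class, "source").text = origin
    return section


print(ET.tostring(build_metadata(), encoding="unicode"))
```

Building the record from a function keeps every opening tag paired with its closing tag, which is easy to get wrong when the block is written by hand.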
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into a POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. POS schema block diagram.
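Switching branches "off" can be pictured as pruning the per-language attributes that are not wanted. The Python sketch below works on a simplified stand-in for the common schema: the element names `schema`, `element` and `attribute`, the placeholder labels (`H1`, `A1`, ...) and the function `language_schema` are all hypothetical, and Hindi is kept alongside the target language as in the selection algorithm.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the common schema; labels are placeholders.
COMMON = """<schema>
  <element cat="noun" hin-cat="H1" asm-cat="A1" guj-cat="G1" tag="N">
    <attribute subcat="common" hin-cat="H2" asm-cat="A2" guj-cat="G2" tag="NN"/>
    <attribute subcat="proper" hin-cat="H3" asm-cat="A3" guj-cat="G3" tag="NNP"/>
  </element>
</schema>"""


def language_schema(common_xml, lang):
    """Return a copy of the common tree with other languages switched off.

    Every '<code>-cat' attribute is removed unless it belongs to Hindi or
    to the selected language; English labels and tags are left untouched.
    """
    root = ET.fromstring(common_xml)
    keep = {"hin-cat", f"{lang}-cat"}
    for node in root.iter():
        for name in list(node.attrib):
            if name.endswith("-cat") and name not in keep:
                del node.attrib[name]
    return root


assamese = language_schema(COMMON, "asm")
print(sorted(assamese.find("element").attrib))
# ['asm-cat', 'cat', 'hin-cat', 'tag']
```

Pruning a copy leaves the common schema intact, so the same source can be transformed again for another language.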
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...............</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo kok-cat=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
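
The selection steps above can be sketched in Python. This is only an illustrative sketch, not part of the draft standard: it assumes each schema node has already been read into a plain dictionary of attributes, and the names SCRIPT_LANGUAGES, COMMON_ATTRIBUTES, visible_attributes and select_node are hypothetical. The script-to-language grouping is taken from the algorithm, the attribute names (hin-cat, brx-cat and so on) follow the schema listing, and the "Display (Metadata)" step is not modelled.

```python
# Script -> ISO 639-3 codes of the languages the algorithm lists under it.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}

# Attributes that are not language-specific labels; assumed to be always kept.
COMMON_ATTRIBUTES = ("name", "cat", "subcat", "tag")


def visible_attributes(script, language):
    """Return the attribute names to display for one script/language pair."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English (cat/subcat) and Hindi labels are displayed for every language.
    shown = list(COMMON_ATTRIBUTES) + ["hin-cat"]
    if language != "hin":
        shown.append(f"{language}-cat")
    return shown


def select_node(node, script, language):
    """Keep the displayed attributes of a node and hide the remaining ones."""
    shown = set(visible_attributes(script, language))
    return {key: value for key, value in node.items() if key in shown}


if __name__ == "__main__":
    # Placeholder label strings; real nodes carry the labels from the schema.
    pronoun = {
        "cat": "Pronoun",
        "hin-cat": "<Hindi label>",
        "brx-cat": "<Bodo label>",
        "mal-cat": "<Malayalam label>",
        "tag": "PR",
    }
    print(select_node(pronoun, "Devanagari", "brx"))
```

Keeping the script-to-language grouping in one table means a new language only needs a new entry there, rather than another nested If block.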
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
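
Tagged sentences like the ones above can be split back into words and tag levels with a few lines of Python. This is a hedged sketch only: the character separating each word from its tag is not visible in the text above, so the backslash used here is an assumption, as are the function names parse_token and parse_sentence and the romanised sample tokens. Tags are split on one or more underscores because the samples use both N_NN and N__NN.

```python
import re


def parse_token(token, separator="\\"):
    """Split one 'word<separator>TAG' token into (word, [tag levels])."""
    word, _, label = token.rpartition(separator)
    if not word or not label:
        raise ValueError(f"token {token!r} has no word{separator}tag structure")
    # 'N_NN' and 'N__NN' both yield ['N', 'NN'].
    return word, re.split(r"_+", label)


def parse_sentence(line, separator="\\"):
    """Parse a whitespace-separated tagged sentence into (word, levels) pairs."""
    return [parse_token(token, separator) for token in line.split()]


if __name__ == "__main__":
    # Hypothetical romanised tokens, used only to show the shape of the result.
    for word, levels in parse_sentence("ghar\\N_NN hai\\V__VM__VF"):
        print(word, levels)
```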
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo   Language Name   Language Tag according to ISO 639-3
1     Hindi           hin
2     Assamese        asm
3     Bangla          ben
4     Bodo            brx
5     Dogri           doi
6     Gujarati        guj
7     Kannada         kan
8     Kashmiri        kas
9     Konkani         kok
10    Maithili        mai
11    Malayalam       mal
12    Manipuri        mni
13    Marathi         mar
14    Nepali          nep
15    Oriya           ori
16    Punjabi         pan
17    Sanskrit        san
18    Santhali        sat
19    Sindhi          snd
20    Tamil           tam
21    Telugu          tel
22    Urdu            urd
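
As a small illustration of how these tags feed the schema, the table can be held as a lookup from language name to ISO 639-3 code, from which the per-language label attribute used in the schema listing (for example brx-cat) is derived. The names LANGUAGE_TAGS and label_attribute are hypothetical and this is a sketch rather than a prescribed implementation.

```python
# Language name -> ISO 639-3 code, as listed in the annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute that holds this language's label."""
    return f"{LANGUAGE_TAGS[language_name]}-cat"


if __name__ == "__main__":
    print(label_attribute("Bodo"))
```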
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
10.2    Cardinals         QTC    QT__QTC                onnu, rantu ഒന രണ
10.3    Ordinals          QTO    QT__QTO                onnAm, rantam ഒനാം രണാം
11      Residuals         RD     RD
11.1    Foreign word      RDF    RD__RDF
11.2    Symbol            SYM    RD__SYM                $ & ( ) ruu $ & ( ) ര
11.3    Punctuation       PUNC   RD__PUNC
11.4    Unknown           UNK    RD__UNK
11.5    Echowords         ECH    RD__ECH
POS for Bangla
Category levels (Top level, Subtype level 1, Subtype level 2) are shown by indentation.

Sl No   Category          Label  Annotation Convention  Examples                      Remarks
1       Noun              N      N
1.1       Common          NN     N__NN                  kalama, cashmaa
1.2       Proper          NNP    N__NNP                 Mohan, ravi, rashmi
1.4       Nloc            NST    N__NST                 upare, niche, bhitara
2       Pronoun           PR     PR
2.1       Personal        PRP    PR__PRP                se, tumi, AmAra
2.2       Reflexive       PRF    PR__PRF                nijera
2.3       Relative        PRL    PR__PRL                ye, yakhana, yena, yAra
2.4       Reciprocal      PRC    PR__PRC                paraspara
2.5       Wh-word         PRQ    PR__PRQ                ke, kakhana, kena, kAra
2.6       Indefinite      PRI    PR__PRI                keu
3       Demonstrative     DM     DM                     Vaha, jo, yaha
3.1       Deictic         DMD    DM__DMD                sei, oi, o, se
3.2       Relative        DMR    DM__DMR                ye, yei
3.3       Wh-word         DMQ    DM__DMQ                kono
3.4       Indefinite      DMI    DM__DMI                keu
4       Verb              V      V
4.1       Main            VM     V__VM
4.1.1       Finite        VF     V__VM__VF              karachhilAma, yAba, khAYa
4.1.2       Non-finite    VNF    V__VM__VNF             kare, kheYe, karale, khete
4.1.3       Infinitive    VINF   V__VM__VINF            karate, khete, yete
4.1.4       Gerund        VNG    V__VM__VNG             yAoYa, AsA, khelA, karA
4.2       Auxiliary       VAUX   V__VAUX                chhila, habe, chAi
5       Adjective         JJ                            sundara, bhAla, lAla
6       Adverb            RB                            tADAtADi, Aste, haThAt
7       Postposition      PSP                           theke, abadhI, madhye, diYe
8       Conjunction       CC     CC
8.1       Co-ordinator    CCD    CC__CCD                Ara, eban, athabA, kimbA
8.2       Subordinator    CCS    CC__CCS                ye, kintu, noile, tAhale
8.2.1       Quotative     UT     CC__CCS__UT            ----                          Not required
9       Particles         RP     RP
9.1       Default         RPD    RP__RPD                to, ye
9.2       Classifier      CL     RP__CL                 jana, khAnA
9.3       Interjection    INJ    RP__INJ                Are, ei, hAya
9.4       Intensifier     INTF   RP__INTF               bhiShaNa, khuba, sA~NghAtika
9.5       Negation        NEG    RP__NEG                nA, naYa, chhADA
10      Quantifiers       QT     QT
10.1      General         QTF    QT__QTF                kichhu, alpa, aneka
10.2      Cardinals       QTC    QT__QTC                eka, dui, tina
10.3      Ordinals        QTO    QT__QTO                prathama, paYalA, dvitIYa
11      Residuals         RD     RD
11.1      Foreign word    RDF    RD__RDF                                              A word written in script other than the script of the original text
11.2      Symbol          SYM    RD__SYM                $ & ( )                       For symbols such as $, & etc.
11.3      Punctuation     PUNC   RD__PUNC                                             Only for punctuations
11.4      Unknown         UNK    RD__UNK
11.5      Echowords       ECH    RD__ECH                jala Tala, khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
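The rule that an annotator picks only the lowest-level tag while the higher-level tags are stored automatically can be sketched as a walk up a parent table. This is an illustrative Python sketch, not part of the draft standard: `PARENT` and `full_annotation` are hypothetical names, the parent links are read off the Label and Annotation Convention columns of the tables in this section, and each tag is assumed to have a single parent (the Konkani table, which also places finite and non-finite forms under VAUX, would need a richer structure).

```python
# Parent of each tag in the type hierarchy; tags without an entry are top level.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(tag: str) -> str:
    """Expand a lowest-level tag into its full annotation convention."""
    chain = [tag]
    # Climb from the selected tag to its top-level category.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # Conventions are written top level first, joined by a double underscore.
    return "__".join(reversed(chain))
```

Selecting `VF` therefore stores `V__VM__VF`, selecting `NN` stores `N__NN`, and a tag with no subtypes such as `JJ` is stored unchanged.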
POS for Marathi
Sl No | Category (top level / subtype) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | मुलगा (mulagaa-boy), राजा (raajaa-king), पुस्तक (pustaka-book) |
1.1 | Common | NN | N__NN | पुस्तक (pustaka-book), लेखणी (lekhaNi-pen), चष्मा (chashmaa-goggles) |
1.2 | Proper | NNP | N__NNP | मोहन (Mohan), रवी (Ravi), रश्मी (Rashmi) |
1.3 | Verbal | NNV | N__NNV | NA | Not required
1.4 | Nloc | NST | N__NST | वर (var-up), खाली (khaalee-down), पुढे (pudhe-ahead), मागे (maage-back) | Where it is separate it is NST
2 | Pronoun | PR | PR | येथे (yethe-here), तेथे (tethe-there), जो (jo-who), तो (to-he) |
2.1 | Personal | PRP | PR__PRP | तो (to-he), मी (mee-I), तू (tu-you), ते (te-they), तुम्ही (tumhi-you) |
2.2 | Reflexive | PRF | PR__PRF | स्वतः (swatha-myself), आपण (aapana-ourselves) |
2.3 | Relative | PRL | PR__PRL | जो (jo-who), ज्याने (jyaane-who), जेव्हा (jevhaa-while), जिथे (jeethe-where) |
2.4 | Reciprocal | PRC | PR__PRC | परस्पर (Paraspara-reciprocally), एकमेक (ekmek-mutually) |
2.5 | Wh-word | PRQ | PR__PRQ | कोण (kona-who), केव्हा (kevha-when), कुठे (kuthe-where) |
2.6 | Indefinite | | | कोणी (kona |
3 | Demonstrative | DM | DM | तो (to-he), हा (haa-this), जो (jo-who) |
3.1 | Deictic | DMD | DM__DMD | इथे (ithe-here), तिथे (tithe-there) |
3.2 | Relative | DMR | DM__DMR | जो (jo-who), ज्याने (jyane-who) |
3.3 | Wh-word | DMQ | DM__DMQ | कोणता (konta-which), कोणी (kona-who) |
4 | Verb | V | V | पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is) |
4.1 | Main | VM | V__VM | पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is) |
4.1.1 | Finite | VF | V__VM__VF | - | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Non-finite | VNF | V__VM__VNF | - | --do--
4.1.3 | Infinitive | VINF | V__VM__VINF | - | --do--
4.1.4 | Gerund | VNG | V__VM__VNG | | --do--
4.2 | Auxiliary | VAUX | V__VAUX | आहे (is), लागला (started) |
5 | Adjective | JJ | | सुंदर (sundara-beautiful), चांगला (chaangalaa-good), मोठा (moThaa-big) |
6 | Adverb | RB | | लवकर (lavakar-fast), हळूहळू (haLuuhaLuu-slowly) |
7 | Postposition | PSP | | | Not in Marathi
8 | Conjunction | CC | CC | आणि (aaNi-and), कारण (kaaraN-because) |
8.1 | Co-ordinator | CCD | CC__CCD | आणि (aaNi-and), पण (paNa-but), परंतु (parantu-but) |
8.2 | Subordinator | CCS | CC__CCS | कारण (kaaraN-because of), कारण की (kaaraN kii-because of), जर-तर (jara-tara-if-then) |
8.2.1 | Quotative | UT | CC__CCS__UT | असा म्हणून |
9 | Particles | RP | RP | तर (tara) |
9.1 | Default | RPD | RP__RPD | तर (tara-then) |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | अरेरे (arere), ओहो (oho-oh) |
9.4 | Intensifier | INTF | RP__INTF | खूप (khoop-lot, very), बराच (baraach-too much), अतिशय (atishaya-too much, very) |
9.5 | Negation | NEG | RP__NEG | नको (nako-not), न (na-Na) |
10 | Quantifiers | QT | QT | थोडे (thode-few), जास्त (jaasta-lot), काही (kaahi-few), एक (eka-one), पहिला (pahilaa-first) |
10.1 | General | QTF | QT__QTF | थोडे (thoDe-few), जास्त (jaasta-lot), काही (kaahi-few) |
10.2 | Cardinals | QTC | QT__QTC | एक (eka-one), दोन (dona-two) |
10.3 | Ordinals | QTO | QT__QTO | पहिला (pahilaa-first), दुसरा (dusaraa-second) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जेवणबिवण (jevanbivaNa-meal, dinner), डोकेबिके (Dokebike-head), (Paanii-) vaanii, (khaanaa-) vaanaa |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No | Category (top level / subtype) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA ('pen', 'spectacles') |
1.2 | Proper | NNP | N__NNP | mohan, ravI ('Mohan', 'Ravi') |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM ('up', 'down', 'in front') |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te ('me', 'you', 'he/she') |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam ('herself/himself') |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM ('who', 'where') |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar ('mutually', 'each other') |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM ('who', 'when', 'where') |
2.6 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A ('this') |
3.2 | Relative | DMR | DM__DMR | je, jeNe ('which/who', 'whom') |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem ('who', 'what', 'why') |
3.4 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu ('will eat', 'ate') |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM ('is', 'was', 'did') |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke ('and', 'or') |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke ('so', 'like that', 'because') |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO ('but', emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM ('very', 'much') |
9.4 | Negation | NEG | RP__NEG | nahi, na ('no') |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM ('little', 'much') |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN ('one', 'two', 'three') |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI ('first' (neu), 'second' (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | () |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi ('work and the like', 'water and the like') |
POS for Konkani
Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No | Category (top level / subtype) | Label | Annotation Convention | Examples | Remarks
1 | Noun (اسم, ism) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (نکره, nakeraa) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (معرفہ, m'aarefa) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (حاصل مصدر, haasil-e-masdar) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (ظرف, zarf) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (ضمير, zamiir) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (ضمير شخصی, zamiir-e-shakhsii) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 | Reflexive (ضمير معکوسی, zamiir-e-m'aakoosii) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (ضمير موصولہ, zamiir-e-mausoolaa) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (ضمير راجع, zamiir-e-raaje') | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (ضمير استفہاميہ, zamiir-e-istafhaamiyaa) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (ضمير اشاره, zamiir-e-ishaaraa) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (اشارے, ishaare) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (ضمير اشاره موصولہ, zamiir-e-ishaaraa mausoolaa) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (ضمير اشاره استفہاميہ, zamiir-e-ishaaraa istafhaamiyaa) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (فعل, f'el) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (محدود, mahdood) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (غيرمحدود, ghair mahdood) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (مصدر, masdar) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (حاصل مصدر, haasil-e-masdar) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (فعل امدادی, f'el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (صفت, sifat) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (متعلق فعل, mut'alliq-e-f'el) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (جارموخر, jaar-e-moakkhar) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (عطف, 'atf) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (حرف وصل, harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (تابع کننده, taab'e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (اقتباسی, iqtabaasii) | UT | CC__CCS__UT | | Not required
9 | Particles (حاليہ, haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (ڈيفالٹ, Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (درجہ بند, darja band) | CL | RP__CL | | Not required
9.3 | Interjection (فجائيہ, fajaa'iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (حرف تاکيد, harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (حرف نہی, harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta'addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (عام, 'aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (اعداد مطلق, a'adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (ترتيبی اعداد, tartiibii a'adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (باقی مانده, baaqi-maandaa) | RD | RD | |
11.1 | Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol (علامت, 'alaamat) | SYM | RD__SYM | $ & ( ) | Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound) etc.
11.3 | Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (نامعلوم, naa-m'aaloom) | UNK | RD__UNK | |
11.5 | Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa'e-vaa'e) |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognise for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
- Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur)
- Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content)
- Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content)
- Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)
- Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier)
- Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)
[For more details: http://www.w3.org/TR/xml-i18n-bp]
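To make the attributes above concrete, the following Python sketch reads `xml:lang` and the local ITS attribute `its:translate` from a small document and applies the usual inheritance from parent to child elements. It is an illustration only, not part of the draft standard: the sample document, its element names (`doc`, `title`, `example`, `code`) and the function `describe` are hypothetical, and only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical sample; xml:lang takes BCP 47 tags, so Hindi is "hi" here.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <title>POS tags</title>
  <example xml:lang="hi" xml:id="ex1">kalama</example>
  <code its:translate="no">N__NN</code>
</doc>"""


def describe(element, lang=None, translate="yes"):
    """Yield (tag, language, translate flag), inheriting values from parents."""
    lang = element.get(f"{{{XML_NS}}}lang", lang)
    translate = element.get(f"{{{ITS_NS}}}translate", translate)
    yield element.tag, lang, translate
    for child in element:
        yield from describe(child, lang, translate)


rows = list(describe(ET.fromstring(SAMPLE)))
for tag, lang, translate in rows:
    print(tag, lang, translate)
```

The `code` element inherits `en` from the root but is marked as non-translatable, which is the behaviour one would want for tag names such as `N__NN` that must stay identical across languages.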
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means: a tag supplies only a name, so it has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
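A tool that displays this metadata first has to read the fields back out of the record. The sketch below shows one way to do that with Python's standard-library `xml.etree.ElementTree`; it is an assumption-laden illustration rather than a reference implementation. The record is a shortened copy of the one above, the version value `1.0.0` is a reading of the printed `100`, and `read_metadata` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the root element of the metadata record.
NS = {"d": "http://www.isocat.org/ns/dcif"}

# Shortened copy of the metadata record, kept well-formed for parsing.
RECORD = """<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus>
    <creation><creationDate>1999-01-01</creationDate></creation>
    <descriptionSection>
      <definitionClass><source>ISO 12620:1999</source></definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>"""


def read_metadata(xml_text: str) -> dict:
    """Collect the descriptive fields of a metadata record into a dict."""
    root = ET.fromstring(xml_text)
    fields = ("language", "version", "registrationStatus", "creationDate", "source")
    return {
        name: root.findtext(f".//d:{name}", default="", namespaces=NS)
        for name in fields
    }


metadata = read_metadata(RECORD)
print(metadata)
```

Because the record declares a default namespace, every lookup has to be namespace-qualified; an unqualified search for `language` would find nothing.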
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would be used to select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
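The select, display and hide steps described above can be sketched as a tree walk in which a branch is switched off for the languages that do not need it. This is a hypothetical Python illustration, not the draft's algorithm: the `Node` class, the `not_required` field, the `select_nodes` function and the placeholder labels such as `mar:Noun` are all assumptions made for the example; the only facts taken from the tables are that the Verbal noun and Classifier rows are marked "Not required" for Marathi.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                                # POS label, e.g. "NN"
    labels: dict                            # language code -> display label
    not_required: frozenset = frozenset()   # languages for which the branch is off
    children: list = field(default_factory=list)


def select_nodes(node: Node, lang: str) -> list:
    """Return (tag, label) pairs visible for lang; hidden branches are skipped."""
    if lang in node.not_required:
        return []  # the whole branch, including its children, is hidden
    visible = [(node.tag, node.labels.get(lang, ""))]
    for child in node.children:
        visible.extend(select_nodes(child, lang))
    return visible


# Placeholder labels stand in for the native-script labels of the mapping tables.
noun = Node("N", {"hin": "hin:Noun", "mar": "mar:Noun"}, children=[
    Node("NN", {"hin": "hin:Common", "mar": "mar:Common"}),
    Node("NNV", {"hin": "hin:Verbal"}, not_required=frozenset({"mar"})),
])

print(select_nodes(noun, "hin"))
print(select_nodes(noun, "mar"))
```

Keeping one common tree and filtering it per language, rather than maintaining 22 separate schemas, is what lets a single label table serve every language.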
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
ltxml version=10 encoding=UTF-8gt
ltxsschema xmlnsxs=httpwwww3org2001XMLSchemagt
ltfile Descgt
lttitleStmtgt
lttitlegtPOS tag in multilingual languagelttitlegt
ltscriptgt ltscriptgt
ltlanguagegtmultilingualltlanguagegt
ltlabel languagegthelliphelliphelliphelliphellipltlabel languagegt
lttypegtmultimodallttypegt
[Languages taken Hindi Bodo Malayalam Kashmiri Assamese Konkani Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English, Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English, Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
...
End if
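The selection steps in this section can be sketched in Python. This is only an illustration under stated assumptions, not part of the draft standard: each schema node is modelled as a plain dictionary of attributes, the key names "cat" and "subcat" stand in for the English labels, and the function names are hypothetical. The script and language pairs and the language codes are the ones listed in the algorithm and in the language code table.

```python
# Script -> {language name: ISO 639-3 code}, as enumerated in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def visible_attributes(script, language):
    """Return the attribute names to display for a script/language pair."""
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None or language not in languages:
        raise ValueError(f"unsupported combination: {script} / {language}")
    code = languages[language]
    # English ("cat"/"subcat") and Hindi labels are always displayed,
    # together with the tag itself (hypothetical key names).
    shown = ["cat", "subcat", "hin-cat", "tag"]
    if code != "hin":
        # Add the label attribute of the selected language.
        shown.insert(3, f"{code}-cat")
    return shown


def select_nodes(node, script, language):
    """Keep only the displayed attributes of one node; hide the rest."""
    keep = set(visible_attributes(script, language))
    return {name: value for name, value in node.items() if name in keep}
```

For example, calling `select_nodes` on a node that carries `hin-cat`, `brx-cat` and `kok-cat` with script "Devanagari" and language "Bodo" keeps the English, Hindi and Bodo labels and drops the Konkani one.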
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
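A tagged corpus in this style can be read back with a few lines of Python. The sketch below is illustrative only and rests on two assumptions that the text above does not confirm: that each token is written as a word, a backslash separator and the tag label, and that the levels of a label are joined by one or more underscores (as in N_NN or V__VM__VF). The function names and the romanized sample tokens are hypothetical.

```python
import re


def parse_token(token, sep="\\"):
    """Split one 'word<sep>LABEL' token into (word, [tag levels])."""
    word, found, label = token.rpartition(sep)
    if not found:
        raise ValueError(f"no tag separator in {token!r}")
    # Accept both single and double underscores between hierarchy levels.
    levels = [part for part in re.split(r"_+", label) if part]
    return word, levels


def parse_sentence(line, sep="\\"):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(token, sep) for token in line.split()]
```

With these assumptions, `parse_token("kitAba\\N_NN")` yields the word `kitAba` with the levels `N` and `NN`; the last level is the lowest-level tag assigned by the annotator.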
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tag (ISO 639-3)
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
kena kAra
2.6 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
3.1 Deictic DMD DM__DMD sei oi o se
3.2 Relative DMR DM__DMR ye yei
3.3 Wh-word DMQ DM__DMQ kono
3.4 Indefinite DMI DM__DMI keu
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF karachhilAma yAba khAYa
4.1.2 Non-finite VNF V__VM__VNF kare kheYe karale khete
4.1.3 Infinitive VINF V__VM__VINF karate khete yete
4.1.4 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
4.2 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
8.2 Subordinator CCS CC__CCS ye kintu noile tAhale
8.2.1 Quotative UT CC__CCS__UT ---- Remarks: Not required
9 Particles RP RP
9.1 Default RPD RP__RPD to ye
9.2 Classifier CL RP__CL jana khAnA
9.3 Interjection INJ RP__INJ Are ei hAya
9.4 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
9.5 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
10.1 General QTF QT__QTF kichhu alpa aneka
10.2 Cardinals QTC QT__QTC eka dui tina
10.3 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF Remarks: A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) Remarks: For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Remarks: Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH jala Tala khAbAra dAbAra
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
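The rule above (annotate with the lowest-level tag and store the higher levels automatically) can be sketched as a parent lookup. The snippet is an illustration, not a normative implementation: the parent relations are read off the Label and Annotation Convention columns of the tables in this document, while the names `PARENT`, `TOP_LEVEL` and `annotation_label` are hypothetical. Language-specific variants such as the Konkani V__VAUX__VF label are not covered.

```python
# Child tag -> parent tag, taken from the Annotation Convention column.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}

# Tags that have no parent in the hierarchy.
TOP_LEVEL = {"N", "PR", "DM", "V", "JJ", "RB", "PSP", "CC", "RP", "QT", "RD"}


def annotation_label(tag):
    """Expand a lowest-level tag into its full hierarchical label."""
    if tag not in PARENT and tag not in TOP_LEVEL:
        raise ValueError(f"unknown tag: {tag}")
    chain = [tag]
    # Walk up the hierarchy until a top-level tag is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))
```

An annotator then only records, say, `VF`, and the stored label becomes `V__VM__VF`; top-level tags such as `JJ` are returned unchanged.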
POS for Marathi
Sl No   Category [Top level / Subtype (level 1) / Subtype (level 2)]   Label   Annotation Convention   Examples   Remarks
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No   Category [Top level / Subtype (level 1) / Subtype (level 2)]   Label   Annotation Convention   Examples   Remarks
1 Noun N N
1.1 Common NN N__NN kalam, chashmA ('pen', 'spectacles')
1.2 Proper NNP N__NNP mohan, ravI ('Mohan', 'Ravi')
1.3 Nloc NST N__NST upar, nIche, ahIM ('up', 'down', 'in front')
2 Pronoun PR PR
2.1 Personal PRP PR__PRP huM, tuM, te ('me', 'you', 'he/she')
2.2 Reflexive PRF PR__PRF pote, jAte, svayam ('herself/himself')
2.3 Relative PRL PR__PRL je, te, jyAM ('who', 'where')
2.4 Reciprocal PRC PR__PRC aras-paras, paraspar ('mutually', 'each other')
2.5 Wh-word PRQ PR__PRQ koN, kyAre, kyAM ('who', 'when', 'where')
2.6 Indefinite koI, kaIMK, kashuM ('someone', 'something')
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD A ('this')
3.2 Relative DMR DM__DMR je, jeNe ('which/who', 'whom')
3.3 Wh-word DMQ DM__DMQ koN, shuM, kem ('who', 'what', 'why')
3.4 Indefinite koI, kaIMK, kashuM ('someone', 'something')
4 Verb V V
4.1 Main VM V__VM khAshe, khAdhu ('will eat', 'ate')
4.2 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM ('is', 'was', 'did')
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane, ke ('and', 'or')
8.2 Subordinator CCS CC__CCS tethI, evuM, kAraNke ('so', 'like that', 'because')
9 Particles RP RP
9.1 Default RPD RP__RPD paNa, ja, tO ('but', emph, topic)
9.2 Interjection INJ RP__INJ hE, arrrE, O
9.3 Intensifier INTF RP__INTF bahu, ghaNuM ('very', 'much')
9.4 Negation NEG RP__NEG nahi, na ('no')
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM, ghaNuM ('little', 'much')
10.2 Cardinals QTC QT__QTC eka, be, traN ('one', 'two', 'three')
10.3 Ordinals QTO QT__QTO paheluM, bIjI ('first' (neu), 'second' (fem))
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv, perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ( )
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm, pANi-bANi ('work and the like', 'water and the like')
POS for Konkani
Sl No   Category [Top level / Subtype (level 1) / Subtype (level 2)]   Label   Annotation Convention   Examples   Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Columns: Sl No; Category (Top level / Subtype level 1 / Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
1.1 Common NN N__NN पोथी तलम पड खवास
1.2 Proper NNP N__NNP अरण दनश अल
1.3 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हम ई ओ अहा
2.2 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
2.3 Relative PRL PR__PRL ज िजनता िजनतर जतरा
2.4 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
2.5 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ओ ई ऊ
3.2 Relative DMR DM__DMR ज जाह
3.3 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
4.1 Main VM V__VM चलब रौप पढइ खाइ स हस
4.2 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आओर परच मदा वा
8.2 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
9.1 Default RPD RP__RPD भर यौ हौ रौ
9.2 Classifier CL RP__CL टा गोट गो
9.3 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
9.4 Intensifier INTF RP__INTF बह बसी खब नान
9.5 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
10.1 General QTF QT__QTF तनत बह तछ
10.2 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
10.3 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF Remark: A word written in a script other than the script of the original text
11.2 Symbol SYM RD__SYM $ ( ) Remark: For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Remark: Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
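The rule above can be illustrated with a minimal Python sketch. It assumes, as the Annotation Convention column suggests, that hierarchy levels are joined by a double underscore; the function name `expand_tag` is hypothetical and not part of the standard.

```python
def expand_tag(annotation: str) -> list[str]:
    """Return every hierarchy level implied by a lowest-level annotation."""
    parts = annotation.split("__")  # assumption: levels are joined by "__"
    return ["__".join(parts[:i]) for i in range(1, len(parts) + 1)]


# The annotator picks only the lowest-level tag; the higher levels are derived.
levels = expand_tag("V__VM__VF")
print(levels)  # ['V', 'V__VM', 'V__VM__VF']
```

A tool built this way only needs to store the lowest-level annotation, since the higher levels can always be recomputed from it.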
POS for Urdu
Columns: Sl No; Category (Top level / Subtype level 1 / Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun (اسم - ism) N N لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab)
1.1 Common (نکره - nakeraa) NN N__NN کتاب (kitaab), قلم (qalam), چشمہ (cashma)
1.2 Proper (معرفہ - m‘aarefa) NNP N__NNP موہن (Mohan), رشمی (Rashmi), روی (Ravi)
1.3 Verbal (حاصل مصدر - haasil-e-masdar) NNV N__NNV جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) Remark: May be considered for Urdu-Hindi too
1.4 Nloc (ظرف - zarf) NST N__NST اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche)
2 Pronoun (ضمير - zamiir) PR PR يہ (yih), وه (voh), جو (jo)
2.1 Personal (ضمير شخصی - zamiir-e-shakhsii) PRP PR__PRP وه (voh), تم (tum), ميں (maim) Remark: In Urdu, unlike Hindi, voh is used for both singular and plural
2.2 Reflexive (ضمير معکوسی - zamiir-e-m‘aakoosii) PRF PR__PRF اپنا (apnaa), خود (khud), اپنے آپ (apne aap)
2.3 Relative (ضمير موصولہ - zamiir-e-mausoolaa) PRL PR__PRL جو (jo), جب (jab), جس (jis), جہاں (jahaM)
2.4 Reciprocal (ضمير راجع - zamiir-e-raaje‘) PRC PR__PRC باہم (baaham), درميان (darmiyaan), آپس (aapas)
2.5 Wh-word (ضمير استفہاميہ - zamiir-e-istafhaamiyaa) PRQ PR__PRQ کون (kaun), کب (kab), کہاں (kahaaM)
3 Demonstrative (ضمير اشاره - zamiir-e-ishaaraa) DM DM يہ (yih), وه (voh), ان (inn), ان (unn)
3.1 Deictic (اشارے - ishaare) DMD DM__DMD يہ (yih), وه (voh)
3.2 Relative (ضمير اشاره موصولہ - zamiir-e-ishaaraa mausoolaa) DMR DM__DMR جو (jo), جس (jis)
3.3 Wh-word (ضمير اشاره استفہاميہ - zamiir-e-ishaaraa istafhaamiyaa) DMQ DM__DMQ کون (kaun), کس (kis), کتنا (kitnaa) Remark: According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb (فعل - f‘el) V V گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1 Main VM V__VM گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1.1 Finite (محدود - mahdood) VF V__VM__VF Remark: This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 Nonfinite (غير محدود - ghair mahdood) VNF V__VM__VNF Remark: -- do --
4.1.3 Infinitive (مصدر - masdar) VINF V__VM__VINF Remark: -- do --
4.1.4 Gerund (حاصل مصدر - haasil-e-masdar) VNG V__VM__VNG Remark: -- do --
4.2 Auxiliary (فعل امدادی - f‘el-e-imdaadi) VAUX V__VAUX ہے (hai), رہا (rahaa), ہوا (huaa)
5 Adjective (صفت - sifat) JJ دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa)
6 Adverb (متعلق فعل - mut‘alliq-e-f‘el) RB تيز (tez), جلد (jald)
7 Postposition (جار موخر - jaar-e-moakkhar) PSP سے (se), نے (ne), کو (ko), ميں (meiM)
8 Conjunction (عطف - ‘atf) CC CC اور (aur), اگر (agar), کيوں کہ (kyoMki)
8.1 Co-ordinator (حرف وصل - harf-e-vasl) CCD CC__CCD اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki)
8.2 Subordinator (تابع کننده - taab‘e kunindaa) CCS CC__CCS اگر (agar), کيوں کہ (kyoMki), تو (to)
8.2.1 Quotative (اقتباسی - iqtabaasii) UT CC__CCS__UT Not required
9 Particles (حاليہ - haaliyaa) RP RP تو (to), ہی (hii), بهی (bhii)
9.1 Default (ڈيفالٹ - Default) RPD RP__RPD تو (to), ہی (hii), بهی (bhii)
9.2 Classifier (درجہ بند - darja band) CL RP__CL Not required
9.3 Interjection (فجائيہ - fajaa’iyaa) INJ RP__INJ اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah)
9.4 Intensifier (حرف تاکيد - harf-e-taakiid) INTF RP__INTF بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)
9.5 Negation (حرف نہی - harf-e-nahii) NEG RP__NEG نہ (na), نہيں (nahiiM)
10 Quantifiers (کميت نما - kamiiyat numaa) QT QT چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General (عام - ‘aam) QTF QT__QTF تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (اعداد مطلق - a‘adaad-e-mutlaq) QTC QT__QTC ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (ترتيبی اعداد - tartiibii a‘adaad) QTO QT__QTO اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals (باقی مانده - baaqi maandaa) RD RD
11.1 Foreign word (بديسی لفظ - bidesii lafz) RDF RD__RDF Remark: A word written in a script other than the script of the original text
11.2 Symbol (علامت - ‘alaamat) SYM RD__SYM $ & ( ) Remark: Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (اوقاف - auqaaf) PUNC RD__PUNC Remark: Only for punctuations
11.4 Unknown (نامعلوم - naa-m‘aaloom) UNK RD__UNK
11.5 Echowords (گونج دار الفاظ - goonjdar lafz) ECH RD__ECH دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
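As an illustration of the first two attributes, the following Python sketch builds one annotated token carrying xml:lang and its:dir. The element name `token`, the attribute `tag`, the helper `make_token` and the choice of the language tag "ur" are assumptions made for this example only; the ITS namespace URI is the one published by the W3C.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)  # serialize the ITS namespace with the "its" prefix


def make_token(word, pos_tag, lang, direction="ltr"):
    """Build a hypothetical <token> element labelled with language and text direction."""
    token = ET.Element("token")                # hypothetical element name
    token.set(f"{{{XML_NS}}}lang", lang)       # xml:lang: natural language of the content
    token.set(f"{{{ITS_NS}}}dir", direction)   # its:dir: "ltr" or "rtl"
    token.set("tag", pos_tag)                  # hypothetical attribute holding the POS label
    token.text = word
    return token


# An Urdu common noun from the Urdu table, written right to left.
urdu_token = make_token("کتاب", "N__NN", "ur", "rtl")
xml_text = ET.tostring(urdu_token, encoding="unicode")
print(xml_text)
```

Setting the direction per element matters here because the tag set covers both left-to-right scripts (for example Devanagari) and right-to-left scripts (Perso-Arabic).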
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag to tell you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives its name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
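Switching a branch "off" can be pictured with a small Python sketch. The `Node` class, the `language_view` function and the placeholder label strings are assumptions made for illustration; the only detail taken from the draft schema is that the Verbal noun (NNV) entry carries no Malayalam label, so that branch is dropped from the Malayalam view.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    labels: dict                      # language code -> label text
    children: list = field(default_factory=list)


def language_view(node, lang):
    """Return a copy of the tree for `lang`, or None if the branch is 'off'."""
    if lang not in node.labels:
        return None  # no label for this language: the branch is switched off
    kept = {code: text for code, text in node.labels.items() if code in ("eng", lang)}
    children = [view for child in node.children
                if (view := language_view(child, lang)) is not None]
    return Node(node.tag, kept, children)


# Placeholder labels; a real tree would carry the labels from the mapping tables.
noun = Node("N", {"eng": "Noun", "hin": "hin-noun", "mal": "mal-noun"}, [
    Node("NN", {"eng": "Common", "hin": "hin-common", "mal": "mal-common"}),
    Node("NNV", {"eng": "Verbal", "hin": "hin-verbal"}),  # no Malayalam label
])

mal_view = language_view(noun, "mal")
print([child.tag for child in mal_view.children])  # ['NN']
```

The common tree stays untouched; each language-specific schema is just a filtered copy of it.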
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" tag="PRF">
        ………………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" brx-cat="गाव दिनथथा" tag="PRF">
        ………………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
        ………………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
        ………………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
        ………………
    End if
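The selection logic above can be sketched in Python. This is only an illustration under stated assumptions: the names SCRIPT_LANGUAGES, NODES and select_nodes are hypothetical and do not come from the standard, the node records are a two-entry sample, and the rule "always show English and Hindi, plus the selected language" is read off the Display steps of the pseudocode.

```python
# Sketch of the node-selection algorithm; all names below are illustrative.

# Script -> ISO 639-3 codes of the languages listed under it in the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# Two sample nodes carrying a small subset of the per-language labels.
NODES = [
    {"cat": "noun", "tag": "N", "hin-cat": "संज्ञा", "kok-cat": "नाम", "mal-cat": "നാമം"},
    {"cat": "pronoun", "tag": "PR", "hin-cat": "सर्वनाम"},
]


def select_nodes(script, language, nodes=NODES):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = {"cat", "tag", "hin-cat", f"{language}-cat"}
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]


if __name__ == "__main__":
    for node in select_nodes("Devanagari", "kok"):
        print(node)
```

Filtering attributes by name keeps a single shared node list for every language, which mirrors the idea of one common schema with language branches switched off.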
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag according to ISO 639-3
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
tAhale
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
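A minimal sketch of how the higher-level tags could be derived automatically from the lowest-level tag. It assumes that the double underscore in the Annotation Convention column (for example V__VM__VF) separates the hierarchy levels; the function names expand_tag and store_annotation are hypothetical and not part of the standard.

```python
def expand_tag(annotation):
    """Return every level of the hierarchy, from the top level down to the given tag."""
    parts = annotation.split("__")
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


def store_annotation(word, annotation):
    """Record a word with its lowest-level tag and all derived higher-level tags."""
    return {"word": word, "tag": annotation, "levels": expand_tag(annotation)}


if __name__ == "__main__":
    print(expand_tag("V__VM__VF"))          # ['V', 'V__VM', 'V__VM__VF']
    print(store_annotation("eka", "QT__QTC"))
```

Because the convention string already encodes the path, the annotator only chooses the leaf tag and nothing else has to be entered by hand.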
POS for Marathi
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
oursleves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA ('pen', 'spectacles') |
1.2 | Proper | NNP | N__NNP | mohan, ravI ('Mohan', 'Ravi') |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM ('up', 'down', 'in front') |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te ('me', 'you', 'he/she') |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam ('herself/himself') |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM ('who', 'where') |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar ('mutually', 'each other') |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM ('who', 'when', 'where') |
2.6 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A ('this') |
3.2 | Relative | DMR | DM__DMR | je, jeNe ('which/who', 'whom') |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem ('who', 'what', 'why') |
3.4 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu ('will eat', 'ate') |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM ('is', 'was', 'did') |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke ('and', 'or') |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke ('so', 'like that', 'because') |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO ('but', emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM ('very', 'much') |
9.4 | Negation | NEG | RP__NEG | nahi, na ('no') |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM ('little', 'much') |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN ('one', 'two', 'three') |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI ('first' (neu), 'second' (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | () |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi ('work and the like', 'water and the like') |
POS for Konkani
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu unlike Hindi voh is used both for singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar words like koi kisi kuch do not come under Wh-word they are used for indefinite person For them another category (subtype) ietankiir (indefinitive) is used Under this category
following words are also placed chand
blsquoaaz fulaan sab bahut Can we have a category
subtype like indefinitive demonstrative (DMI)
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
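As an illustration of the attributes listed above, the following sketch builds a small XML fragment with the Python standard library. It is not taken from the standard: the element names labels and label and the helper make_label are hypothetical, and the sketch uses the local ITS 1.0 attributes its:dir and its:translate rather than the global rule elements named above.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Create a <label> element carrying xml:lang, its:dir and its:translate."""
    el = ET.Element("label")
    el.set(f"{{{XML_NS}}}lang", lang)
    el.set(f"{{{ITS_NS}}}dir", direction)
    el.set(f"{{{ITS_NS}}}translate", translate)
    el.text = text
    return el


root = ET.Element("labels")
root.append(make_label("noun", "en"))
# Urdu label for "noun", written right to left and marked as not to be translated.
root.append(make_label("اسم", "ur", direction="rtl", translate="no"))
xml_text = ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Marking language and direction on each label, rather than once on the root, lets labels in left-to-right and right-to-left scripts sit side by side in the same document.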
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag conveys just its name, which is meaningful only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation> <creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
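As a rough illustration of how such a metadata record could be consumed, the sketch below reads a few fields from one languageSection with Python's standard xml.etree.ElementTree module. The XML fragment is a simplified, well-formed stand-in written for this example (flat layout, no namespace), and the function name read_metadata is hypothetical; neither is part of the draft standard.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the languageSection of the metadata record.
# Element names follow the listing; the flat layout is an assumption.
RECORD = """
<languageSection>
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
  <descriptionSection>
    <definitionClass>
      <definition xml:lang="en"></definition>
      <source>ISO 12620:1999</source>
    </definitionClass>
  </descriptionSection>
</languageSection>
"""


def read_metadata(xml_text):
    """Return selected metadata fields of one languageSection as a dict."""
    root = ET.fromstring(xml_text)
    return {
        "language": root.findtext("language"),
        "status": root.findtext("registrationStatus"),
        "origin": root.findtext("origin"),
        "created": root.findtext("creation/creationDate"),
        "source": root.findtext("descriptionSection/definitionClass/source"),
    }


metadata = read_metadata(RECORD)
```

A tool that displays the metadata before calling the POS schema could show the values of this dictionary to the annotator.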
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
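The branch switching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the script and language pairings are the ones spelled out in the node-selection algorithm of section 14, the three-letter codes are the attribute prefixes used in the draft schema, Hindi is kept on because every example in section 14 displays the English and Hindi nodes, and the name branch_switches is hypothetical.

```python
# Script -> {language name: code used as attribute prefix in the schema}.
# Only the pairings that section 14 spells out are listed here; the
# standard itself speaks of 12 scripts and 22 languages.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Assumption: Hindi nodes stay visible for every selection.
ALWAYS_ON = {"hin"}


def branch_switches(script, language):
    """Return {language code: True/False} for the chosen script and language."""
    try:
        selected = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported selection: {script}/{language}") from None
    all_codes = {
        code for langs in SCRIPT_LANGUAGES.values() for code in langs.values()
    }
    # A branch is 'on' if it is the selected language or always shown.
    return {code: code == selected or code in ALWAYS_ON for code in sorted(all_codes)}


switches = branch_switches("Devanagari", "Bodo")
```

For the Bodo selection, only the hin and brx branches are on; every other language branch is switched off.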
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
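The draft schema arranges tags in up to three levels: a category element (for example N or V), a type attribute (NN, VM) and, for verbs, a subtype attribute (VF, VNF). The reference sentences in section 15 write these levels joined by underscores, as in N_NN or V_VM_VNF. The following Python sketch shows one way such hierarchical tags could be checked against the tree. Only three blocks are copied, the placement of the subtypes under VM is inferred from tags such as V_VM_VNF, and the names TAG_TREE and is_valid_tag are hypothetical.

```python
# Fragment of the tag hierarchy: category -> type -> subtype.
# Tags are taken from the Noun, Verb and Residuals blocks of the schema.
TAG_TREE = {
    "N": {"NN": {}, "NNP": {}, "NNV": {}, "NST": {}},
    "V": {
        "VAUX": {},
        "VM": {"VF": {}, "VINF": {}, "VNG": {}, "VNF": {}},
    },
    "RD": {"RDF": {}, "SYM": {}, "UNK": {}, "PUNC": {}, "ECH": {}},
}


def is_valid_tag(tag, tree=TAG_TREE):
    """Check a hierarchical tag such as 'V_VM_VF' one level at a time."""
    node = tree
    for part in tag.split("_"):
        if part not in node:
            return False
        node = node[part]  # descend into the matching branch
    return True


examples = {t: is_valid_tag(t) for t in ["N_NN", "V_VM_VNF", "RD_PUNC", "N_VM"]}
```

Here N_VM is rejected because VM is not a type under the Noun category, while the other three tags follow a path that exists in the tree.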
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
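Because the mapping is meant to be one to one, it can be useful to list the tags for which a language has no label, which the tables mark as NA. The Python sketch below shows one possible check. The two rows are a tiny excerpt: the Malayalam and Konkani labels for Noun and the NA entry for the Malayalam verbal noun are taken from the table, while the string "kok-label-NNV" is a placeholder invented for this example, as is the function name missing_labels.

```python
NA = "NA"  # marker used in the mapping tables for a missing label


def missing_labels(table, languages):
    """Map each tag to the languages for which no label is supplied."""
    gaps = {}
    for tag, labels in table.items():
        absent = [lang for lang in languages if labels.get(lang, NA) in ("", NA)]
        if absent:
            gaps[tag] = absent
    return gaps


# Tiny excerpt of the mapping table, keyed by tag and language code.
TABLE = {
    "N": {"mal": "നാമം", "kok": "नाम"},
    # Verbal noun: Malayalam is NA in the table; the Konkani value is a
    # placeholder string, not the real label.
    "NNV": {"mal": "NA", "kok": "kok-label-NNV"},
}

gaps = missing_labels(TABLE, ["mal", "kok"])
```

A tag reported in gaps simply has no node to display for that language, so the corresponding branch stays hidden.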
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo شخصيٲتی rdquo
tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo ماکوسی rdquo
tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
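The Display and Hide steps of the algorithm above amount to keeping the English, Hindi and selected-language labels on every node and dropping the rest. A minimal Python sketch with xml.etree.ElementTree follows. It assumes a simplified rendering of the schema nodes (plain element and attribute tags without the xs prefix), uses the placeholder "HINDI-LABEL" instead of the real Hindi label, and the name select_nodes is hypothetical; the Malayalam and Konkani labels for Noun are the ones given in the schema.

```python
import xml.etree.ElementTree as ET

# Attributes that are shown for every language selection
# (English fields, the tag, and the Hindi label).
ALWAYS_SHOWN = {"name", "cat", "subcat", "tag", "hin-cat"}


def select_nodes(element, lang_code):
    """Return a copy of element that keeps only English, Hindi and lang_code labels."""
    keep = ALWAYS_SHOWN | {f"{lang_code}-cat"}
    shown = ET.Element(
        element.tag, {k: v for k, v in element.attrib.items() if k in keep}
    )
    for child in element:
        shown.append(select_nodes(child, lang_code))  # filter sub-nodes too
    return shown


# Simplified noun node; "HINDI-LABEL" is a placeholder value.
SOURCE = (
    '<element name="cat" cat="noun" hin-cat="HINDI-LABEL" '
    'mal-cat="നാമം" kok-cat="नाम" tag="N">'
    '<attribute name="type" subcat="common" tag="NN"/>'
    "</element>"
)

konkani_view = select_nodes(ET.fromstring(SOURCE), "kok")
```

With the Konkani selection, the mal-cat attribute is hidden while the cat, hin-cat, kok-cat and tag attributes remain, which mirrors the "Display (English Hindi and Konkani Nodes)" step.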
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP
নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोV_VM मोN_NN RD_PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहV_VM RD_PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणCC_CCD मखयJJ आहV_VM
पणCC_CCD साQT_QTC सथानाचN_NN महवN_NN आणCC_CCD मानयाN_NN मोठJJ
आहV_VM RD_PUNC
4 हDM साQT_QTC नमरसथळN_NN साQT_QTC नगरN_NN वाCC_CCD सपरचयाN_NNP
रपाN_NN थथामयN_NN वणरललJJ आहV_VM RD_PUNC
5 असPR महटलV_VM गलV_VAUX आहV_VAUX तCC_CCD चामारसामयN_NN याCC_CCD
सपरचN_NNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरV_VM RD_PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
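The tagged sentences above write the same label in several ways (N_NNP, N__NN, N-NNP, and bare labels such as PUNC). The following is a minimal sketch, not part of the standard, of how such labels could be normalized and checked against the tag inventory listed in the tag-set tables of this document. The names TAGSET, normalize_tag and is_valid_tag are hypothetical, and the inventory is an assumption based on the common tags; language-specific additions (for example the Konkani V__VAUX__VF) would have to be added by hand.

```python
import re

# Tag inventory taken from the tag-set tables in this document.
# Each key is a label; its value holds the labels allowed one level below it.
TAGSET = {
    "N": {"NN": {}, "NNP": {}, "NNV": {}, "NST": {}},
    "PR": {"PRP": {}, "PRF": {}, "PRL": {}, "PRC": {}, "PRQ": {}},
    "DM": {"DMD": {}, "DMR": {}, "DMQ": {}},
    "V": {"VM": {"VF": {}, "VNF": {}, "VINF": {}, "VNG": {}}, "VAUX": {}},
    "JJ": {},
    "RB": {},
    "PSP": {},
    "CC": {"CCD": {}, "CCS": {"UT": {}}},
    "RP": {"RPD": {}, "CL": {}, "INJ": {}, "INTF": {}, "NEG": {}},
    "QT": {"QTF": {}, "QTC": {}, "QTO": {}},
    "RD": {"RDF": {}, "SYM": {}, "PUNC": {}, "UNK": {}, "ECH": {}},
}


def normalize_tag(tag):
    """Collapse runs of '_' or '-' into a single '_' separator."""
    return re.sub(r"[_-]+", "_", tag.strip())


def is_valid_tag(tag):
    """Return True if the tag is a path from a top-level label downwards."""
    node = TAGSET
    for part in normalize_tag(tag).split("_"):
        if part not in node:
            return False
        node = node[part]
    return True


if __name__ == "__main__":
    for candidate in ["N_NNP", "N-NNP", "V__VM__VNF", "C_CCD", "PUNC"]:
        print(candidate, is_valid_tag(candidate))
```

A check of this kind flags truncated labels such as C_CCD or a bare PUNC, which would otherwise pass unnoticed in a tagged corpus.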
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag (ISO 639-3)
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
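For tools that need to go from a language name to its tag, the annexure can be held as a simple lookup table. The sketch below is illustrative only; the names ISO_639_3 and language_tag are hypothetical, and the codes are the ISO 639-3 tags for the 22 languages listed in this annexure.

```python
# Language name -> ISO 639-3 tag, for the 22 languages of the annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 tag for a language name (case-insensitive)."""
    return ISO_639_3[name.strip().title()]


if __name__ == "__main__":
    print(language_tag("hindi"))
    print(language_tag("Konkani"))
```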
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
POS for Marathi
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N मलगा (mulagaa-boy)
राजा (raajaa-king)
पसत (pustaka-book)
11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )
12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)
13 Verbal NNV N__NNV NA Not
Required
14 Nloc NST N__NST वर(var- up)
खाल(khaalee-
down)
पढ(pudhe-
ahead)
माग(maage-
back)
Where it is
separate it is
NST
2 Pronoun PR PR यथ(yethe-
here) थ (tethe-there)
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-myself) आपण(aapana-ourselves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Paraspara-reciprocally) एतमत(ekmek-mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
POS for Gujarati
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA ('pen', 'spectacles') |
1.2 | Proper | NNP | N__NNP | mohan, ravI ('Mohan', 'Ravi') |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM ('up', 'down', 'in front') |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te ('me', 'you', 'he/she') |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam ('herself/himself') |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM ('who', 'where') |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar ('mutually', 'each other') |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM ('who', 'when', 'where') |
2.6 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A ('this') |
3.2 | Relative | DMR | DM__DMR | je, jeNe ('which/who', 'whom') |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem ('who', 'what', 'why') |
3.4 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu ('will eat', 'ate') |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM ('is', 'was', 'did') |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke ('and', 'or') |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke ('so', 'like that', 'because') |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO ('but', emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM ('very', 'much') |
9.4 | Negation | NEG | RP__NEG | nahi, na ('no') |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM ('little', 'much') |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN ('one', 'two', 'three') |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI ('first' (neu), 'second' (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $, & |
11.3 | Punctuation | PUNC | RD__PUNC | () |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi ('work and the like', 'water and the like') |
POS for Konkani
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
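The rule stated above (annotate with the lowest level tag, store the higher levels automatically) can be sketched as a lookup from a label to its full path in the hierarchy. This is only an illustration under stated assumptions: the list CONVENTIONS is copied from the "Annotation Convention" column of the tables in this document, while the names BY_LABEL and expand are hypothetical and not defined by the standard.

```python
# Annotation conventions as given in the tag-set tables ("__" separates levels).
CONVENTIONS = [
    "N", "N__NN", "N__NNP", "N__NNV", "N__NST",
    "PR", "PR__PRP", "PR__PRF", "PR__PRL", "PR__PRC", "PR__PRQ",
    "DM", "DM__DMD", "DM__DMR", "DM__DMQ",
    "V", "V__VM", "V__VM__VF", "V__VM__VNF", "V__VM__VINF", "V__VM__VNG",
    "V__VAUX",
    "JJ", "RB", "PSP",
    "CC", "CC__CCD", "CC__CCS", "CC__CCS__UT",
    "RP", "RP__RPD", "RP__CL", "RP__INJ", "RP__INTF", "RP__NEG",
    "QT", "QT__QTF", "QT__QTC", "QT__QTO",
    "RD", "RD__RDF", "RD__SYM", "RD__PUNC", "RD__UNK", "RD__ECH",
]

# Index every convention by its last (lowest-level) label.
BY_LABEL = {convention.split("__")[-1]: convention for convention in CONVENTIONS}


def expand(label):
    """Return all tags to store for a lowest-level label, top level first."""
    parts = BY_LABEL[label].split("__")
    return ["__".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


if __name__ == "__main__":
    print(expand("VF"))   # finite main verb: three levels
    print(expand("NNP"))  # proper noun: two levels
    print(expand("JJ"))   # adjective: top level only
```

With this layout an annotator only ever picks one label per token, and the stored record can still be queried at any level of the hierarchy.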
POS for Urdu
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-words; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word (بديسی لفظ, bidesii lafz) RDF RD__RDF A word written in script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages fully interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
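As a rough illustration of how some of these attributes might appear on tagged text, the sketch below builds one small XML fragment with Python's standard xml.etree.ElementTree module. The element names sentence and w, the pos attribute and the function make_sentence are hypothetical and not part of this standard; the language value reuses the ISO 639-3 tag from the annexure, and whether xml:lang should carry that tag or a shorter equivalent is left open here. Only xml:lang, xml:id, its:dir and its:translate come from the XML and ITS specifications.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_sentence(sent_id, lang, direction, tokens):
    """Build a <sentence> element (hypothetical name) with ITS mark-up."""
    sentence = ET.Element("sentence", {
        f"{{{XML_NS}}}lang": lang,        # natural language of the content
        f"{{{XML_NS}}}id": sent_id,       # unique identifier
        f"{{{ITS_NS}}}dir": direction,    # text direction: "ltr" or "rtl"
    })
    for word, tag in tokens:
        # POS labels are data, not prose, so mark each token as non-translatable.
        w = ET.SubElement(sentence, "w",
                          {"pos": tag, f"{{{ITS_NS}}}translate": "no"})
        w.text = word
    return sentence


if __name__ == "__main__":
    element = make_sentence("s1", "urd", "rtl", [("کتاب", "N__NN")])
    print(ET.tostring(element, encoding="unicode"))
```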
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is metadata built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag conveys only its name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
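A metadata record of this shape can be produced and read back with a few lines of code. The sketch below uses Python's standard xml.etree.ElementTree module and covers only a subset of the fields shown above; the function names build_metadata and read_metadata are hypothetical, and the flat dictionary they return is an assumption made for illustration, not something the standard prescribes.

```python
import xml.etree.ElementTree as ET


def build_metadata(language, version, status, origin, creation_date):
    """Build a <languageSection> element holding the basic metadata fields."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "version").text = version
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date
    return section


def read_metadata(section):
    """Collect the text of every leaf element into a flat dictionary."""
    return {
        element.tag: (element.text or "").strip()
        for element in section.iter()
        if len(element) == 0  # leaf elements only
    }


if __name__ == "__main__":
    section = build_metadata("en", "1.0.0", "standard",
                             "ISO 12620:1999", "1999-01-01")
    xml_text = ET.tostring(section, encoding="unicode")
    print(xml_text)
    print(read_metadata(ET.fromstring(xml_text)))
```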
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
1 Start (Raw Corpora)
2 Declare Metadata
3 Declare POS Schema
4 Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
5 Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
6 Display (Metadata)
7 Call (POS Schema)
8 Display (Desired Nodes)
9 Hide (remaining nodes)
10 End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching 'off' particular branches of the tree structure. This is represented schematically under the next heading, i.e. the POS schema block diagram.
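The idea of switching branches off for a given language can be sketched as a recursive filter over the tag tree. The fragment below is a minimal illustration, not the standard's algorithm: the data layout (tag, name, langs, children) and the function names select_nodes and visible_tags are hypothetical. The sample data follows the Noun block of the draft schema, where the Verbal subtype carries labels for Hindi, Bodo, Kashmiri, Assamese, Konkani and Gujarati but none for Malayalam.

```python
# Language tags used in the draft schema's Noun block.
ALL = frozenset({"hin", "brx", "mal", "kas", "asm", "kok", "guj"})

# One branch of the common tree; "langs" lists languages that have a label.
NOUN_BLOCK = {
    "tag": "N", "name": "noun", "langs": ALL,
    "children": [
        {"tag": "NN", "name": "common", "langs": ALL, "children": []},
        {"tag": "NNP", "name": "proper", "langs": ALL, "children": []},
        {"tag": "NNV", "name": "verbal", "langs": ALL - {"mal"}, "children": []},
        {"tag": "NST", "name": "nloc", "langs": ALL, "children": []},
    ],
}


def select_nodes(node, language):
    """Return a copy of the tree keeping only nodes enabled for the language."""
    if language not in node["langs"]:
        return None  # this branch is switched off (hidden)
    kept = []
    for child in node["children"]:
        selected = select_nodes(child, language)
        if selected is not None:
            kept.append(selected)
    return {"tag": node["tag"], "name": node["name"], "children": kept}


def visible_tags(node):
    """List the tags of a selected tree in depth-first order."""
    if node is None:
        return []
    tags = [node["tag"]]
    for child in node["children"]:
        tags.extend(visible_tags(child))
    return tags


if __name__ == "__main__":
    print(visible_tags(select_nodes(NOUN_BLOCK, "mal")))
    print(visible_tags(select_nodes(NOUN_BLOCK, "hin")))
```

Returning a filtered copy rather than mutating the common tree keeps the shared schema intact, so the same tree can be reused for every language.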
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...............</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name=type subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name=type subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
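The selection steps in Section 14 can be sketched in Python as follows. This is only an illustrative sketch, not part of the standard: a schema node is assumed to be a plain dictionary of its attributes, the names `SCRIPT_LANGUAGES`, `visible_attributes` and `select_node` are hypothetical, and the label values are placeholders rather than real labels. The script-to-language pairs are the ones listed in the algorithm; the "Display (Metadata)" step is not modelled.

```python
# Script -> ISO 639-3 codes of the languages handled under that script,
# taken from the branches of the algorithm in Section 14.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}


def visible_attributes(script, language):
    """Return the attribute names to display for one script/language pair."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"no branch for script={script!r}, language={language!r}")
    # English labels (cat / subcat), the Hindi label and the tag are always shown.
    visible = ["cat", "subcat", "hin-cat", "tag"]
    if language != "hin":
        # Add the label attribute of the selected language, e.g. "brx-cat".
        visible.append(f"{language}-cat")
    return visible


def select_node(node, script, language):
    """Keep the displayed attributes of a node; everything else is hidden."""
    visible = visible_attributes(script, language)
    return {name: value for name, value in node.items() if name in visible}


# Placeholder node: the label strings are stand-ins, not real labels.
example = {
    "cat": "Pronoun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "PR",
}
print(select_node(example, "Devanagari", "brx"))
```

Keeping the script-to-language pairs in one table means a new language adds one entry instead of another nested branch.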
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No  Language Name  Language Tag (ISO 639-3)
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
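The language tags can be held as a simple lookup, sketched below in Python. The dictionary pairs each language of the annexure with its ISO 639-3 code. The helper `label_attribute` is a hypothetical name; it builds the `<code>-cat` attribute name on the pattern of `hin-cat` and `brx-cat` in the draft schema, on the assumption that every language follows that pattern.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute that carries this language's label."""
    # Assumption: all languages follow the "<code>-cat" naming seen in the schema.
    return f"{ISO_639_3[language_name]}-cat"


print(label_attribute("Bodo"))
```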
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
जो(jo-who)
ो(to-he)
21 Personal PRP PR__PRP ो(to-he)
मी(mee-I)
(tu-you)
(te-they)
मह(tumhi-
you)
22 Reflexive PRF PR__PRF सवत(swatha-
myself)
आपण(aapana-
ourselves)
23 Relative PRL PR__PRL जो(jo-who)
जयान(jyaane-
who)
जवहा(jevhaa-
while)
िजथ(jeethe-
where)
24 Reciprocal PRC PR__PRC परसपर(Parasp
ara-
reciprocally )
एतमत(ekmek
- mutually)
25 Wh-word PRQ PR__PRQ तोण(kona-
who)
तवहा(kevha-
when)
तठ(kuthe-
where)
26 Indefinite तोणी(kona
3 Demonstrative DM DM ो(to-he)
हा(haa-this)
जो(jo-who)
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41
1
Finite VF V__VM__VF - This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level
41
2
Non-finite VNF V__VM__VNF - --do--
41
3
Infinitive VINF V__VM__VINF - --do--
41 Gerund VNG V__VM__VNG --do--
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
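The Annotation Convention column in the table above joins the tags on the path from the top-level category down to the subtype with a double underscore, for example V__VM__VF or CC__CCS__UT. A minimal Python sketch of deriving that full label from the lowest-level tag alone is given below. The parent links are read off the Annotation Convention columns of the POS tables in this document; the names `PARENT` and `annotation_label` are hypothetical.

```python
# Child tag -> parent tag, read off the Annotation Convention columns
# (e.g. V__VM__VF gives VF -> VM and VM -> V).
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation_label(tag):
    """Expand a lowest-level tag into its full hierarchical label."""
    path = [tag]
    # Walk upwards until a top-level tag (one without a parent) is reached.
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return "__".join(reversed(path))


print(annotation_label("VF"))
print(annotation_label("UT"))
print(annotation_label("JJ"))
```

With such a table an annotator stores only the lowest-level tag and the higher levels follow from it.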
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA 'pen', 'spectacles' |
1.2 | Proper | NNP | N__NNP | mohan, ravI 'Mohan', 'Ravi' |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM 'up', 'down', 'in front' |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te 'me', 'you', 'he/she' |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself' |
2.3 | Relative | PRL | PR__PRL | je te jyAM 'who', 'where' |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other' |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where' |
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A 'this' |
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom' |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why' |
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate' |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did' |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or' |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because' |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph, topic |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much' |
9.4 | Negation | NEG | RP__NEG | nahi, na 'no' |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much' |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one', 'two', 'three' |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu), 'second' (fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | () |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like' |
POS for Konkani
Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No Category (Top level / Subtype level 1 / Subtype level 2) Label Annotation Convention Examples Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuation
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
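The rule that higher-level tags follow automatically from the lowest-level tag can be sketched in a few lines of Python. This is only an illustration: the `HIERARCHY` table and the `annotate` function are hypothetical names, and the table holds just a handful of rows copied from the Label and Annotation Convention columns of the tables above, where levels are joined by a double underscore.

```python
# Hypothetical lookup table: lowest-level label -> annotation convention.
# The entries are a small sample taken from the POS tables in this document.
HIERARCHY = {
    "NN": "N__NN",
    "PRP": "PR__PRP",
    "VF": "V__VM__VF",
    "UT": "CC__CCS__UT",
    "JJ": "JJ",  # a top-level category without subtypes
}


def annotate(word, label):
    """Store the chosen lowest-level tag together with all higher-level tags."""
    try:
        convention = HIERARCHY[label]
    except KeyError:
        raise ValueError(f"unknown POS label: {label!r}") from None
    # Splitting on the double underscore yields the path from top level down.
    levels = convention.split("__")
    return {"word": word, "tag": label, "levels": levels, "annotation": convention}


record = annotate("kitaab", "NN")
print(record["levels"])  # top-level tag first, selected tag last
```

The annotator only ever supplies the lowest-level label; everything above it is derived from the table, so the stored hierarchy cannot drift out of step with the tag set.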
POS for Urdu
Sl No Category (Top level / Subtype level 1 / Subtype level 2) Label Annotation Convention Examples Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype such as indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
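As a rough illustration of three of the attributes listed above (xml:lang, its:dir and xml:id), the following Python sketch builds a tiny tagged fragment with the standard library. The element names `corpus` and `token`, the `tag` attribute and the helper `make_token` are assumptions made for this example only and are not part of the draft schema; the language code `ur` is likewise an illustrative choice.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined "xml" prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_token(word, tag, lang, direction, ident):
    """Build one hypothetical <token> element carrying ITS-style attributes."""
    token = ET.Element("token", {
        f"{{{XML_NS}}}lang": lang,      # natural language of the content
        f"{{{XML_NS}}}id": ident,       # unique identifier
        f"{{{ITS_NS}}}dir": direction,  # text direction: "ltr" or "rtl"
        "tag": tag,                     # POS annotation (assumed attribute name)
    })
    token.text = word
    return token


# Language and direction are declared on the root and overridden on the
# element where the language changes.
corpus = ET.Element("corpus", {f"{{{XML_NS}}}lang": "en", f"{{{ITS_NS}}}dir": "ltr"})
corpus.append(make_token("کتاب", "N__NN", "ur", "rtl", "t1"))

xml_text = ET.tostring(corpus, encoding="unicode")
print(xml_text)
```

Declaring the defaults on the root and overriding them only on the Urdu token mirrors the guidance above that xml:lang and its:dir are set for the root element and for any element where the language or direction changes.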
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used for an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only loosely: a tag conveys just its name, so it has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation> <creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...), n=12
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...), n=22
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
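The "display desired nodes, hide remaining nodes" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the draft's implementation: the `<pos>` and `<category>` element names and the function `select_language` are hypothetical, and the sample labels are standard spellings of the Hindi and Konkani terms for noun and verb supplied for illustration. Only the `cat`, `hin-cat`, `kok-cat` and `tag` attribute names follow the pattern used in the draft schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical multilingual fragment: one label attribute per language.
SCHEMA_XML = """
<pos>
  <category tag="N" cat="Noun" hin-cat="संज्ञा" kok-cat="नाम"/>
  <category tag="V" cat="Verb" hin-cat="क्रिया" kok-cat="क्रियापद"/>
</pos>
"""


def select_language(xml_text, lang):
    """Keep the English label and the label for `lang`; hide all other languages."""
    root = ET.fromstring(xml_text)
    keep = f"{lang}-cat"
    for node in root.iter("category"):
        if keep not in node.attrib:
            raise KeyError(f"no {lang!r} label on tag {node.get('tag')!r}")
        # Switch the other language branches "off" by dropping their labels.
        for name in list(node.attrib):
            if name.endswith("-cat") and name != keep:
                del node.attrib[name]
    return root


konkani_view = select_language(SCHEMA_XML, "kok")
for node in konkani_view.iter("category"):
    print(node.get("tag"), node.get("cat"), node.get("kok-cat"))
```

Raising an error when a label is missing makes gaps in the mapping visible at selection time rather than silently producing an incomplete language-specific schema.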
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………
    End if
End if
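The selection logic of this section can be sketched in Python. This is a minimal illustration and not part of the standard: the names PosNode, SCRIPT_LANGUAGES and select_labels are hypothetical, the script-to-language pairs are only those named in the pseudocode, and the label strings are illustrative stand-ins.

```python
from dataclasses import dataclass

# Script -> language codes, limited to the pairs named in the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


@dataclass
class PosNode:
    """One schema node: a tag plus its label per language (hypothetical shape)."""
    tag: str
    labels: dict  # language code -> label; "eng" holds the English label


def select_labels(node, script, language):
    """Return the labels to display; every other label stays hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = ["eng", "hin"]      # English and Hindi nodes are always displayed
    if language not in visible:
        visible.append(language)  # plus the selected language, if different
    return {code: node.labels[code] for code in visible if code in node.labels}


# Illustrative labels only.
noun = PosNode("N", {"eng": "noun", "hin": "संज्ञा", "kok": "नाम", "mal": "നാമം"})
print(select_labels(noun, "Devanagari", "kok"))
```

Returning only the visible labels, rather than mutating the node, leaves the common schema untouched so that the same node can be reused for every language.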
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No | Language Name | Language Tag according to ISO 639-3
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
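A tool that implements this annexure could hold the language tags as a simple lookup table. The sketch below is only an illustration: the names LANGUAGE_CODES, code_for and name_for are hypothetical, and each language is paired with the ISO 639-3 code that belongs to it.

```python
# Language name -> ISO 639-3 tag for the 22 languages listed in this annexure.
LANGUAGE_CODES = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Reverse table, built once, for code -> name lookups.
CODE_TO_LANGUAGE = {code: name for name, code in LANGUAGE_CODES.items()}


def code_for(name):
    """Return the language tag for a language name (case-insensitive)."""
    return LANGUAGE_CODES[name.strip().title()]


def name_for(code):
    """Return the language name for a language tag."""
    return CODE_TO_LANGUAGE[code.strip().lower()]


print(code_for("hindi"), name_for("brx"))
```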
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
31 Deictic DMD DM__DMD इथ(ithe-here)
थ(tithe-
there)
32 Relative DMR DM__DMR जो(jo-who)
जयान(jyane-
who)
33 Wh-word DMQ DM__DMQ तोणा(konta-
which)
तोणी(kona-
who)
4 Verb V V (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
41 Main VM V__VM पडला (padalaa-fell
down)
गला(gelaa-
went)
झोपला(jhopala
a-slept)
आह(aahe-is)
411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Non-finite VNF V__VM__VNF - --do--
413 Infinitive VINF V__VM__VINF - --do--
414 Gerund VNG V__VM__VNG --do--
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuation
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No | Category | Label | Annotation Convention | Examples | Remarks
(Category levels: whole numbers are top level; 1.1, 1.2 etc. are Subtype level 1; Subtype level 2 is not used in this table)
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA 'pen', 'spectacles' |
1.2 | Proper | NNP | N__NNP | mohan, ravI 'Mohan', 'Ravi' |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM 'up', 'down', 'in front' |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te 'me', 'you', 'he/she' |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself' |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM 'who', 'where' |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other' |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where' |
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A 'this' |
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom' |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why' |
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate' |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did' |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or' |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because' |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph, topic |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much' |
9.4 | Negation | NEG | RP__NEG | nahi, na 'no' |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much' |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one', 'two', 'three' |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu), 'second' (fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like' |
POS for Konkani
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuation
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
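The rule above, where the annotator picks only the lowest-level tag and the higher levels are stored automatically, can be sketched as a walk up a parent table. This is an illustration under stated assumptions: PARENT and full_label are hypothetical names, the parent links are read off the Annotation Convention column of the tables in this document, and the finite and non-finite subtypes are placed under VM only (the Konkani table also lists them under VAUX, which this sketch does not model).

```python
# Child tag -> parent tag, read off the annotation-convention column.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_label(tag):
    """Expand a lowest-level tag into its full hierarchy label."""
    chain = [tag]
    while chain[-1] in PARENT:           # climb until a top-level tag is reached
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))    # top level first, joined by "__"


for tag in ("VF", "UT", "NNP", "JJ"):
    print(tag, "->", full_label(tag))
```

Tags without a parent entry, such as JJ, RB and PSP, are returned unchanged, which matches the tables, where these categories have no subtypes.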
POS for Urdu
Sl No | Category (Top level; Subtype level 1; Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype such as indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
xml:lang: mark-up for natural language labelling, defined for the root element of the document and for any element where a change of language may occur.
its:dir: mark-up to specify text direction, defined for the root element of the document and for any element that has text content.
its:translateRule: indicates which elements and attributes should be translated; used to mark elements that have non-translatable content.
its:withinTextRule: provides information related to text segmentation; indicates which elements should be treated either as part of their parents or as a nested but independent run of text.
xml:id: mark-up for unique identifiers; elements with translatable content can be associated with a unique identifier.
its:locNote: mark-up for notes to localizers; allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp]
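As an illustration of how such mark-up looks on an element, the following Python sketch builds a small XML fragment carrying xml:lang, xml:id and the local ITS attributes its:dir, its:translate and its:locNote. The element names entry, label and tag are hypothetical and not part of the POS Schema, and the sketch uses the local attribute forms of ITS 1.0 rather than the global rule elements.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined "xml" prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def build_entry():
    """Build one illustrative entry with language, direction and ITS mark-up."""
    root = ET.Element("entry", {
        f"{{{XML_NS}}}lang": "en",      # language of the document
        f"{{{XML_NS}}}id": "pos-N",     # unique identifier
        f"{{{ITS_NS}}}version": "1.0",  # ITS version declared on the root
        f"{{{ITS_NS}}}dir": "ltr",      # default text direction
    })
    label = ET.SubElement(root, "label", {
        f"{{{XML_NS}}}lang": "ur",      # language changes here
        f"{{{ITS_NS}}}dir": "rtl",      # Urdu runs right to left
    })
    label.text = "اسم"
    tag = ET.SubElement(root, "tag", {
        f"{{{ITS_NS}}}translate": "no",  # tag codes must not be translated
        f"{{{ITS_NS}}}locNote": "POS tag code; keep unchanged",
    })
    tag.text = "N"
    return root


xml_text = ET.tostring(build_entry(), encoding="unicode")
print(xml_text)
```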
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used for an XML document that conforms to a particular schema. A schema provides a means of defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and indicate what the data means, but only partly: a tag carries just its name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="10">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
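One way to picture switching a branch 'off' is as pruning a tree. The sketch below is a minimal illustration under stated assumptions: the Node class, the prune function and the per-node language sets are hypothetical. The tags (N, NN, NNP, NNV, NST) and ISO 639-3 codes come from the draft schema, where the Verbal noun entry carries no mal-cat label.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    label: str
    languages: set  # ISO 639-3 codes for which this node is 'on'
    children: list = field(default_factory=list)


def prune(node, lang):
    """Return a copy of the tree keeping only branches that are 'on' for lang."""
    if lang not in node.languages:
        return None
    kept = []
    for child in node.children:
        pruned = prune(child, lang)
        if pruned is not None:
            kept.append(pruned)
    return Node(node.tag, node.label, node.languages, kept)


ALL = {"hin", "brx", "mal", "kas", "asm", "kok", "guj"}
noun = Node("N", "Noun", ALL, [
    Node("NN", "Common", ALL),
    Node("NNP", "Proper", ALL),
    Node("NNV", "Verbal", ALL - {"mal"}),  # no Malayalam label in the draft
    Node("NST", "Nloc", ALL),
])

mal_tags = [child.tag for child in prune(noun, "mal").children]
hin_tags = [child.tag for child in prune(noun, "hin").children]
print(mal_tags)
print(hin_tags)
```

Returning a pruned copy rather than mutating the tree leaves the common schema intact, so the same tree can be reused for every language.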
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
...
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
...
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo شخصيٲتی rdquo
tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo ماکوسی rdquo
tag=rdquoPRFgt
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
...
End if
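The selection algorithm above can be sketched in a few lines of Python. This is an illustrative reading of the pseudocode, not a reference implementation: the names SCRIPT_LANGUAGES and select_attributes, and the use of a plain dictionary to stand in for a schema node, are assumptions. The script and language pairs and the ISO 639-3 codes are those used in the algorithm and the draft schema; the Bodo label is a placeholder string.

```python
# Script -> {language name: ISO 639-3 code}, limited to the cases
# that the algorithm above spells out.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_attributes(node, script, language):
    """Return the attributes to display; all other attributes are hidden."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    code = languages[language]
    # English and Hindi labels are always shown, plus the selected language.
    visible = {"cat", "subcat", "tag", "hin-cat", f"{code}-cat"}
    return {key: value for key, value in node.items() if key in visible}


# A schema node represented as a dictionary (hypothetical representation).
node = {
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "brx-cat": "(Bodo label)",
    "mal-cat": "നാമം",
    "tag": "N",
}

malayalam_view = select_attributes(node, "Malayalam", "Malayalam")
hindi_view = select_attributes(node, "Devanagari", "Hindi")
print(list(malayalam_view))
print(list(hindi_view))
```

Keeping the script and language table as data means that the remaining scripts and languages could be added without touching the selection logic.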
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
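The tagged samples in this section pair each word with a hierarchical tag such as N_NNP or RD_PUNC, in which the levels are joined by underscores. As a rough sketch, the snippet below splits such tags into their levels. The backslash separator between word and tag, the function names and the short sample sentence are assumptions made for illustration; they are not taken from the standard or from the reference corpus.

```python
def parse_token(token, sep="\\"):
    """Split 'word<sep>TAG' into (word, [tag levels]).

    The separator is an assumption; adjust it to the corpus format in use.
    """
    word, _, tag = token.rpartition(sep)
    if not word or not tag:
        raise ValueError(f"token {token!r} is not of the form word{sep}TAG")
    return word, tag.split("_")


def parse_sentence(line, sep="\\"):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(token, sep) for token in line.split()]


# Made-up sample using tags that occur in the Hindi examples.
sample = "राम\\N_NNP घर\\N_NN गया\\V_VM ।\\RD_PUNC"
parsed = parse_sentence(sample)
for word, levels in parsed:
    print(word, levels)
```

Splitting the tag into levels makes it straightforward to work at a coarser granularity, for example by comparing only the first level (N, V, RD) across languages.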
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
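The tagged sentences above attach each tag to its word, sometimes with a space and sometimes without, and use a single underscore (or a hyphen) where the tag tables use a double underscore. A minimal sketch of how such a line could be split into (word, tag) pairs follows. The function name `split_tagged`, the regular expression, and the choice to normalise separators to `__` are assumptions made for illustration; they are not part of the standard. The sample line uses romanised placeholder words.

```python
import re

# A word is a run of characters that are not whitespace, ASCII capitals or "_".
# A tag is a run of capitalised labels joined by "_" or "-" (N_NN, V_VM_VF, N-NNP).
TOKEN_RE = re.compile(r"([^\sA-Z_]+)\s*([A-Z]+(?:[_-]+[A-Z]+)*)")

def split_tagged(line):
    """Return (word, tag) pairs, normalising the tag separator to "__"."""
    pairs = []
    for word, tag in TOKEN_RE.findall(line):
        pairs.append((word, re.sub(r"[_-]+", "__", tag)))
    return pairs

# Hypothetical romanised line in the same layout as the examples above.
print(split_tagged("ghar N_NN meinPSP hai V_VAUX"))
```

This sketch skips a tag that has no word in front of it (for example a bare RD_PUNC whose punctuation mark was lost), and it does not handle lines where the tag precedes the word, as in the Urdu examples.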
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
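The language table can be held as a simple lookup so that a tool selects a language by its ISO 639-3 tag. The sketch below pairs each language name with its registered ISO 639-3 code; the dictionary name `ISO_639_3` and the helper `language_tag` are illustrative assumptions, not names defined by the standard.

```python
# Language name -> ISO 639-3 code for the 22 languages in the annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

def language_tag(name):
    """Look up the ISO 639-3 tag for a language name (case-insensitive)."""
    return ISO_639_3[name.strip().title()]

print(language_tag("hindi"))
print(language_tag("Konkani"))
```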
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
4
42 Auxiliary VAUX V__VAUX आह (is) लागला (started)
5 Adjective JJ सदर(sundara-
beautiful)
चागला(chaang
alaa-good)
मोठा(moThaa-
big)
6 Adverb RB लवतर(lavakar
- fast )
हळहळ(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आण(aaNi-
and)
तारण(kaaraN-
because)
81 Co-ordinator CCD CC__CCD आण(aaNi-
and)
पण(paNa-
but) पर (parantu-but)
82 Subordinator CCS CC__CCS तारण त (kaaraN-
because of)
ता त(kaaraN
kii-because
of) जर-र(jara-tara-
if-then)
82
1
Quotative UT CC__CCS__UT असा महणन
9 Particles RP RP र(tara)
91 Default RPD RP__RPD र(tara) (then)
92 Classifier CL RP__CL Not required
93 Interjection INJ RP__INJ अरर(arere)
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Sl No  Category       Label  Annotation Convention  Examples
1      Noun           N      N
1.1    Common         NN     N__NN                  kalam, chashmA (‘pen’, ‘spectacles’)
1.2    Proper         NNP    N__NNP                 mohan, ravI (‘Mohan’, ‘Ravi’)
1.3    Nloc           NST    N__NST                 upar, nIche, ahIM (‘up’, ‘down’, ‘in front’)
2      Pronoun        PR     PR
2.1    Personal       PRP    PR__PRP                huM, tuM, te (‘me’, ‘you’, ‘he/she’)
2.2    Reflexive      PRF    PR__PRF                pote, jAte, svayam (‘herself/himself’)
2.3    Relative       PRL    PR__PRL                je, te, jyAM (‘who’, ‘where’)
2.4    Reciprocal     PRC    PR__PRC                aras-paras, paraspar (‘mutually’, ‘each other’)
2.5    Wh-word        PRQ    PR__PRQ                koN, kyAre, kyAM (‘who’, ‘when’, ‘where’)
2.6    Indefinite                                   koI, kaIMK, kashuM (‘someone’, ‘something’)
3      Demonstrative  DM     DM
3.1    Deictic        DMD    DM__DMD                A (‘this’)
3.2    Relative       DMR    DM__DMR                je, jeNe (‘which/who’, ‘whom’)
3.3    Wh-word        DMQ    DM__DMQ                koN, shuM, kem (‘who’, ‘what’, ‘why’)
3.4    Indefinite                                   koI, kaIMK, kashuM (‘someone’, ‘something’)
4      Verb           V      V
4.1    Main           VM     V__VM                  khAshe, khAdhu (‘will eat’, ‘ate’)
4.2    Auxiliary      VAUX   V__VAUX                chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’)
5      Adjective      JJ
6      Adverb         RB
7      Postposition   PSP
8      Conjunction    CC     CC
8.1    Co-ordinator   CCD    CC__CCD                ane, ke (‘and’, ‘or’)
8.2    Subordinator   CCS    CC__CCS                tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’)
9      Particles      RP     RP
9.1    Default        RPD    RP__RPD                paNa, ja, tO (‘but’, emph, topic)
9.2    Interjection   INJ    RP__INJ                hE, arrrE, O
9.3    Intensifier    INTF   RP__INTF               bahu, ghaNuM (‘very’, ‘much’)
9.4    Negation       NEG    RP__NEG                nahi, na (‘no’)
10     Quantifiers    QT     QT
10.1   General        QTF    QT__QTF                thoduM, ghaNuM (‘little’, ‘much’)
10.2   Cardinals      QTC    QT__QTC                eka, be, traN (‘one’, ‘two’, ‘three’)
10.3   Ordinals       QTO    QT__QTO                paheluM, bIjI (‘first’ (neu), ‘second’ (fem))
11     Residuals      RD     RD
11.1   Foreign word   RDF    RD__RDF                tv, perasitemol
11.2   Symbol         SYM    RD__SYM                $ &
11.3   Punctuation    PUNC   RD__PUNC               ( )
11.4   Unknown        UNK    RD__UNK
11.5   Echowords      ECH    RD__ECH                kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’)
POS for Konkani

Sl No   Category (Top level / Subtype (level 1) / Subtype (level 2))   Label   Annotation Convention   Examples   Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili

Sl No   Category (Top level / Subtype (level 1) / Subtype (level 2))   Label   Annotation Convention   Examples   Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Urdu

Sl No   Category (Top level / Subtype (level 1) / Subtype (level 2))   Label   Annotation Convention   Examples   Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype such as indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
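The rule above, annotate with the lowest level tag and store the higher levels automatically, follows directly from the annotation convention, because each convention string already spells out its ancestors. A minimal sketch, assuming the double-underscore separator used in the tag tables; the function name `expand_tag` is hypothetical.

```python
def expand_tag(lowest_tag, sep="__"):
    """Return every tag on the path from the top level down to lowest_tag."""
    parts = lowest_tag.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]

print(expand_tag("V__VM__VF"))   # ['V', 'V__VM', 'V__VM__VF']
print(expand_tag("PSP"))         # ['PSP']
```

An annotator therefore only needs to record `V__VM__VF`; the levels `V` and `V__VM` can be derived whenever they are needed.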
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang): defined for the root element of your document and for any element where a change of language may occur.
Defining mark-up to specify text direction (its:dir): defined for the root element of your document and for any element that has text content.
Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.
Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.
Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp]
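As an illustration of the mark-up listed above, the sketch below builds one tagged word carrying xml:lang, its:dir and the local its:translate attribute. The element name `w`, the plain `tag` attribute, the helper `make_tagged_word` and the sample word are assumptions made for this example; only the `xml` and ITS namespace URIs and the attribute names come from the W3C specifications. The ISO 639-3 code is used as the language value to match the annexure.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)

def make_tagged_word(word, tag, lang, direction="ltr"):
    """Build a hypothetical <w> element for one POS-tagged word."""
    el = ET.Element("w", {
        f"{{{XML_NS}}}lang": lang,         # natural language of the content
        f"{{{ITS_NS}}}dir": direction,     # text direction: ltr or rtl
        f"{{{ITS_NS}}}translate": "no",    # corpus text is not to be translated
        "tag": tag,                        # POS annotation convention
    })
    el.text = word
    return el

word = make_tagged_word("ghar", "N__NN", lang="kok")
print(ET.tostring(word, encoding="unicode"))
```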
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. Schemas provide a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means: a tag conveys only its name, so it has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
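A metadata record of this shape can be read back with a standard XML parser. The sketch below parses a simplified copy of the language section and collects a few fields into a dictionary. The sample omits the root element and namespace for brevity, and the function name `read_metadata` and the dictionary keys are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical copy of the language section shown above.
SAMPLE = """<languageSection>
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <creation><creationDate>1999-01-01</creationDate></creation>
  <descriptionSection>
    <definitionClass>
      <definition xml:lang="en"></definition>
      <source>ISO 12620:1999</source>
    </definitionClass>
  </descriptionSection>
</languageSection>"""

def read_metadata(xml_text):
    """Collect selected metadata fields from a languageSection element."""
    root = ET.fromstring(xml_text)
    return {
        "language": root.findtext("language"),
        "status": root.findtext("registrationStatus"),
        "created": root.findtext("creation/creationDate"),
        "source": root.findtext("descriptionSection/definitionClass/source"),
    }

print(read_metadata(SAMPLE))
```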
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching the other branches of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
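The "display desired nodes, hide remaining nodes" step can be sketched as a filter over the per-language label attributes. The example below assumes a simplified schema in which every category carries one `<lang>-cat` attribute per language, as in the draft schema, and uses placeholder label strings rather than real native-script labels. The element names, the function `select_language` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the common schema.
SCHEMA = """<posSchema>
  <cat tag="N" cat="noun" hin-cat="hin-noun-label" kok-cat="kok-noun-label">
    <type tag="NN" subcat="common" hin-cat="hin-common-label" kok-cat="kok-common-label"/>
  </cat>
  <cat tag="V" cat="verb" hin-cat="hin-verb-label" kok-cat="kok-verb-label"/>
</posSchema>"""

def select_language(schema_xml, lang):
    """Keep the labels of one language and drop those of all others."""
    root = ET.fromstring(schema_xml)
    keep = f"{lang}-cat"
    for node in root.iter():
        for name in list(node.attrib):
            if name.endswith("-cat") and name != keep:
                del node.attrib[name]   # "hide" the other languages' labels
    return root

konkani = select_language(SCHEMA, "kok")
for node in konkani.iter("cat"):
    print(node.get("tag"), node.get("cat"), node.get("kok-cat"))
```

The English `cat` and `subcat` attributes and the `tag` attribute are left in place, so the one-to-one link between the English label and the selected language's label is preserved.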
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
...
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
...
End if
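The selection steps above can be sketched in Python. This is only an illustrative sketch, not part of the draft schema: it assumes each schema node has already been loaded into a dictionary whose language-specific labels use the `xxx-cat` attribute names seen in the examples, and the function name `select_nodes`, the `SCRIPT_LANGUAGES` table and the placeholder label strings are hypothetical.

```python
# Script -> ISO 639-3 language codes, transcribed from the branches of the
# algorithm above (hypothetical table name).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(node, script, language):
    """Return the attributes to display: English, Hindi and the chosen language.

    `node` is assumed to be a dict of attribute name -> value for one
    schema element; every other `xxx-cat` attribute is hidden.
    """
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")

    visible = {"hin", language}  # Hindi labels are always displayed
    selected = {}
    for name, value in node.items():
        if name.endswith("-cat"):
            # Language-specific label: keep it only for a visible language.
            if name[: -len("-cat")] in visible:
                selected[name] = value
        else:
            # English category name and tag are always displayed.
            selected[name] = value
    return selected


if __name__ == "__main__":
    # Placeholder labels stand in for the real Hindi/Bodo/Konkani strings.
    noun = {
        "cat": "noun",
        "hin-cat": "<Hindi label>",
        "brx-cat": "<Bodo label>",
        "kok-cat": "<Konkani label>",
        "tag": "N",
    }
    print(select_nodes(noun, "Devanagari", "brx"))
```

Filtering on the attribute-name prefix keeps the sketch independent of how many languages the schema eventually carries; only the script table needs extending.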
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
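The tagged sentences above pair each word with a label path such as N_NN or V_VM_VNF (the Oriya sample uses the double-underscore form N__NN). A minimal Python sketch for reading such annotations follows. It assumes a backslash separates word and tag, which the flattened text here no longer shows, and the function names and the transliterated sample tokens are hypothetical.

```python
import re


def parse_token(token, sep="\\"):
    """Split one annotated token into (word, [tag levels]).

    The separator is an assumption; pass a different `sep` if the
    corpus uses another character between word and tag.
    """
    word, _, tag = token.rpartition(sep)
    if not word or not tag:
        raise ValueError(f"token {token!r} has no word{sep}tag structure")
    # Accept both N_NN and N__NN spellings of the hierarchy.
    levels = [part for part in re.split(r"_+", tag) if part]
    return word, levels


def parse_sentence(line, sep="\\"):
    """Parse a whitespace-separated annotated sentence."""
    return [parse_token(token, sep) for token in line.split()]


if __name__ == "__main__":
    # Transliterated, made-up sample; not taken from the corpus above.
    sample = "darshan\\N_NN hai\\V__VAUX .\\RD_PUNC"
    for word, levels in parse_sentence(sample):
        print(word, levels)
```

Keeping the levels as a list makes it straightforward to check the top-level category (the first element) separately from the lowest-level subtype (the last element).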
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
ओहो(oho-
oh)
94 Intensifier INTF RP__INTF खप(khoop-
lot very )
बराच(baraach-
too much)
अशय(atisha
ya- too much
very)
95 Negation NEG RP__NEG नतो(nako-
not) न(na-
Na)
10 Quantifiers QT QT थोड(thode-
few)
जास(jaasta-
lot)
ताह(kaahi-
few) एत(eka-
one)
पहला(pahilaa-
first)
101 General QTF QT__QTF थोड thoDe-
few)
जास(jaasta-
lot)
ताह(kaahi-
few)
102 Cardinals QTC QT__QTC एत(eka-one)
दोन(dona-two)
103 Ordinals QTO QT__QTO पहला(pahilaa-
first)
दसरा(dusaraa-
second)
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ amp ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जवणबवण(jev
anbivaNa-
mealdinner)
डोतबत(Doke
bike- head)
(Paanii-)
vaanii
(khaanaa-)
vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Gujarati
Sl No  Category (Top level / Subtype (level 1) / Subtype (level 2))  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN kalamchashmA
lsquopenrsquo lsquospectaclesrsquo
12 Proper NNP N__NNP mohanravI
lsquoMohanrsquo lsquoRavirsquo
13 Nloc NST N__NST upar nIche ahIM
lsquouprsquo lsquodownrsquo lsquoin frontrsquo
2 Pronoun PR PR
21 Personal PRP PR__PRP huMtuMte
lsquomersquo lsquoyoursquo
lsquoheshersquo 22 Reflexive PRF PR__PRF pote
jAtesvayam
lsquoherselfhimselfrsquo
23 Relative PRL PR__PRL je te jyAM
lsquowhorsquo lsquowherersquo
24 Reciprocal PRC PR__PRC aras-paras paraspar
lsquomutuallyrsquolsquoeach otherrsquo
25 Wh-word PRQ PR__PRQ koN kyAre kyAM
lsquowhorsquo lsquowhenrsquo lsquowherersquo
26 Indefinite koI kaIMK kashuM
lsquosomeonersquo lsquosomethingrsquo
3 Demonstrative DM DM
31 Deictic DMD DM__DMD A
lsquothisrsquo
32 Relative DMR DM__DMR je jeNe
lsquowhichwhorsquo lsquowhomrsquo
33 Wh-word DMQ DM__DMQ koNshuMkem
lsquowhorsquo lsquowhatrsquo lsquowhyrsquo
34 Indefinite koI kaIMK kashuM
lsquosomeonersquo lsquosomethingrsquo
4 Verb V V
41 Main VM V__VM khAshekhAdhu
lsquowill eatrsquo
lsquoatersquo 42 Auxiliary VAUX V__VAUX chhehatuMk
aryuM
lsquoisrsquo rsquowasrsquo lsquodidrsquo
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD aneke
lsquoandrsquo lsquoorrsquo
82 Subordinator CCS CC__CCS tethI evuM kAraNke
lsquosorsquo lsquolike thatrsquo lsquobecausersquo
9 Particles RP RP
91 Default RPD RP__RPD paNajatO
lsquobutrsquo emph topic
92 Interjection INJ RP__INJ hE arrrE O
93 Intensifier INTF RP__INTF bahughaNuM
lsquoveryrsquo lsquomuchrsquo
94 Negation NEG RP__NEG nahina
lsquonorsquo
10 Quantifiers QT QT
101 General QTF QT__QTF thoduMghaNuM
lsquolittlersquo lsquomuchrsquo
102 Cardinals QTC QT__QTC ekabe traN
lsquoonetwothreersquo
103 Ordinals QTO QT__QTO paheluMbIjI
lsquofirstrsquo(neu)
lsquosecondrsquo (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ amp
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAmpANi-bANi
lsquowork and the likersquo water and the likersquo
POS for Konkani
Sl No  Category (Top level / Subtype (level 1) / Subtype (level 2))  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No  Category (Top level / Subtype (level 1) / Subtype (level 2))  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
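The rule above, where the annotator picks only the lowest-level tag and the higher levels are stored automatically, can be sketched as a parent lookup. The sketch below is an assumption-laden illustration rather than a reference implementation: the `PARENT` table is transcribed from the Label and Annotation Convention columns of the tables in this document, the function name is hypothetical, and the finite and non-finite labels are resolved along the main-verb path only (some language tables also list them under VAUX).

```python
# Lowest-level label -> its parent label, transcribed from the
# "Label" / "Annotation Convention" columns (hypothetical table name).
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    # Assumption: finite/non-finite forms are attached to the main verb.
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation_convention(label):
    """Expand a lowest-level label into its full path, e.g. VF -> V__VM__VF."""
    chain = [label]
    while chain[0] in PARENT:
        # Walk upwards until a top-level category is reached.
        chain.insert(0, PARENT[chain[0]])
    return "__".join(chain)


if __name__ == "__main__":
    for label in ("NN", "VF", "UT", "JJ"):
        print(label, "->", annotation_convention(label))
```

Labels without a parent entry, such as JJ, RB and PSP, are returned unchanged, which matches the tables where those categories have no subtype.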
POS for Urdu
Sl No  Category (Top level / Subtype (level 1) / Subtype (level 2))  Label  Annotation Convention  Examples  Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
(محدود - mahdood)
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
(بديسی لفظ - bidesii lafz)
RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
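The rule above can be illustrated with a short sketch. It assumes only what the tables show: an annotation convention joins the levels of the hierarchy with a double underscore (for example V__VM__VF). The function name `expand_annotation` is hypothetical and not part of the standard.

```python
def expand_annotation(convention: str) -> list[str]:
    """Return the annotation for every level implied by a lowest-level tag.

    Assumes levels are joined by a double underscore, as in the
    "Annotation Convention" column (e.g. V__VM__VF).
    """
    parts = convention.split("__")
    # Rebuild the prefix for level 1, level 2, ... down to the given tag.
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    # An annotator picks only the lowest-level tag; the rest is derived.
    print(expand_annotation("V__VM__VF"))  # ['V', 'V__VM', 'V__VM__VF']
    print(expand_annotation("JJ"))         # ['JJ']
```

Storing the derived list alongside each token means a query for all verbs (V) also finds tokens annotated only as finite main verbs.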
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognise, for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
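As a rough illustration of the attributes listed above, the sketch below attaches xml:lang, its:dir and the local its:translate attribute to a single element using Python's standard library. The element name `example` and the helper `make_example` are hypothetical. The language tag "ur" is an assumed choice (xml:lang takes a BCP 47 tag, whereas the schema labels in this standard use ISO 639-3 codes such as hin). The sketch uses the local its:translate attribute, not the global its:translateRule element.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_example(text: str, lang: str, direction: str, translate: bool) -> ET.Element:
    """Build one element carrying language, direction and translate markup."""
    elem = ET.Element("example")  # hypothetical element name
    elem.set(f"{{{XML_NS}}}lang", lang)        # natural language labelling
    elem.set(f"{{{ITS_NS}}}dir", direction)    # text direction: ltr or rtl
    elem.set(f"{{{ITS_NS}}}translate", "yes" if translate else "no")
    elem.text = text
    return elem


# An Urdu example word from the tag set table, marked right-to-left
# and flagged as content that should not be translated.
urdu_example = make_example("کتاب", "ur", "rtl", translate=False)
xml_text = ET.tostring(urdu_example, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Marking example words as non-translatable keeps a localization tool from altering the very strings the tag set is meant to illustrate.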
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because the tag only gives a name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
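A minimal sketch of how a tool might read a few fields from such a metadata record is given below. It works on a cut-down record, not the full listing. The root element name `dataCategorySelection` is an assumption made for the sketch, and the function `read_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET

DCIF_NS = "http://www.isocat.org/ns/dcif"

# Cut-down record for illustration; the root element name is an assumption.
RECORD = """<?xml version="1.0"?>
<dataCategorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</dataCategorySelection>"""


def read_metadata(xml_text: str) -> dict:
    """Extract language, version, status and creation date from a record."""
    ns = {"d": DCIF_NS}
    root = ET.fromstring(xml_text)
    section = root.find("d:languageSection", ns)
    return {
        "language": section.findtext("d:language", namespaces=ns),
        "version": section.findtext("d:version", namespaces=ns),
        "status": section.findtext("d:registrationStatus", namespaces=ns),
        "created": section.findtext("d:creation/d:creationDate", namespaces=ns),
    }


if __name__ == "__main__":
    print(read_metadata(RECORD))
```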
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching the other branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
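The idea of switching branches off can be sketched as a filter over the language-specific labels of a node. The sketch below assumes a node is held as a plain dictionary keyed like the schema attributes (cat, hin-cat, mal-cat, tag). The script-to-language table covers only the scripts and languages named in the selection algorithm, and the label strings are illustrative values. The names `SCRIPT_LANGUAGES`, `NOUN_NODE` and `select_node` are hypothetical.

```python
# Scripts and ISO 639-3 language codes named in the selection algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
}

# One schema node with labels for several languages (illustrative values).
NOUN_NODE = {
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "mal-cat": "നാമം",
    "kok-cat": "नाम",
    "tag": "N",
}


def select_node(node: dict, script: str, lang: str) -> dict:
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if lang not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    # English (cat/subcat), the tag and Hindi are always displayed;
    # the selected language is added on top of them.
    keep = {"cat", "subcat", "tag", "hin-cat", f"{lang}-cat"}
    return {key: value for key, value in node.items() if key in keep}


if __name__ == "__main__":
    print(select_node(NOUN_NODE, "Malayalam", "mal"))
    print(select_node(NOUN_NODE, "Devanagari", "hin"))
```

Filtering at display time leaves the common schema untouched, so adding a language only means adding one more label per node.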
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
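One way to hold such a mapping table in a program is a lookup keyed by POS tag and language code, returning NA where a language has no label (the tables use NA in the same way). The sketch below is an assumption about representation, not part of the standard. It carries only three rows, and the label strings are illustrative values in standard orthography. The names `LABELS`, `label_for` and `missing_labels` are hypothetical.

```python
# Three rows of a mapping table, keyed by POS tag, then ISO 639-3 code.
LABELS = {
    "N": {"eng": "Noun", "hin": "संज्ञा", "mar": "नाम"},
    "PR": {"eng": "Pronoun", "hin": "सर्वनाम", "mar": "सर्वनाम"},
    "V": {"eng": "Verb", "hin": "क्रिया", "mar": "क्रियापद"},
}


def label_for(tag: str, lang: str) -> str:
    """Return the label of a tag in a language, or "NA" if none is defined."""
    return LABELS[tag].get(lang, "NA")


def missing_labels(lang: str) -> list[str]:
    """List the tags that have no label in the given language."""
    return [tag for tag, row in LABELS.items() if lang not in row]


if __name__ == "__main__":
    print(label_for("V", "mar"))
    print(label_for("N", "tam"))   # no Tamil column in this cut-down table
    print(missing_labels("hin"))
```

A check such as `missing_labels` makes gaps in a language column visible before the schema for that language is generated.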
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ……
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ……
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ……
    End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
………
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
………
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
………
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
………
End if
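The selection steps above (pick a script, pick a language, display the English, Hindi and chosen-language nodes, hide the rest) can be sketched in Python. This is only an illustration under stated assumptions: the `SCRIPT_LANGUAGES` table, the `select_nodes` function and the dictionary representation of a schema node are hypothetical and not part of the draft standard. Only the attribute names (`cat`, `hin-cat`, `kok-cat`, `mal-cat`, `tag`) follow the examples, and the grouping of Hindi, Bodo and Konkani under Devanagari is assumed from the scripts those languages are written in.

```python
# Hypothetical sketch of the node-selection algorithm: keep the English and
# Hindi labels plus the labels of the chosen language, and hide the rest.

# Script -> {language: ISO 639-3 code}; limited to the languages used in the
# examples. The grouping itself is an assumption made for this sketch.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that are always displayed: English labels, Hindi label, tag.
ALWAYS_SHOWN = {"cat", "subcat", "hin-cat", "tag"}


def select_nodes(node, script, language):
    """Return a copy of `node` holding only the attributes to display."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    visible = ALWAYS_SHOWN | {f"{languages[language]}-cat"}
    return {name: value for name, value in node.items() if name in visible}


# A noun node carrying labels for several languages at once.
NOUN = {
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "kok-cat": "नाम",
    "mal-cat": "നാമം",
    "tag": "N",
}

konkani_view = select_nodes(NOUN, "Devanagari", "Konkani")
print(konkani_view)
```

Filtering a copy rather than deleting attributes keeps the common schema intact, so the same node can be rendered again for another language.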
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
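As a rough illustration of how these language tags connect to the schema, the Python sketch below stores the ISO 639-3 codes in a lookup table and derives the name of the per-language label attribute (`hin-cat`, `mal-cat` and so on, as used in the schema examples). The names `LANGUAGE_CODES` and `label_attribute` are assumptions introduced for this sketch only.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
LANGUAGE_CODES = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Name of the schema attribute holding the category label for `language`.

    Hypothetical helper: it follows the `<code>-cat` pattern seen in the
    schema examples (for instance `kas-cat` for Kashmiri).
    """
    return f"{LANGUAGE_CODES[language]}-cat"


print(label_attribute("Kashmiri"))
```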
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuation
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जवणबवण (jevanbivaNa - meal/dinner), डोतबत (Dokebike - head), (Paanii-)vaanii, (khaanaa-)vaanaa
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No  Category (Top level / Subtype level 1)  Label  Annotation Convention  Examples
1      Noun            N      N
1.1    Common          NN     N__NN      kalam, chashmA ('pen', 'spectacles')
1.2    Proper          NNP    N__NNP     mohan, ravI ('Mohan', 'Ravi')
1.3    Nloc            NST    N__NST     upar, nIche, ahIM ('up', 'down', 'in front')
2      Pronoun         PR     PR
2.1    Personal        PRP    PR__PRP    huM, tuM, te ('me', 'you', 'he/she')
2.2    Reflexive       PRF    PR__PRF    pote, jAte, svayam ('herself/himself')
2.3    Relative        PRL    PR__PRL    je, te, jyAM ('who', 'where')
2.4    Reciprocal      PRC    PR__PRC    aras-paras, paraspar ('mutually', 'each other')
2.5    Wh-word         PRQ    PR__PRQ    koN, kyAre, kyAM ('who', 'when', 'where')
2.6    Indefinite                        koI, kaIMK, kashuM ('someone', 'something')
3      Demonstrative   DM     DM
3.1    Deictic         DMD    DM__DMD    A ('this')
3.2    Relative        DMR    DM__DMR    je, jeNe ('which/who', 'whom')
3.3    Wh-word         DMQ    DM__DMQ    koN, shuM, kem ('who', 'what', 'why')
3.4    Indefinite                        koI, kaIMK, kashuM ('someone', 'something')
4      Verb            V      V
4.1    Main            VM     V__VM      khAshe, khAdhu ('will eat', 'ate')
4.2    Auxiliary       VAUX   V__VAUX    chhe, hatuM, karyuM ('is', 'was', 'did')
5      Adjective       JJ
6      Adverb          RB
7      Postposition    PSP
8      Conjunction     CC     CC
8.1    Co-ordinator    CCD    CC__CCD    ane, ke ('and', 'or')
8.2    Subordinator    CCS    CC__CCS    tethI, evuM, kAraNke ('so', 'like that', 'because')
9      Particles       RP     RP
9.1    Default         RPD    RP__RPD    paNa, ja, tO ('but', emph, topic)
9.2    Interjection    INJ    RP__INJ    hE, arrrE, O
9.3    Intensifier     INTF   RP__INTF   bahu, ghaNuM ('very', 'much')
9.4    Negation        NEG    RP__NEG    nahi, na ('no')
10     Quantifiers     QT     QT
10.1   General         QTF    QT__QTF    thoduM, ghaNuM ('little', 'much')
10.2   Cardinals       QTC    QT__QTC    eka, be, traN ('one', 'two', 'three')
10.3   Ordinals        QTO    QT__QTO    paheluM, bIjI ('first' (neu), 'second' (fem))
11     Residuals       RD     RD
11.1   Foreign word    RDF    RD__RDF    tv, perasitemol
11.2   Symbol          SYM    RD__SYM    $ &
11.3   Punctuation     PUNC   RD__PUNC   ()
11.4   Unknown         UNK    RD__UNK
11.5   Echowords       ECH    RD__ECH    kAm-bAm, pANi-bANi ('work and the like', 'water and the like')
POS for Konkani
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411
Finite VF V__VM__VF आयलो आयला आयललो
412
Non-
Finite VNF V__VM__VNF यतच यवन
आयललयान यवत यवपात यवपाच यवच
413
Infinitive VINF V__VM__VINF आस वहर तलयार
414
Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
42
1 Finite V__VAUX__VF तलल आस आयला
आस
42
2 Non-
Finite V__VAUX__VN
F तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
82
1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM amp $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
Classifier CL RP_CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
112 Symbol SYM RD__SYM $ ( ) For symbols
such as $ amp
etc
113 Punctuation PUNC RD__PUNC Only for
punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
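The rule that higher-level tags are stored automatically once the lowest-level tag is chosen can be sketched as a parent lookup. The Python below is a minimal illustration, not a prescribed implementation: the `PARENT` table and the `annotation` function are hypothetical names, the table covers only labels that appear in the tag tables, and it assumes the finiteness subtypes (VF, VNF, VINF, VNG) sit under VM, although some language tables also place VF and VNF under VAUX.

```python
# Parent of each label, following the "Annotation Convention" column.
# Top-level labels (N, PR, DM, V, JJ, RB, PSP, CC, RP, QT, RD) have no parent.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    # Assumption: finiteness subtypes are attached to the main verb only.
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation(label):
    """Expand a lowest-level label into its full hierarchical annotation."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # The tables join the levels with a double underscore, top level first.
    return "__".join(reversed(chain))


for lowest in ("NN", "VF", "UT", "JJ"):
    print(lowest, "->", annotation(lowest))
```

With this layout an annotator only records the lowest label, and the stored annotation can always be regenerated from the table.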
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best practices and the ISO metadata standard are adopted in the framework above.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Users will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). This makes the concepts easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated either as part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
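To make two of these attributes concrete, the Python sketch below parses a small sample document with the standard library and reports, for each child element, its effective language (`xml:lang`, inherited from the root when absent) and whether it is marked as translatable. The sample document and the `describe` function are assumptions made for illustration. The sketch uses the local `its:translate` attribute from ITS 1.0, not the global `its:translateRule` element named above, and it checks only each element's own attribute, without the inheritance that a full ITS processor applies.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical sample: English by default, one Konkani label, one tag that
# must not be translated.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en">
  <label>noun</label>
  <label xml:lang="kok">नाम</label>
  <tag its:translate="no">N</tag>
</doc>"""


def describe(root):
    """Return (element name, language, translatable) for each child of root."""
    default_lang = root.get(f"{{{XML_NS}}}lang")
    rows = []
    for child in root:
        lang = child.get(f"{{{XML_NS}}}lang", default_lang)
        # Simplification: elements are translatable unless marked "no".
        translatable = child.get(f"{{{ITS_NS}}}translate", "yes") == "yes"
        rows.append((child.tag, lang, translatable))
    return rows


for row in describe(ET.fromstring(SAMPLE)):
    print(row)
```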
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and indicate, to a degree, what the data means. Only to a degree, because a tag supplies just a name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
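A consumer of such a metadata record might read a few fields from it as sketched below. This is an assumption-laden illustration in Python: the `RECORD` string is a trimmed, well-formed stand-in for the record above, and `summarize` is a hypothetical helper. Only the element names and the namespace URI are taken from the listing.

```python
import xml.etree.ElementTree as ET

# Prefix chosen for this sketch; the URI is the default namespace of the record.
NS = {"dcif": "http://www.isocat.org/ns/dcif"}

# Trimmed stand-in for the metadata record (layout assumed for illustration).
RECORD = """<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <creation><creationDate>1999-01-01</creationDate></creation>
    <descriptionSection>
      <definitionClass><source>ISO 12620:1999</source></definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>"""


def summarize(xml_text):
    """Extract language, registration status, creation date and source."""
    root = ET.fromstring(xml_text)
    section = root.find("dcif:languageSection", NS)
    return {
        "language": section.findtext("dcif:language", namespaces=NS),
        "status": section.findtext("dcif:registrationStatus", namespaces=NS),
        "created": section.findtext("dcif:creation/dcif:creationDate", namespaces=NS),
        "source": section.findtext(".//dcif:source", namespaces=NS),
    }


print(summarize(RECORD))
```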
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo  English  Hindi  Punjabi  Urdu  Gujarati  Oriya  Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo  English  Hindi  Assamese  Bodo  Kashmiri  Kashmiri (Hindi)  Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo  English  Hindi  Telugu  Malayalam  Tamil  Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
…
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
…
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
…
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
…
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
…
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
…
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
…
End if
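The selection steps above can be sketched in Python. This is only an illustrative sketch, not part of the draft standard: the function name select_nodes, the dictionary representation of a schema node, and the use of the plain key "cat" for the English label (the schema writes it as "POS cat") are assumptions made for the example. The script-to-language grouping and the attribute names (hin-cat, brx-cat and so on) are taken from the algorithm and schema above.

```python
# Script -> languages handled under that script, as listed in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}

# Language -> label attribute used in the schema (ISO 639-3 code + "-cat").
LANGUAGE_ATTRIBUTE = {
    "Hindi": "hin-cat",
    "Bodo": "brx-cat",
    "Konkani": "kok-cat",
    "Malayalam": "mal-cat",
    "Kashmiri": "kas-cat",
    "Assamese": "asm-cat",
    "Gujarati": "guj-cat",
}

# English labels and the tag are displayed for every language.
ALWAYS_VISIBLE = ("cat", "subcat", "tag")


def select_nodes(node, script, language):
    """Return only the attributes to display: English, Hindi and the chosen language.

    `node` is a hypothetical dict of attribute name -> value for one
    schema element; all other language labels are hidden (dropped).
    """
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = set(ALWAYS_VISIBLE) | {"hin-cat", LANGUAGE_ATTRIBUTE[language]}
    return {name: value for name, value in node.items() if name in visible}


# Placeholder labels stand in for the native-script strings.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "kok-cat": "<Konkani label>",
    "tag": "N",
}
print(select_nodes(noun, "Devanagari", "Bodo"))
```

Because Hindi labels are always displayed, selecting Hindi itself simply yields the English and Hindi attributes plus the tag.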
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
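As a rough illustration of how the language tags relate to the schema, the table can be held as a lookup from language name to ISO 639-3 code, from which the label attribute name used in the POS schema (for example brx-cat for Bodo) is derived. The names ISO_639_3 and cat_attribute are hypothetical and chosen only for this sketch. The codes are the standard ISO 639-3 identifiers for the 22 languages listed.

```python
# ISO 639-3 codes for the 22 languages in the annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def cat_attribute(language):
    """Build the schema's label attribute name, e.g. 'hin-cat' for Hindi."""
    return f"{ISO_639_3[language]}-cat"


print(cat_attribute("Konkani"))
```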
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
‘he/she’
2.2  Reflexive  PRF  PR__PRF  pote, jAte, svayam  ‘herself/himself’
2.3  Relative  PRL  PR__PRL  je, te, jyAM  ‘who’, ‘where’
2.4  Reciprocal  PRC  PR__PRC  aras-paras, paraspar  ‘mutually’, ‘each other’
2.5  Wh-word  PRQ  PR__PRQ  koN, kyAre, kyAM  ‘who’, ‘when’, ‘where’
2.6  Indefinite  koI, kaIMK, kashuM  ‘someone’, ‘something’
3  Demonstrative  DM  DM
3.1  Deictic  DMD  DM__DMD  A  ‘this’
3.2  Relative  DMR  DM__DMR  je, jeNe  ‘which/who’, ‘whom’
3.3  Wh-word  DMQ  DM__DMQ  koN, shuM, kem  ‘who’, ‘what’, ‘why’
3.4  Indefinite  koI, kaIMK, kashuM  ‘someone’, ‘something’
4  Verb  V  V
4.1  Main  VM  V__VM  khAshe, khAdhu  ‘will eat’, ‘ate’
4.2  Auxiliary  VAUX  V__VAUX  chhe, hatuM, karyuM  ‘is’, ‘was’, ‘did’
5  Adjective  JJ
6  Adverb  RB
7  Postposition  PSP
8  Conjunction  CC  CC
8.1  Co-ordinator  CCD  CC__CCD  ane, ke  ‘and’, ‘or’
8.2  Subordinator  CCS  CC__CCS  tethI, evuM, kAraNke  ‘so’, ‘like that’, ‘because’
9  Particles  RP  RP
9.1  Default  RPD  RP__RPD  paNa, ja, tO  ‘but’, emph, topic
9.2  Interjection  INJ  RP__INJ  hE, arrrE, O
9.3  Intensifier  INTF  RP__INTF  bahu, ghaNuM  ‘very’, ‘much’
9.4  Negation  NEG  RP__NEG  nahi, na  ‘no’
10  Quantifiers  QT  QT
10.1  General  QTF  QT__QTF  thoduM, ghaNuM  ‘little’, ‘much’
10.2  Cardinals  QTC  QT__QTC  eka, be, traN  ‘one’, ‘two’, ‘three’
10.3  Ordinals  QTO  QT__QTO  paheluM, bIjI  ‘first’ (neu), ‘second’ (fem)
11  Residuals  RD  RD
11.1  Foreign word  RDF  RD__RDF  tv, perasitemol
11.2  Symbol  SYM  RD__SYM  $, &
11.3  Punctuation  PUNC  RD__PUNC  ()
11.4  Unknown  UNK  RD__UNK
11.5  Echowords  ECH  RD__ECH  kAm-bAm, pANi-bANi  ‘work and the like’, ‘water and the like’
POS for Konkani
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1  Noun  N  N
1.1  Common  NN  N__NN  पसत रख आबो माड
1.2  Proper  NNP  N__NNP  रामायण बायबल तराण गय ततणी तपला
1.3  Nloc  NST  N__NST  भायर भीर वयर सतयल
2  Pronoun  PR  PR
2.1  Personal  PRP  PR__PRP  हाव ो तयो मच आमच ाच
2.2  Reflexive  PRF  PR__PRF  आपण सवा
2.3  Relative  PRL  PR__PRL  जा जो
2.4  Reciprocal  PRC  PR__PRC  एतामतात आपसा
2.5  Wh-word  PRQ  PR__PRQ  तोण त खयचो
2.6  Indefinite  तोणय त य खयचय
3  Demonstrative  DM  DM
3.1  Deictic  DMD  DM__DMD  ो हो
3.2  Relative  DMR  DM__DMR  जो
3.3  Wh-word  DMQ  DM__DMQ  तोण तसल
3.4  Indefinite  तोणाचय तसलय
4  Verb  V  V
4.1  Main  VM  V__VM  यवप
4.1.1  Finite  VF  V__VM__VF  आयलो आयला आयललो
4.1.2  Non-Finite  VNF  V__VM__VNF  यतच यवन आयललयान यवत यवपात यवपाच यवच
4.1.3  Infinitive  VINF  V__VM__VINF  आस वहर तलयार
4.1.4  Gerund  VNG  V__VM__VNG  खावप वचप खावपी जवपी समजपी
4.2  Auxiliary  VAUX  V__VAUX  NA
4.2.1  Finite  V__VAUX__VF  तलल आस आयला आस
4.2.2  Non-Finite  V__VAUX__VNF  तरा जाय तरा आसलो यी
5  Adjective  JJ  सोबी सदर
6  Adverb  RB  फालया सवतास अश
7  Postposition  PSP  खाीर पास बगर तडन लागी
8  Conjunction  CC  CC
8.1  Co-ordinator  CCD  CC__CCD  आनी वा
8.2  Subordinator  CCS  CC__CCS  जालयार जर-र दखन महणलयार पणन
8.2.1  Quotative  UT  CC__CCS__UT  अश त
9  Particles  RP  RP
9.1  Default  RPD  RP__RPD  बी आद इतयाद
9.2  Classifier  CL  RP__CL  (पाच) जाण
9.3  Interjection  INJ  RP__INJ  आर चप
9.4  Intensifier  INTF  RP__INTF  उपाट भरपर
9.5  Negation  NEG  RP__NEG  ना नयह
10  Quantifiers  QT  QT
10.1  General  QTF  QT__QTF  थोड चड ताय खब
10.2  Cardinals  QTC  QT__QTC  एत दोन
10.3  Ordinals  QTO  QT__QTO  पयल दसर
11  Residuals  RD  RD
11.1  Foreign word  RDF  RD__RDF
11.2  Symbol  SYM  RD__SYM  & $
11.3  Punctuation  PUNC  RD__PUNC  -
11.4  Unknown  UNK  RD__UNK
11.5  Echowords  ECH  RD__ECH  जोवण-बवण
35
CopyrightTDIL
POS for Maithili
Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
1.1 Common NN N__NN पोथी तलम पड खवास
1.2 Proper NNP N__NNP अरण दनश अल
1.3 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हम ई ओ अहा
2.2 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
2.3 Relative PRL PR__PRL ज िजनता िजनतर जतरा
2.4 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
2.5 Wh-word PRQ PR__PRQ त त तथी ततर
2.6 Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ओ ई ऊ
3.2 Relative DMR DM__DMR ज जाह
3.3 Wh-word DMQ DM__DMQ त त तोन
3.4 Indefinite तओ तछ तउछ तोनो
4 Verb V V
4.1 Main VM V__VM चलब रौप पढइ खाइ स हस
4.2 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आओर परच मदा वा
8.2 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
9.1 Default RPD RP__RPD भर यौ हौ रौ
9.2 Classifier CL RP__CL टा गोट गो
9.3 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
9.4 Intensifier INTF RP__INTF बह बसी खब नान
9.5 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
10.1 General QTF QT__QTF तनत बह तछ
10.2 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
10.3 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF Remark: A word written in a script other than the script of the original text
11.2 Symbol SYM RD__SYM $ ( ) Remark: For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Remark: Only for punctuation
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
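As a minimal sketch of how the higher-level tags could be stored automatically: the annotation conventions in the tables join the levels with a double underscore (for example V__VM__VF), so the ancestors of a lowest-level tag can be read directly off the convention string. The function names `expand_tag` and `hierarchy_paths` are assumptions made for illustration; they are not part of the standard.

```python
def expand_tag(annotation):
    """Split an annotation convention such as 'V__VM__VF' into its
    levels, top level first: ['V', 'VM', 'VF']."""
    levels = annotation.split("__")
    if not all(levels):
        raise ValueError("malformed annotation convention: %r" % annotation)
    return levels


def hierarchy_paths(annotation):
    """Return the annotation convention of every level on the path,
    e.g. ['V', 'V__VM', 'V__VM__VF'], so that all of them can be stored."""
    levels = expand_tag(annotation)
    return ["__".join(levels[: i + 1]) for i in range(len(levels))]


print(hierarchy_paths("V__VM__VF"))  # ['V', 'V__VM', 'V__VM__VF']
```

An annotator would then choose only the last entry, and a tool could record the remaining entries alongside it.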
POS for Urdu
Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun (اسم, ism) N N لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab)
1.1 Common (نکره, nakeraa) NN N__NN کتاب (kitaab), قلم (qalam), چشمہ (cashma)
1.2 Proper (معرفہ, m‘aarefa) NNP N__NNP موہن (Mohan), رشمی (Rashmi), روی (Ravi)
1.3 Verbal (حاصل مصدر, haasil-e-masdar) NNV N__NNV جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) Remark: May be considered for Urdu-Hindi too
1.4 Nloc (ظرف, zarf) NST N__NST اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche)
2 Pronoun (ضمير, zamiir) PR PR يہ (yih), وه (voh), جو (jo)
2.1 Personal (ضمير شخصی, zamiir-e-shakhsii) PRP PR__PRP وه (voh), تم (tum), ميں (maim) Remark: In Urdu, unlike Hindi, voh is used for both singular and plural
2.2 Reflexive (ضمير معکوسی, zamiir-e-m‘aakoosii) PRF PR__PRF اپنا (apnaa), خود (khud), اپنے آپ (apne aap)
2.3 Relative (ضمير موصولہ, zamiir-e-mausoolaa) PRL PR__PRL جو (jo), جب (jab), جس (jis), جہاں (jahaM)
2.4 Reciprocal (ضمير راجع, zamiir-e-raaje‘) PRC PR__PRC باہم (baaham), درميان (darmiyaan), آپس (aapas)
2.5 Wh-word (ضمير استفہاميہ, zamiir-e-istafhaamiyaa) PRQ PR__PRQ کون (kaun), کب (kab), کہاں (kahaaM)
3 Demonstrative (ضمير اشاره, zamiir-e-ishaaraa) DM DM يہ (yih), وه (voh), ان (inn), ان (unn)
3.1 Deictic (اشارے, ishaare) DMD DM__DMD يہ (yih), وه (voh)
3.2 Relative (ضمير اشاره موصولہ, zamiir-e-ishaaraa mausoolaa) DMR DM__DMR جو (jo), جس (jis)
3.3 Wh-word (ضمير اشاره استفہاميہ, zamiir-e-ishaaraa istafhaamiyaa) DMQ DM__DMQ کون (kaun), کس (kis), کتنا (kitnaa) Remark: According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb (فعل, f‘el) V V گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1 Main VM V__VM گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1.1 Finite (محدود, mahdood) VF V__VM__VF Remark: This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 Nonfinite (غير محدود, ghair mahdood) VNF V__VM__VNF Remark: -- do --
4.1.3 Infinitive (مصدر, masdar) VINF V__VM__VINF Remark: -- do --
4.1.4 Gerund (حاصل مصدر, haasil-e-masdar) VNG V__VM__VNG Remark: -- do --
4.2 Auxiliary (فعل امدادی, f‘el-e-imdaadi) VAUX V__VAUX ہے (hai), رہا (rahaa), ہوا (huaa)
5 Adjective (صفت, sifat) JJ دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa)
6 Adverb (متعلق فعل, mut‘alliq-e-f‘el) RB تيز (tez), جلد (jald)
7 Postposition (جار موخر, jaar-e-moakkhar) PSP سے (se), نے (ne), کو (ko), ميں (meiM)
8 Conjunction (عطف, ‘atf) CC CC اور (aur), اگر (agar), کيوں کہ (kyoMki)
8.1 Co-ordinator (حرف وصل, harf-e-vasl) CCD CC__CCD اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki)
8.2 Subordinator (تابع کننده, taab‘e kunindaa) CCS CC__CCS اگر (agar), کيوں کہ (kyoMki), تو (to)
8.2.1 Quotative (اقتباسی, iqtabaasii) UT CC__CCS__UT Remark: Not required
9 Particles (حاليہ, haaliyaa) RP RP تو (to), ہی (hii), بهی (bhii)
9.1 Default (ڈيفالٹ, Default) RPD RP__RPD تو (to), ہی (hii), بهی (bhii)
9.2 Classifier (درجہ بند, darja band) CL RP__CL Remark: Not required
9.3 Interjection (فجائيہ, fajaa’iyaa) INJ RP__INJ اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah)
9.4 Intensifier (حرف تاکيد, harf-e-taakiid) INTF RP__INTF بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)
9.5 Negation (حرف نہی, harf-e-nahii) NEG RP__NEG نہ (na), نہيں (nahiiM)
10 Quantifiers (کميت نما, kamiiyat numaa) QT QT چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General (عام, ‘aam) QTF QT__QTF تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) QTC QT__QTC ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (ترتيبی اعداد, tartiibii a‘adaad) QTO QT__QTO اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals (باقی مانده, baaqi maandaa) RD RD
11.1 Foreign word (بديسی لفظ, bidesii lafz) RDF RD__RDF Remark: A word written in a script other than the script of the original text
11.2 Symbol (علامت, ‘alaamat) SYM RD__SYM $ & ( ) Remark: Symbols such as & and $ are not used in Urdu; they are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (اوقاف, auqaaf) PUNC RD__PUNC Remark: Only for punctuation
11.4 Unknown (نامعلوم, naa-m‘aaloom) UNK RD__UNK
11.5 Echowords (گونج دار الفاظ, goonjdar lafz) ECH RD__ECH دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology to easily create XML which is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
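The attributes listed above can be illustrated with a small sketch that builds an XML fragment carrying xml:lang, xml:id, its:dir and its:translate. The element names `corpus`, `example` and `tag` and the function `build_sample` are hypothetical, chosen only for this illustration; the ITS namespace URI is the one defined by the W3C ITS Recommendation, and the language code "ur" is an assumption about how Urdu would be labelled in xml:lang.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)


def build_sample():
    """Build a tiny document that uses the ITS-related attributes."""
    root = ET.Element("corpus", {
        "{%s}lang" % XML_NS: "en",        # default language of the document
        "{%s}version" % ITS_NS: "1.0",
    })
    example = ET.SubElement(root, "example", {
        "{%s}lang" % XML_NS: "ur",        # change of language on this element
        "{%s}dir" % ITS_NS: "rtl",        # right-to-left text content
        "{%s}id" % XML_NS: "ex1",         # unique identifier
    })
    example.text = "کتاب"
    tag = ET.SubElement(root, "tag", {
        "{%s}translate" % ITS_NS: "no",   # POS labels are not translated
    })
    tag.text = "N__NN"
    return root


print(ET.tostring(build_sample(), encoding="unicode"))
```

Marking the tag element as non-translatable keeps localization tools from altering the annotation labels while still allowing the surrounding text to be localized.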
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document: every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of", because a tag conveys only its name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
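The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps can be sketched as a filter over the common tag tree. This is only an illustration under stated assumptions: the node list is a small subset of the annotation conventions, the language codes `urd` and `hin` are assumed ISO 639-3 identifiers, and the `OFF` table is filled in from the Remarks column of the Urdu table (Quotative and Classifier "Not required"; Finite not used for Hindi). The names `NODES`, `OFF` and `select_nodes` are hypothetical.

```python
# Illustrative subset of annotation conventions from the POS tables.
NODES = [
    "V", "V__VM", "V__VM__VF", "V__VAUX",
    "CC", "CC__CCS", "CC__CCS__UT",
    "RP", "RP__RPD", "RP__CL",
]

# Branches switched 'off' per language (assumed from the Remarks column).
OFF = {
    "urd": {"CC__CCS__UT", "RP__CL"},
    "hin": {"V__VM__VF"},
}


def select_nodes(language):
    """Return (shown, hidden) node lists for one language.

    A node is hidden when it, or any of its ancestors, is switched off.
    """
    off = OFF.get(language, set())
    shown, hidden = [], []
    for node in NODES:
        levels = node.split("__")
        path = {"__".join(levels[:i]) for i in range(1, len(levels) + 1)}
        (hidden if path & off else shown).append(node)
    return shown, hidden


shown, hidden = select_nodes("urd")
print(hidden)  # ['CC__CCS__UT', 'RP__CL']
```

Checking the whole ancestor path means that switching off a top-level category would also hide every subtype beneath it.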
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>…</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
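The per-language label attributes used in the draft schema (such as kok-cat and mal-cat alongside tag) lend themselves to a simple lookup. The sketch below does not parse the draft listing itself; it uses a small well-formed stand-in fragment whose element names `categories` and `category` are hypothetical, and it carries over only the Konkani and Malayalam labels for Noun. The function name `labels_for` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the draft schema; element names are hypothetical.
FRAGMENT = """
<categories>
  <category name="noun" tag="N" kok-cat="नाम" mal-cat="നാമം"/>
  <category name="verb" tag="V"/>
</categories>
"""


def labels_for(language, xml_text=FRAGMENT):
    """Map each POS tag to its label in the given language.

    `language` is the code used as the attribute prefix, e.g. 'kok'.
    Tags without a label in that language map to None.
    """
    root = ET.fromstring(xml_text.strip())
    attribute = "%s-cat" % language
    return {c.get("tag"): c.get(attribute) for c in root.iter("category")}


print(labels_for("kok"))
```

A None value makes a missing label explicit, which is useful when checking that every language covers every tag.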
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
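Whether a label table is really one-to-one within a language can be checked mechanically. The sketch below is an illustration only: it uses the romanized Urdu category names given in the Urdu POS table rather than the table above, and the names `URDU_LABELS` and `find_collisions` are hypothetical. In that table the Verbal noun (NNV) and the Gerund (VNG) both appear as haasil-e-masdar, which the check reports.

```python
# Romanized Urdu category names, keyed by POS label (illustrative subset).
URDU_LABELS = {
    "N": "ism",
    "NNV": "haasil-e-masdar",
    "VINF": "masdar",
    "VNG": "haasil-e-masdar",
    "JJ": "sifat",
}


def find_collisions(labels):
    """Return (first_tag, later_tag, label) for every label that is
    shared by more than one tag, i.e. every breach of one-to-one mapping."""
    first_seen = {}
    collisions = []
    for tag, label in labels.items():
        if label in first_seen:
            collisions.append((first_seen[label], tag, label))
        else:
            first_seen[label] = tag
    return collisions


print(find_collisions(URDU_LABELS))  # [('NNV', 'VNG', 'haasil-e-masdar')]
```

An empty result would mean that each label in the language identifies exactly one tag.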
Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No English Hindi Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Table 3. Languages: Telugu, Malayalam, Tamil, Konkani
S.No English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English, Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English, Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
...
End if
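The selection steps in this section can be sketched in Python. This is only an illustrative sketch, not part of the draft standard: the function name select_nodes, the dictionary layout and the placeholder labels are assumptions, and only the script and language pairs named in the algorithm are covered. English and Hindi labels are always kept, as in the Display steps.

```python
# Hypothetical sketch of the node-selection algorithm; all names are illustrative.
# Script -> {language name: ISO 639-3 code}, limited to the pairs named in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language, node):
    """Return the attributes of one schema node that stay visible.

    English attributes (cat, subcat, tag) and the Hindi label are always
    displayed; the label of the selected language is added; all other
    language labels are hidden.
    """
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None or language not in languages:
        raise ValueError(f"unsupported script/language pair: {script}/{language}")
    visible = {"cat", "subcat", "tag", "hin-cat", languages[language] + "-cat"}
    return {key: value for key, value in node.items() if key in visible}


# Placeholder labels stand in for the native-script category names.
noun = {"cat": "noun", "hin-cat": "H", "brx-cat": "B", "kok-cat": "K", "mal-cat": "M", "tag": "N"}
print(select_nodes("Devanagari", "Bodo", noun))
```

Keeping the script-to-language table as data, rather than nested If blocks, means that adding a language only requires a new table entry.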
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tag according to ISO 639-3
1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
'ate'
4.2 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM ('is', 'was', 'did')
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane, ke ('and', 'or')
8.2 Subordinator CCS CC__CCS tethI, evuM, kAraNke ('so', 'like that', 'because')
9 Particles RP RP
9.1 Default RPD RP__RPD paNa, ja, tO ('but', emph, topic)
9.2 Interjection INJ RP__INJ hE, arrrE, O
9.3 Intensifier INTF RP__INTF bahu, ghaNuM ('very', 'much')
9.4 Negation NEG RP__NEG nahi, na ('no')
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM, ghaNuM ('little', 'much')
10.2 Cardinals QTC QT__QTC eka, be, traN ('one', 'two', 'three')
10.3 Ordinals QTO QT__QTO paheluM, bIjI ('first' (neu), 'second' (fem))
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv, perasitemol
11.2 Symbol SYM RD__SYM $, &
11.3 Punctuation PUNC RD__PUNC ()
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm, pANi-bANi ('work and the like', 'water and the like')
POS for Konkani
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
1.1 Common NN N__NN पसत रख आबो माड
1.2 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
1.3 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
2.2 Reflexive PRF PR__PRF आपण सवा
2.3 Relative PRL PR__PRL जा जो
2.4 Reciprocal PRC PR__PRC एतामतात आपसा
2.5 Wh-word PRQ PR__PRQ तोण त खयचो
2.6 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ो हो
3.2 Relative DMR DM__DMR जो
3.3 Wh-word DMQ DM__DMQ तोण तसल
3.4 Indefinite तोणाचय तसलय
4 Verb V V
4.1 Main VM V__VM यवप
4.1.1 Finite VF V__VM__VF आयलो आयला आयललो
4.1.2 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
4.1.3 Infinitive VINF V__VM__VINF आस वहर तलयार
4.1.4 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
4.2 Auxiliary VAUX V__VAUX NA
4.2.1 Finite V__VAUX__VF तलल आस आयला आस
4.2.2 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आनी वा
8.2 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
8.2.1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
9.1 Default RPD RP__RPD बी आद इतयाद
9.2 Classifier CL RP__CL (पाच) जाण
9.3 Interjection INJ RP__INJ आर चप
9.4 Intensifier INTF RP__INTF उपाट भरपर
9.5 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
10.1 General QTF QT__QTF थोड चड ताय खब
10.2 Cardinals QTC QT__QTC एत दोन
10.3 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF
11.2 Symbol SYM RD__SYM & $
11.3 Punctuation PUNC RD__PUNC -
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
1.1 Common NN N__NN पोथी तलम पड खवास
1.2 Proper NNP N__NNP अरण दनश अल
1.3 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हम ई ओ अहा
2.2 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
2.3 Relative PRL PR__PRL ज िजनता िजनतर जतरा
2.4 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
2.5 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ओ ई ऊ
3.2 Relative DMR DM__DMR ज जाह
3.3 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
4.1 Main VM V__VM चलब रौप पढइ खाइ स हस
4.2 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आओर परच मदा वा
8.2 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
9.1 Default RPD RP__RPD भर यौ हौ रौ
9.2 Classifier CL RP__CL टा गोट गो
9.3 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
9.4 Intensifier INTF RP__INTF बह बसी खब नान
9.5 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
10.1 General QTF QT__QTF तनत बह तछ
10.2 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
10.3 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF (Remarks: a word written in a script other than the script of the original text)
11.2 Symbol SYM RD__SYM $ ( ) (Remarks: for symbols such as $, & etc.)
11.3 Punctuation PUNC RD__PUNC (Remarks: only for punctuation)
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
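One way to store the higher-level tags automatically is to keep a parent table for the labels and build the Annotation Convention string from it. The sketch below is an assumption-laden illustration: the PARENT table and the function name annotation are hypothetical, the table covers only the labels shown in the tag tables, and VF and VNF are placed under VM only, although some languages also use them under VAUX.

```python
# Hypothetical parent table for the labels in the tag tables (None = top level).
# Assumption: VF/VNF are mapped under VM only; V__VAUX__VF is not modelled here.
PARENT = {
    "N": None, "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PR": None, "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DM": None, "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "V": None, "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "JJ": None, "RB": None, "PSP": None,
    "CC": None, "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RP": None, "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QT": None, "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RD": None, "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation(label):
    """Expand a lowest-level label into its full hierarchy, e.g. VF -> V__VM__VF."""
    path = []
    current = label
    while current is not None:
        path.append(current)
        current = PARENT[current]  # raises KeyError for labels not in the table
    return "__".join(reversed(path))


print(annotation("VF"))
print(annotation("UT"))
```

The annotator then only chooses the most specific label; the stored form follows from the table.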
POS for Urdu
Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document: every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag gives just its name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema is used to select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
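Switching a branch 'off' can be pictured as dropping the label attributes of every other language from a parsed node. The sketch below is illustrative only: the sample XML is a simplified, well-formed stand-in (with hypothetical element names category and subtype and placeholder labels) rather than the draft schema itself, and language_view is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Attributes that stay on for every language: English fields plus the Hindi label.
ALWAYS_ON = {"name", "cat", "subcat", "tag", "hin-cat"}

# Simplified stand-in for one schema node; labels are placeholders.
SAMPLE = (
    '<category cat="noun" hin-cat="H" brx-cat="B" mal-cat="M" tag="N">'
    '<subtype subcat="common" hin-cat="h" brx-cat="b" mal-cat="m" tag="NN"/>'
    "</category>"
)


def language_view(node_xml, lang_code):
    """Parse a node and remove every label attribute not needed for lang_code."""
    node = ET.fromstring(node_xml)
    keep = ALWAYS_ON | {f"{lang_code}-cat"}
    for elem in node.iter():
        for attr in list(elem.attrib):  # copy the keys: we delete while looping
            if attr not in keep:
                del elem.attrib[attr]
    return node


view = language_view(SAMPLE, "mal")
print(ET.tostring(view, encoding="unicode"))
```

The common schema itself is left untouched; each language view is derived from it on demand.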
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo kok-cat=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To provide this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No. English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
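
The selection steps above can be sketched in Python. This is only an illustrative sketch, not part of the standard: the dictionary-based node model, the names SCRIPT_LANGUAGES, NODES and select_nodes, and the placeholder label strings are all assumptions introduced here. Only the script-to-language grouping, the `xxx-cat` attribute naming and the "show English and Hindi plus the chosen language, hide the rest" rule come from the algorithm.

```python
# Script -> languages grouping, as listed in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# Hypothetical in-memory form of two schema nodes; the labels are placeholders.
NODES = [
    {"cat": "noun", "tag": "N", "hin-cat": "<Hindi label>",
     "brx-cat": "<Bodo label>", "mal-cat": "<Malayalam label>"},
    {"cat": "Pronoun", "tag": "PR", "hin-cat": "<Hindi label>",
     "brx-cat": "<Bodo label>", "mal-cat": "<Malayalam label>"},
]


def select_nodes(nodes, script, language):
    """Keep the English label, the tag, the Hindi label and the label of
    the requested language; every other language label is hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # Hindi is always displayed; avoid listing it twice when it is the target.
    visible = list(dict.fromkeys(["hin", language]))
    selected = []
    for node in nodes:
        shown = {"cat": node["cat"], "tag": node["tag"]}
        for code in visible:
            key = f"{code}-cat"
            if key in node:
                shown[key] = node[key]
        selected.append(shown)
    return selected


if __name__ == "__main__":
    for node in select_nodes(NODES, "Devanagari", "brx"):
        print(node)
```

Raising an error for a language that is not listed under the given script is a design choice of this sketch; the algorithm itself does not say what should happen in that case.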
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
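
Tagged sentences of this kind can be read back into (word, tag levels) pairs with a few lines of Python. The sketch below rests on an assumption: each token is taken to be written as word, a backslash, then the label, which is a common convention for this style of annotation but cannot be confirmed from the text as it appears here. The function names and the transliterated sample tokens are hypothetical. Tag levels are split on underscores, which covers both the single-underscore labels (N_NN, V_VM_VNF) and the double-underscore labels (N__NN) seen in the samples.

```python
def parse_token(token, sep="\\"):
    """Split one tagged token into (word, [tag levels]).

    Assumption: the word and its label are joined by `sep` (a backslash
    by default); the hierarchy levels of the label are joined by one or
    two underscores.
    """
    word, _, label = token.rpartition(sep)
    if not word or not label:
        raise ValueError(f"not a tagged token: {token!r}")
    levels = [part for part in label.split("_") if part]
    return word, levels


def parse_sentence(line, sep="\\"):
    """Parse a whitespace-separated tagged sentence."""
    return [parse_token(token, sep) for token in line.split()]


if __name__ == "__main__":
    # Hypothetical transliterated tokens, used only to show the format.
    sample = "darshan\\N_NN se\\PSP milta\\V_VM_VF hai\\V__VAUX"
    for word, levels in parse_sentence(sample):
        print(word, levels)
```

The first element of each level list is the top-level category, so grouping tokens by `levels[0]` gives a coarse-grained view of the same annotation.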
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No.  Language Name  Language Tag (ISO 639-3)
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
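
For use in software, the language tags of this annexure can be held in a simple lookup table. The Python sketch below is illustrative only; the names ISO_639_3 and label_attribute are assumptions, while the `xxx-cat` attribute pattern (for example `hin-cat`, `brx-cat`) follows the attribute names used in the draft schema.

```python
# ISO 639-3 language tags for the 22 languages listed in this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal",
    "Manipuri": "mni", "Marathi": "mar", "Nepali": "nep", "Oriya": "ori",
    "Punjabi": "pan", "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd",
    "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute name that carries the label for a
    language, e.g. 'hin-cat' for Hindi."""
    return f"{ISO_639_3[language]}-cat"


if __name__ == "__main__":
    for name in ("Hindi", "Bodo", "Kashmiri"):
        print(name, ISO_639_3[name], label_attribute(name))
```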
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
'second' (fem)
11 Residuals RD RD
111 Foreign word RDF RD__RDF tv perasitemol
112 Symbol SYM RD__SYM $ &
113 Punctuation PUNC RD__PUNC ()
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH kAm-bAm pANi-bANi 'work and the like' 'water and the like'
POS for Konkani
Sl No, Category (Top level / Subtype (level 1) / Subtype (level 2)), Label, Annotation Convention, Examples, Remarks
1 Noun N N
11 Common NN N__NN पसत रख आबो
माड
12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
13 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
22 Reflexive PRF PR__PRF आपण सवा
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No, Category (Top level / Subtype (level 1) / Subtype (level 2)), Label, Annotation Convention, Examples, Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
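The rule of storing higher-level tags automatically can be sketched as follows. This is a minimal illustration that assumes tags use the double-underscore convention shown in the tables (for example `V__VM__VF`); the function name `expand_tag` is hypothetical.

```python
def expand_tag(tag: str) -> list[str]:
    """Return the tag and all of its ancestors, ordered from the top level down.

    For 'V__VM__VF' this yields the Verb, Main Verb and Finite levels.
    """
    parts = tag.split("__")
    if not all(parts):
        # An empty component means a stray or trailing separator.
        raise ValueError(f"malformed tag: {tag!r}")
    # Re-join progressively longer prefixes: V, V__VM, V__VM__VF.
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]
```

An annotator then records only the lowest-level tag, and a tool calls `expand_tag` to store the rest of the hierarchy alongside it.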
POS for Urdu
Sl No, Category (Top level / Subtype (level 1) / Subtype (level 2)), Label, Annotation Convention, Examples, Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu-Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
-بديسی لفظ(bidesii lafz(
RDF RD__RDF A word written in script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( ) & $
Such symbols are not used in Urdu. They are written as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
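As a rough illustration of these attributes, the sketch below builds one tagged token with Python's standard `xml.etree.ElementTree`. The element names `sentence` and `token` and the `pos` attribute are hypothetical; only `xml:lang`, `xml:id`, `its:dir` and `its:translate` come from the W3C specifications. The three-letter code `urd` follows this document's ISO 639-3 annexure, whereas `xml:lang` values normally follow BCP 47, which prefers the two-letter code `ur`.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Serialise ITS attributes with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)

# <sentence> and <token> are hypothetical element names for illustration.
sentence = ET.Element("sentence", {
    f"{{{XML_NS}}}lang": "urd",   # language of the content
    f"{{{ITS_NS}}}dir": "rtl",    # Urdu is written right to left
    f"{{{XML_NS}}}id": "s1",      # unique identifier for the segment
})
token = ET.SubElement(sentence, "token", {
    "pos": "N__NN",                   # lowest-level POS tag
    f"{{{ITS_NS}}}translate": "no",   # mark this token as non-translatable
})
token.text = "کتاب"

xml_text = ET.tostring(sentence, encoding="unicode")
```

Declaring language and direction on the enclosing element lets every nested token inherit them, so they need repeating only where the language changes.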
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. Schemas provide a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only to a degree: a tag supplies just a name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> registered as a standard
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
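Reading such a metadata record can be sketched with Python's standard library. The fragment below reuses element names from the listing above but is a simplified, well-formed stand-in (no namespace, fewer fields); `read_metadata` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed fragment using element names from the metadata listing.
METADATA = """\
<languageSection>
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
</languageSection>
"""


def read_metadata(xml_text: str) -> dict[str, str]:
    """Collect the text of every leaf element, keyed by element name."""
    root = ET.fromstring(xml_text)
    return {
        el.tag: (el.text or "").strip()
        for el in root.iter()
        if len(el) == 0  # keep leaf elements only, skip containers
    }


fields = read_metadata(METADATA)
```

A tool that displays metadata before calling the POS schema could show `fields` directly, for example the registration status and the creation date.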
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema can be enabled by turning particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
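The selection step (display the desired nodes, hide the remaining ones) can be sketched as below. The data structure is an assumption made for illustration: each tag maps to one label per language, and `None` stands for a category marked "Not required" or "NA" for that language. The Urdu labels are the transliterations given in the Urdu table; the code `eng` for English and the name `select_nodes` are hypothetical.

```python
# Common schema: tag -> {language code: label}. None marks a branch that
# is not used for that language (an assumption standing in for "NA").
COMMON_SCHEMA = {
    "N":           {"eng": "Noun",       "urd": "ism"},
    "PR":          {"eng": "Pronoun",    "urd": "zamiir"},
    "RP__CL":      {"eng": "Classifier", "urd": None},  # "Not required" for Urdu
    "CC__CCS__UT": {"eng": "Quotative",  "urd": None},  # "Not required" for Urdu
}


def select_nodes(schema, language):
    """Split the schema into displayed and hidden nodes for one language."""
    shown, hidden = {}, []
    for tag, labels in schema.items():
        label = labels.get(language)
        if label is None:
            hidden.append(tag)   # branch switched 'off'
        else:
            shown[tag] = label   # desired node, displayed with its label
    return shown, hidden


shown, hidden = select_nodes(COMMON_SCHEMA, "urd")
```

Because hidden branches are only filtered out, never deleted, the common schema stays intact and can be re-filtered for any other language.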
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...............</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
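The draft listing above is not well-formed XML as printed, so it cannot be loaded by a parser directly. As a hedged sketch, the same information (one category per tag, one label per language) could be held in a well-formed layout and queried as shown below. The element names `posSchema`, `category` and `label` are hypothetical and not part of the draft; the Konkani and Malayalam labels for Noun are copied from the Noun block, and the language codes follow the annexure.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical well-formed layout: one <category> per tag, one <label> per language.
SCHEMA = """\
<posSchema>
  <category tag="N">
    <label xml:lang="en">Noun</label>
    <label xml:lang="kok">नाम</label>
    <label xml:lang="mal">നാമം</label>
    <category tag="NN"><label xml:lang="en">Common</label></category>
    <category tag="NNP"><label xml:lang="en">Proper</label></category>
  </category>
</posSchema>
"""


def label_for(root, tag, lang):
    """Return the label of category `tag` in language `lang`, or None."""
    for cat in root.iter("category"):
        if cat.get("tag") == tag:
            # findall() looks at direct children only, so labels of nested
            # subcategories are not picked up by mistake.
            for label in cat.findall("label"):
                if label.get(f"{{{XML_NS}}}lang") == lang:
                    return label.text
    return None


root = ET.fromstring(SCHEMA)
```

Nesting subtypes inside their parent category mirrors the tree structure of the tag hierarchy, so a missing label (a `None` result) identifies a branch to hide for that language.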
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    ………………
  End if
  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    ………………
  End if
  If language is Konkani then
    Call (POS Schema)
    Display (English, Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    ………………
  End if
Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English, Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    ………………
  End if
Else If script is Perso-Arabic then
  If language is Kashmiri then
    Call (POS Schema)
    Display (English, Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
    ………………
  End if
Else If script is Bangla then
  If language is Assamese then
    Call (POS Schema)
    Display (English, Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    ………………
  End if
Else If script is Gujarati then
  If language is Gujarati then
    Call (POS Schema)
    Display (English, Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    ………………
  End if
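The selection steps of this algorithm can be sketched in Python. This is only an illustration under stated assumptions: the script-to-language table lists just the pairs named in this section, the function names are hypothetical, and the label strings are placeholders rather than the real labels of the schema.

```python
# Script -> language codes, limited to the pairs named in the algorithm.
# The table and the function names are illustrative assumptions.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}


def visible_attributes(script, language):
    """Return the attribute names that stay visible for one language."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    shown = ["cat", "subcat", "hin-cat"]   # English and Hindi are always shown
    if language != "hin":
        shown.append(f"{language}-cat")    # plus the selected language
    shown.append("tag")
    return shown


def select_nodes(node, script, language):
    """Display the desired labels of a node and hide the remaining ones."""
    keep = visible_attributes(script, language)
    return {name: value for name, value in node.items() if name in keep}


# Placeholder labels stand in for the real strings of the schema.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}
bodo_view = select_nodes(noun, "Devanagari", "brx")
hindi_view = select_nodes(noun, "Devanagari", "hin")
```

With these inputs the Bodo view keeps the English, Hindi and Bodo labels together with the tag, while the Hindi view keeps only the English and Hindi labels and the tag.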
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp/
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403/
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
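The language tags can be kept in a small lookup table. The sketch below assumes the standard ISO 639-3 code for each language and uses hypothetical helper names. It also shows how a tag yields the per-language attribute name used in the draft schema (for example hin-cat).

```python
# ISO 639-3 codes for the 22 languages of the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Case-insensitive index so that "hindi" and "Hindi" resolve alike.
_BY_LOWER = {name.lower(): code for name, code in ISO_639_3.items()}


def code_for(language_name):
    """Return the ISO 639-3 code for a language name (hypothetical helper)."""
    return _BY_LOWER[language_name.strip().lower()]


def attribute_name(language_name):
    """Build the per-language label attribute name, e.g. 'hin-cat'."""
    return f"{code_for(language_name)}-cat"
```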
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
23 Relative PRL PR__PRL जा जो
24 Reciprocal PRC PR__PRC एतामतात आपसा
25 Wh-word PRQ PR__PRQ तोण त खयचो
26 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ो हो
32 Relative DMR DM__DMR जो
33 Wh-word DMQ DM__DMQ तोण तसल
34 Indefinite तोणाचय तसलय
4 Verb V V
41 Main VM V__VM यवप
411 Finite VF V__VM__VF आयलो आयला आयललो
412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
413 Infinitive VINF V__VM__VINF आस वहर तलयार
414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
42 Auxiliary VAUX V__VAUX NA
421 Finite V__VAUX__VF तलल आस आयला आस
422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuations
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
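The rule of storing the higher-level tags automatically can be sketched as a parent lookup. The table below is an assumption assembled from the labels and annotation conventions listed in these tables (for example V__VM__VF and CC__CCS__UT), and the function name is hypothetical. Finite and non-finite forms are resolved under the main verb only, although the Konkani table also lists them under the auxiliary.

```python
# Child label -> parent label, read off the annotation conventions above.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(lowest_tag):
    """Expand a lowest-level tag into its full hierarchical annotation."""
    chain = [lowest_tag]
    while chain[0] in PARENT:          # walk upwards until a top-level tag
        chain.insert(0, PARENT[chain[0]])
    return "__".join(chain)
```

An annotator would then record only VF, and full_annotation("VF") supplies V__VM__VF. Top-level labels without subtypes, such as JJ, are returned unchanged.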
POS for Urdu
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu unlike Hindi voh is used both for singular and plural
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype such as indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
(بديسی لفظ - bidesii lafz)
RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of the document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of the document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
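As a minimal sketch of how some of these attributes appear on an element, the following Python uses the standard xml.etree.ElementTree module. The element name label and the helper make_label are hypothetical. The local attributes its:dir and its:translate are used here for brevity instead of the global rule elements named above, and the ITS namespace URI is assumed to be the one defined by ITS 1.0.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # reserved namespace of xml:lang
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace (assumed)
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction, translate="yes"):
    """Build a hypothetical <label> element carrying language, direction
    and translatability information."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)            # serialized as xml:lang
    label.set(f"{{{ITS_NS}}}dir", direction)        # "ltr" or "rtl"
    label.set(f"{{{ITS_NS}}}translate", translate)  # "yes" or "no"
    label.text = text
    return label


# A POS tag such as "N" is a code, so it is marked as non-translatable.
tag_label = make_label("N", "en", "ltr", translate="no")
xml_text = ET.tostring(tag_label, encoding="unicode")
```

A label in a right-to-left script such as Kashmiri or Urdu would be created the same way with direction set to "rtl".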
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag gives just its name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the labels of each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
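One way to picture switching a branch 'off' is to drop the label attributes of every language that was not selected. The sketch below is an assumption rather than the method of this draft: the element names pos, category and subcategory and the placeholder labels are hypothetical, and only the *-cat attribute naming follows the draft schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical common schema fragment with placeholder labels.
COMMON = """
<pos>
  <category cat="noun" hin-cat="H1" brx-cat="B1" mal-cat="M1" tag="N">
    <subcategory subcat="common" hin-cat="H2" brx-cat="B2" mal-cat="M2" tag="NN"/>
  </category>
</pos>
"""


def switch_off_other_languages(root, keep):
    """Remove every '<code>-cat' attribute whose language code is not in keep."""
    for element in root.iter():
        for name in list(element.attrib):       # copy: attrib is modified below
            if name.endswith("-cat") and name[:-len("-cat")] not in keep:
                del element.attrib[name]
    return root


root = switch_off_other_languages(ET.fromstring(COMMON), keep={"hin", "mal"})
category = root.find("category")
subcategory = category.find("subcategory")
```

After the call, the Bodo labels are gone from both elements, while the English category names, the Hindi and Malayalam labels and the tags remain.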
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
  kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा"
  mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক"
  kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा"
  mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক"
  kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
  kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम"
  guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा"
  mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক"
  kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ"
  mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम"
  guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा"
  mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক"
  kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा"
  mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক"
  kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो"
  mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক"
  kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
  mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
  kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा"
  mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা"
  kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
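The draft schema above attaches one label attribute per language to each category. As an illustration only, the Noun block could be rendered as well-formed XML and read with Python's standard library as sketched below. The element and attribute names (`category`, `subcategory`, `label`, `lang`) are hypothetical and are not part of the draft; the tags (N, NN, NNP, NNV, NST) and the two sample labels are taken from the Noun block.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed rendering of the Noun block.
# Element and attribute names are assumptions made for this sketch;
# only the tag values and the two labels come from the draft schema.
SAMPLE = """
<category name="Noun" tag="N">
  <label lang="mal">നാമം</label>
  <label lang="kok">नाम</label>
  <subcategory name="Common" tag="NN"/>
  <subcategory name="Proper" tag="NNP"/>
  <subcategory name="Verbal" tag="NNV"/>
  <subcategory name="Nloc" tag="NST"/>
</category>
"""


def read_category(xml_text):
    """Return (labels by language code, full tag by subcategory name)."""
    root = ET.fromstring(xml_text.strip())
    labels = {el.get("lang"): el.text for el in root.findall("label")}
    # Join the top-level tag and the subtype tag with an underscore,
    # as in the annotated sentences of Section 15 (for example N_NN).
    tags = {
        sub.get("name"): root.get("tag") + "_" + sub.get("tag")
        for sub in root.findall("subcategory")
    }
    return labels, tags


if __name__ == "__main__":
    labels, tags = read_category(SAMPLE)
    print(labels)
    print(tags)
```

Keeping the language labels in child elements rather than in attributes named per language is one possible design; it lets a new language be added without changing the set of attribute names.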
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
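The selection rule in this section can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: a schema node is modelled as a plain dictionary of attributes, and the function names are hypothetical. The script and language groupings and the language codes are those used in the pseudocode; the label values in the usage example are placeholders.

```python
# Script -> {language name -> ISO 639-3 code}, as grouped in Section 14.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that carry the English label and the tag itself.
ENGLISH_ATTRIBUTES = ("cat", "subcat", "tag")


def visible_attributes(script, language):
    """Names of the attributes to display; all other nodes are hidden."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English and Hindi nodes are always shown, plus the selected language.
    shown = set(ENGLISH_ATTRIBUTES) | {"hin-cat"}
    shown.add(languages[language] + "-cat")
    return shown


def select_nodes(node, script, language):
    """Return a copy of the node with only the visible attributes kept."""
    shown = visible_attributes(script, language)
    return {name: value for name, value in node.items() if name in shown}


if __name__ == "__main__":
    # Placeholder labels; a real node would carry the labels from the schema.
    noun = {"cat": "noun", "hin-cat": "H", "brx-cat": "B", "mal-cat": "M", "tag": "N"}
    print(select_nodes(noun, "Devanagari", "Bodo"))
```

The sketch leaves out the Display (Metadata) step, which the pseudocode lists only for Hindi.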
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
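The annotation labels in the sentences above are hierarchical: a label such as V_VM_VF names the top-level category, then the type, then the subtype. The sketch below shows one way such a label might be expanded into readable names. The function name and the reduced tag table are illustrative assumptions; the tag-to-name pairs themselves are taken from the schema in Section 12. Both the single-underscore form (N_NN) and the double-underscore form (N__NN) seen in this document are accepted.

```python
# Subset of the tag set from the schema in Section 12.
TAG_NAMES = {
    "N": "Noun", "NN": "Common", "NNP": "Proper", "NNV": "Verbal", "NST": "Nloc",
    "PR": "Pronoun", "PRP": "Personal", "PRF": "Reflexive",
    "DM": "Demonstrative", "DMD": "Deictic",
    "V": "Verb", "VM": "Main Verb", "VAUX": "Auxiliary Verb",
    "VF": "Finite", "VNF": "Non-Finite", "VNG": "Gerund", "VINF": "Infinitive",
    "JJ": "Adjective", "RB": "Adverb", "PSP": "Post Position",
    "CC": "Conjunction", "CCD": "Co-ordinator", "CCS": "Subordinator",
    "QT": "Quantifiers", "QTF": "General", "QTC": "Cardinals", "QTO": "Ordinals",
    "RD": "Residuals", "PUNC": "Punctuation",
}


def expand_label(label):
    """Split a hierarchical label into (tag, name) pairs, top level first."""
    # Splitting on "_" and dropping empty pieces handles "_" and "__" alike.
    parts = [part for part in label.split("_") if part]
    return [(part, TAG_NAMES.get(part, "unlisted tag")) for part in parts]


if __name__ == "__main__":
    for label in ("V_VM_VF", "N__NNP", "RD_PUNC"):
        print(label, expand_label(label))
```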
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
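The language tags tie back to the schema, where each language label is carried in an attribute named after its code (for example mal-cat). A minimal sketch of that lookup is given below, using the ISO 639-3 codes for the 22 languages of this annexure. The function name and the assumption that every language attribute has the form code-cat are illustrative, not part of the standard.

```python
# ISO 639-3 codes for the languages listed in this annexure.
ISO_639_3 = {
    "asm": "Assamese", "ben": "Bangla", "brx": "Bodo", "doi": "Dogri",
    "guj": "Gujarati", "hin": "Hindi", "kan": "Kannada", "kas": "Kashmiri",
    "kok": "Konkani", "mai": "Maithili", "mal": "Malayalam", "mni": "Manipuri",
    "mar": "Marathi", "nep": "Nepali", "ori": "Oriya", "pan": "Punjabi",
    "san": "Sanskrit", "sat": "Santhali", "snd": "Sindhi", "tam": "Tamil",
    "tel": "Telugu", "urd": "Urdu",
}


def language_of(attribute_name):
    """Return the language name for an attribute such as 'mal-cat', else None."""
    code, separator, suffix = attribute_name.partition("-")
    if separator != "-" or suffix != "cat":
        return None  # not a language label attribute (for example 'tag')
    return ISO_639_3.get(code)


if __name__ == "__main__":
    for name in ("mal-cat", "brx-cat", "tag"):
        print(name, language_of(name))
```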
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आनी वा
82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
821 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
91 Default RPD RP__RPD बी आद इतयाद
92 Classifier CL RP__CL (पाच) जाण
93 Interjection INJ RP__INJ आर चप
94 Intensifier INTF RP__INTF उपाट भरपर
95 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
101 General QTF QT__QTF थोड चड ताय खब
102 Cardinals QTC QT__QTC एत दोन
103 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
111 Foreign word RDF RD__RDF
112 Symbol SYM RD__SYM & $
113 Punctuation PUNC RD__PUNC -
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जोवण-बवण
POS for Maithili
Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun N N
11 Common NN N__NN पोथी तलम
पड खवास
12 Proper NNP N__NNP अरण दनश
अल
13 Nloc NST N__NST आग पीछ
ऊपर नीचा एखन आब
बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ
अहा
22 Reflexive PRF PR__PRF अपना अपन
सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ
तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप
पढइ खाइ
स हस
42 Auxiliary VAUX V__VAUX अछ छल
होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास
कमश
एताएत
अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच
मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह
तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.
113 Punctuation PUNC RD__PUNC Only for punctuation
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख)
मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for indefinite persons. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite (محدود - mahdood) VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word (بديسی لفظ - bidesii lafz) RDF RD__RDF A word written in a script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
Such symbols are not used in Urdu; they are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuation
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
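The rule above (annotate with the lowest-level tag, store the higher levels automatically) can be sketched in a few lines of Python. This is only an illustration, assuming the double-underscore annotation convention used in the tables (for example V__VM__VF); the function names expand_tag and annotate are hypothetical and not part of the standard.

```python
def expand_tag(tag):
    """Return the given tag preceded by all of its ancestors, top level first.

    Assumes levels are joined with a double underscore, as in the
    'Annotation Convention' column (e.g. V__VM__VF).
    """
    parts = tag.split("__")
    return ["__".join(parts[:i]) for i in range(1, len(parts) + 1)]


def annotate(word, lowest_level_tag):
    """Attach the lowest-level tag and the automatically derived hierarchy."""
    return {
        "word": word,
        "tag": lowest_level_tag,
        "hierarchy": expand_tag(lowest_level_tag),
    }


print(expand_tag("V__VM__VF"))  # ['V', 'V__VM', 'V__VM__VF']
print(expand_tag("JJ"))         # ['JJ']
```

An annotator then only ever chooses one tag per token; the top-level and intermediate categories are derived from it and never typed by hand.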
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.
Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.
Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.
Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.
Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp]
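As an illustration of how these attributes look in an instance document, the following Python sketch reads xml:lang, its:dir, xml:id, its:locNote and the local its:translate attribute from a small sample. The element names (labels, label, tag) and the sample content are assumptions made for this example only; the language value urd follows the ISO 639-3 codes used in this document, although xml:lang values are normally BCP 47 tags.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document; only the attribute names come from ITS/XML.
SAMPLE = """<labels xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="noun-en" its:locNote="Name of a grammatical category">Noun</label>
  <label xml:id="noun-urd" xml:lang="urd" its:dir="rtl">اسم</label>
  <tag its:translate="no">N__NN</tag>
</labels>"""


def describe(root):
    """List each child with its effective language, direction and ITS flags."""
    root_lang = root.get(f"{{{XML_NS}}}lang")
    root_dir = root.get(f"{{{ITS_NS}}}dir")
    rows = []
    for child in root:
        rows.append({
            "text": child.text,
            "id": child.get(f"{{{XML_NS}}}id"),
            # Language and direction are inherited from the root unless overridden.
            "lang": child.get(f"{{{XML_NS}}}lang", root_lang),
            "dir": child.get(f"{{{ITS_NS}}}dir", root_dir),
            # Element content is translatable unless marked otherwise.
            "translate": child.get(f"{{{ITS_NS}}}translate", "yes"),
            "note": child.get(f"{{{ITS_NS}}}locNote"),
        })
    return rows


rows = describe(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

The POS tag itself (N__NN) is marked as non-translatable, while the human-readable labels carry their own language and direction.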
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document: every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag gives just its name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
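A tool that displays the metadata has to read these fields back out of the record. The sketch below shows one way to do that with Python's standard library; it works on a simplified, namespace-free record that reuses element names from the listing above, and the function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified record: element names follow the listing above, but the
# namespace and the empty elements are left out to keep the example short.
RECORD = """<languageSection>
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
</languageSection>"""


def summarize(xml_text):
    """Extract the fields a 'Display (Metadata)' step might show."""
    root = ET.fromstring(xml_text)
    return {
        "language": root.findtext("language"),
        "status": root.findtext("registrationStatus"),
        "origin": (root.findtext("origin") or "").strip(),
        "created": root.findtext("creation/creationDate"),
    }


summary = summarize(RECORD)
print(summary)
```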
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English need a one-to-one mapping to each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
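The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps of the flow above can be sketched as a walk over the tag tree that switches off the branches a language does not need. This is a minimal Python illustration, not the standard's own algorithm: the nested-dictionary layout and the "off" key are assumptions of this example. The sample data reflects the Urdu table, where Quotative and Classifier are marked "Not required", and the Konkani table, which uses both.

```python
# Hypothetical in-memory form of two branches of the common schema tree.
# "off" lists the ISO 639-3 codes of languages for which a branch is disabled.
SCHEMA = [
    {"tag": "CC", "name": "Conjunction", "children": [
        {"tag": "CCD", "name": "Co-ordinator"},
        {"tag": "CCS", "name": "Subordinator", "children": [
            {"tag": "UT", "name": "Quotative", "off": {"urd"}},
        ]},
    ]},
    {"tag": "RP", "name": "Particles", "children": [
        {"tag": "RPD", "name": "Default"},
        {"tag": "CL", "name": "Classifier", "off": {"urd"}},
    ]},
]


def select_nodes(nodes, lang, prefix=""):
    """Return (shown, hidden) annotation paths for one language."""
    shown, hidden = [], []
    for node in nodes:
        path = prefix + node["tag"]
        if lang in node.get("off", set()):
            hidden.append(path)  # the whole branch is switched off
            continue
        shown.append(path)
        sub_shown, sub_hidden = select_nodes(
            node.get("children", []), lang, path + "__")
        shown.extend(sub_shown)
        hidden.extend(sub_hidden)
    return shown, hidden


print(select_nodes(SCHEMA, "urd"))
print(select_nodes(SCHEMA, "kok"))
```

For urd the call hides CC__CCS__UT and RP__CL and shows the other five paths; for kok nothing is hidden.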
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
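A mapping table of this kind can be checked mechanically: every English label should have exactly one label in each language, and no two English labels should share one. The Python sketch below shows such a check on a handful of Urdu entries taken from the tables in this document (including the "NA" entry for Indefinite); the dictionary layout and the function name check_one_to_one are assumptions of this example.

```python
# A few rows of the mapping, keyed by the English label.
# "NA" is used, as in the tables, where a language has no label.
TABLE = {
    "Noun": {"urd": "اسم"},
    "Pronoun": {"urd": "ضمير"},
    "Verb": {"urd": "فعل"},
    "Adjective": {"urd": "صفت"},
    "Indefinite": {"urd": "NA"},
}


def check_one_to_one(table, lang):
    """Return (missing, duplicates) for one language column.

    missing    -- English labels with no label in `lang`
    duplicates -- pairs of English labels that share the same label
    """
    missing, duplicates, seen = [], [], {}
    for english, labels in table.items():
        label = labels.get(lang, "NA")
        if label == "NA":
            missing.append(english)
        elif label in seen:
            duplicates.append((seen[label], english))
        else:
            seen[label] = english
    return missing, duplicates


missing, duplicates = check_one_to_one(TABLE, "urd")
print(missing, duplicates)  # ['Indefinite'] []
```

Rows reported as missing correspond to branches that would be switched off for that language rather than to errors in the table.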
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
………………
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
………………
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
………………
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
………………
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
………………
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
………………
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
………………
End if
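The selection logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_nodes`, the table `SCRIPT_LANGUAGES` and the placeholder label strings are hypothetical and not part of the draft schema; only the script and language pairs named in the algorithm are listed.

```python
# Hypothetical sketch of the node-selection algorithm; all names are illustrative.

# Script -> ISO 639-3 codes of the languages the algorithm lists under it.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(script, language, node):
    """Return the attributes of one schema node that should be displayed.

    English ("cat"/"subcat"), Hindi ("hin-cat") and the tag are always kept,
    the label of the selected language is added, and every other
    language label is hidden.
    """
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    keep = {"cat", "subcat", "tag", "hin-cat", f"{language}-cat"}
    return {name: value for name, value in node.items() if name in keep}


# Placeholder labels stand in for the native-script strings of the schema.
noun = {
    "cat": "noun",
    "hin-cat": "HINDI_LABEL",
    "brx-cat": "BODO_LABEL",
    "kok-cat": "KONKANI_LABEL",
    "tag": "N",
}
shown = select_nodes("Devanagari", "kok", noun)
```

For Konkani the Bodo label is dropped and the English, Hindi and Konkani labels remain, which corresponds to the Display and Hide steps of the pseudocode.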
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
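In the tagged sentences above, each word is followed directly by its tag, with no visible separator. A minimal sketch of how such tokens could be split back into word and tag is given below. It assumes that a tag consists only of ASCII capital letters joined by underscores or hyphens and that the word itself does not end in such letters; the names `TOKEN_RE`, `split_token` and `parse_line` are hypothetical.

```python
import re

# A token is a (possibly empty) word followed by a tag such as
# N_NN, V_VM_VF, N-NNP or N__NN (ASCII capitals joined by "_" or "-").
TOKEN_RE = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Split one fused token into (word, tag); tag is None if none is found."""
    match = TOKEN_RE.match(token)
    if match is None:
        return token, None
    return match.group("word"), match.group("tag")


def parse_line(line):
    """Split a whitespace-separated line into (word, tag) pairs."""
    return [split_token(token) for token in line.split()]


pairs = parse_line("2 बड़ाJJ RD_PUNC")
```

Sentence numbers come back with a tag of `None`, and a bare tag such as `RD_PUNC` comes back with an empty word. A foreign word written in Latin capitals would be mis-split by this sketch, so real data would need an explicit separator between word and tag.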
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
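The schema examples in this document name their per-language label attributes after these codes (hin-cat, brx-cat, kok-cat and so on). A minimal sketch of that lookup follows; the dictionary holds the ISO 639-3 codes of the 22 languages, while the names `LANGUAGE_CODES` and `label_attribute` are hypothetical.

```python
# ISO 639-3 codes for the 22 languages covered by the standard.
LANGUAGE_CODES = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute name that holds this language's label."""
    return f"{LANGUAGE_CODES[language_name]}-cat"


bodo_attribute = label_attribute("Bodo")
```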
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
POS for Maithili
Sl No  Category (Top level, Subtype level 1, Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun N N
1.1 Common NN N__NN पोथी तलम पड खवास
1.2 Proper NNP N__NNP अरण दनश अल
1.3 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हम ई ओ अहा
2.2 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
2.3 Relative PRL PR__PRL ज िजनता िजनतर जतरा
2.4 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
2.5 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ओ ई ऊ
3.2 Relative DMR DM__DMR ज जाह
3.3 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ
तउछ तोनो
4 Verb V V
4.1 Main VM V__VM चलब रौप पढइ खाइ स हस
4.2 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आओर परच मदा वा
8.2 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
9.1 Default RPD RP__RPD भर यौ हौ रौ
9.2 Classifier CL RP__CL टा गोट गो
9.3 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
9.4 Intensifier INTF RP__INTF बह बसी खब नान
9.5 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
10.1 General QTF QT__QTF तनत बह तछ
10.2 Cardinals QTC QT__QTC एत एतटा दई बीसगोट
ीन चार
10.3 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जलख (लख) मट (सट)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
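Storing the higher-level tags automatically can be sketched as a lookup of each label's parent. The parent table below is read off the Label and Annotation Convention columns of the Maithili table; the names `PARENT` and `full_annotation` are hypothetical, and deeper hierarchies (for example a verb subtype below VM) would only need further entries.

```python
# Parent of each lowest-level label; top-level labels have no entry.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "CCD": "CC", "CCS": "CC",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into its full hierarchy, e.g. NN -> N__NN."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


punctuation = full_annotation("PUNC")
```

Labels without a parent, such as JJ or PSP, are returned unchanged, matching the rows of the table that have no annotation convention of their own.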
POS for Urdu
Sl No  Category (Top level, Subtype level 1, Subtype level 2)  Label  Annotation Convention  Examples  Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype
WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information at
the word
level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur)
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content)
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content)
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier)
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)
[For more details: http://www.w3.org/TR/xml-i18n-bp]
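As an illustration of the first attributes listed above, the sketch below uses Python's standard xml.etree.ElementTree module to attach xml:lang, its:dir and the local its:translate attribute to a single element. The element name `label`, the helper `make_label` and the language tag "ur" are assumptions made for the example and are not taken from the draft schema.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction, translate):
    """Build a hypothetical <label> element carrying ITS-style attributes."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)            # natural language of the content
    label.set(f"{{{ITS_NS}}}dir", direction)        # text direction: ltr or rtl
    label.set(f"{{{ITS_NS}}}translate", translate)  # translatable: yes or no
    label.text = text
    return label


urdu = make_label("اسم", "ur", "rtl", "yes")
xml_text = ET.tostring(urdu, encoding="unicode")
```

The serialized string declares the its prefix on the element and writes the attributes as xml:lang, its:dir and its:translate, so a right-to-left label such as an Urdu category name carries its language and direction with it.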
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML has metadata built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only a name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by turning a particular branch of the tree structure 'off'. It is represented schematically under the next heading, i.e. the POS schema block diagram.
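The one-to-one mapping requirement of this section lends itself to a mechanical check. The sketch below reports, for each language, the English labels that have no counterpart yet; it assumes that a missing mapping is recorded as an absent entry, an empty string or the marker "NA" used in the mapping tables. The function name `missing_labels` and the placeholder label strings are hypothetical.

```python
def missing_labels(english_labels, translations):
    """Map each language code to the English labels it does not yet cover."""
    gaps = {}
    for lang, labels in translations.items():
        absent = [name for name in english_labels
                  if labels.get(name) in (None, "", "NA")]
        if absent:
            gaps[lang] = absent
    return gaps


english = ["Noun", "Verbal", "Quotative"]
translations = {
    # Placeholder strings stand in for the native-script labels.
    "hin": {"Noun": "HIN_NOUN", "Verbal": "HIN_VERBAL", "Quotative": "HIN_QUOT"},
    "mal": {"Noun": "MAL_NOUN", "Verbal": "NA", "Quotative": "MAL_QUOT"},
}
gaps = missing_labels(english, translations)
```

A language with complete coverage does not appear in the result, so an empty dictionary means the mapping is one-to-one for every label that was checked.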
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    …
  End if
  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    …
  End if
  If language is Konkani then
    Call (POS Schema)
    Display (English, Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    …
  End if
Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English, Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    …
  End if
Else If script is Perso-Arabic then
  If language is Kashmiri then
    Call (POS Schema)
    Display (English, Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
    …
  End if
Else If script is Bangla then
  If language is Assamese then
    Call (POS Schema)
    Display (English, Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    …
  End if
Else If script is Gujarati then
  If language is Gujarati then
    Call (POS Schema)
    Display (English, Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    …
  End if
End if
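The selection step above can be sketched in Python. This is only an illustration under stated assumptions: the dictionary form of a schema node, the names SCRIPT_LANGUAGES, COMMON_KEYS and select_nodes, and the placeholder label strings are hypothetical and not part of the draft standard. The script-to-language grouping follows the branches of the algorithm, and the attribute names (hin-cat, brx-cat and so on) follow the schema examples.

```python
# Script -> language codes, following the branches of the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Attributes that stay visible for every language: the English label,
# the Hindi label and the tag itself (assumed from the "Display" steps).
COMMON_KEYS = {"cat", "subcat", "tag", "hin-cat"}


def select_nodes(node, script, lang):
    """Return only the attributes to display for the given script and language."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    visible = COMMON_KEYS | {f"{lang}-cat"}
    # Everything not in `visible` is "hidden", i.e. simply left out.
    return {key: value for key, value in node.items() if key in visible}


# Hypothetical node with placeholder labels instead of real translations.
noun = {
    "cat": "noun",
    "hin-cat": "HINDI-LABEL",
    "brx-cat": "BODO-LABEL",
    "mal-cat": "MALAYALAM-LABEL",
    "tag": "N",
}

print(select_nodes(noun, "Devanagari", "brx"))
```

For Hindi the language-specific attribute coincides with hin-cat, so only the English and Hindi labels remain, which matches the first branch of the algorithm.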
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No.  Language Name  Language Tag (ISO 639-3)
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
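As a small illustration of how these language tags connect to the schema, the sketch below derives the per-language label attribute name (for example mal-cat) from a language name. The names ISO_639_3 and label_attribute are hypothetical; the "<code>-cat" naming pattern is taken from the attributes seen in the draft schema (hin-cat, brx-cat, mal-cat, kas-cat, asm-cat, kok-cat, guj-cat), and extending it to the other languages is an assumption.

```python
# Language name -> ISO 639-3 tag, as listed in this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Build the schema attribute name that carries this language's label."""
    code = ISO_639_3[language]  # raises KeyError for a language not in the annexure
    return f"{code}-cat"


print(label_attribute("Malayalam"))
print(label_attribute("Bodo"))
```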
CONTRIBUTORS
1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G, University of Hyderabad
7. Dr. Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
तउछ तोनो
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
9.2 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख) मट (सट) |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
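A minimal Python sketch of this rule follows: given the lowest-level tag chosen by the annotator, it walks up the hierarchy and builds the full label in the "Annotation Convention" form (for example V__VM__VF). The PARENT table is assembled from the Label and Annotation Convention columns of the tag set tables in this document; the names PARENT and full_label, and the use of a dictionary for the hierarchy, are illustrative assumptions rather than part of the standard.

```python
# Child tag -> parent tag, read off the "Annotation Convention" column.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_label(tag, sep="__"):
    """Expand a lowest-level tag into its full hierarchical label."""
    chain = [tag]
    # Climb from the selected tag to the top-level category.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


print(full_label("VF"))   # V__VM__VF
print(full_label("QTC"))  # QT__QTC
print(full_label("JJ"))   # JJ (top-level tag with no subtypes)
```

Storing only the parent of each tag keeps the table small, and the higher-level tags never have to be typed by the annotator.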
POS for Urdu
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 Noun
)ism-اسم(
N N لڑکا)laRkaa(
))raajaaراجا
)kitaab(کتاب
11 Common
-نکره(nakeraa(
NN N__NN کتاب)kitaab(
)qalam(قلم
)cashma(چشمہ
12 Proper
-معرفہ(
NNP N__NNP موہن))Mohan
رشمی
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
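The automatic storing of higher-level tags can be sketched as a walk up a parent table. The sketch below is only an illustration: the PARENT table lists part of the tag set from the Label column of the Urdu table, and the function name full_label is an assumption, not something defined by the draft standard.

```python
# Parent of each subtype tag, read off the Label / Annotation Convention
# columns of the table above. Only part of the tag set is listed here;
# top-level tags (N, PR, V, JJ, ...) have no parent.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
}


def full_label(selected_tag: str) -> str:
    """Build the stored label (e.g. V__VM__VF) from the lowest-level tag."""
    chain = [selected_tag]
    # Climb from the selected tag to its top-level category.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # The convention writes the top level first.
    return "__".join(reversed(chain))
```

With this table, an annotator who selects VF would have V__VM__VF stored, while a tag without subtypes such as JJ is stored unchanged.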
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang): defined for the root element of your document and for any element where a change of language may occur.
Defining mark-up to specify text direction (its:dir): defined for the root element of your document and for any element that has text content.
Indicating which elements and attributes should be translated (its:translateRule): elements to indicate which elements have non-translatable content.
Providing information related to text segmentation (its:withinTextRule): elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text.
Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp]
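As an illustration of how these attributes behave in an instance document, the sketch below reads xml:lang, its:dir and xml:id from a small sample and lets language and direction be inherited from the parent element. The element names corpus, sentence and tag are hypothetical, and the local its:translate attribute is used here in place of a global its:translateRule for brevity.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS namespace

# Hypothetical sample document; only the attributes matter here.
SAMPLE = """\
<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hi" its:dir="ltr">
  <sentence xml:id="s1">...</sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl">...</sentence>
  <tag its:translate="no">N__NN</tag>
</corpus>
"""


def describe(elem, lang=None, direction=None, translate="yes"):
    """Yield (name, xml:id, language, direction, translate) for each element.

    Values not set on an element are inherited from its parent.
    """
    lang = elem.get(f"{{{XML_NS}}}lang", lang)
    direction = elem.get(f"{{{ITS_NS}}}dir", direction)
    translate = elem.get(f"{{{ITS_NS}}}translate", translate)
    yield elem.tag, elem.get(f"{{{XML_NS}}}id"), lang, direction, translate
    for child in elem:
        yield from describe(child, lang, direction, translate)


rows = list(describe(ET.fromstring(SAMPLE)))
```

In this sample the second sentence switches to Urdu with right-to-left direction, while the first sentence and the tag element keep the values declared on the root.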
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives the element name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
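A tool that displays this metadata first has to read it. The sketch below shows one possible way to pull a few fields out of a languageSection. It is a simplified illustration: the fragment omits the namespace and the empty elements, and the function name read_metadata is an assumption.

```python
import xml.etree.ElementTree as ET

# Simplified fragment using element names and values from the listing above;
# the namespace and the empty elements are left out for brevity.
FRAGMENT = """\
<languageSection>
  <language>en</language>
  <version>100</version>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation>
    <creationDate>1999-01-01</creationDate>
  </creation>
</languageSection>
"""


def read_metadata(xml_text: str) -> dict:
    """Collect the fields a display step might show for one languageSection."""
    section = ET.fromstring(xml_text)
    return {
        "language": section.findtext("language"),
        "version": section.findtext("version"),
        "registrationStatus": section.findtext("registrationStatus"),
        "origin": section.findtext("origin"),
        # creationDate is nested one level down, inside <creation>.
        "creationDate": section.findtext("creation/creationDate"),
    }


metadata = read_metadata(FRAGMENT)
```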
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English need to have a one-to-one mapping for the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. It is schematically represented under the next heading, i.e. POS schema block diagram.
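The select, display and hide steps of the flow can be sketched as a filter over the attributes of one schema node. This is only an illustrative sketch: the node is modelled as a plain dictionary with placeholder label values, the function name select_nodes is hypothetical, and only the script and language pairs named in this document are listed. English and Hindi labels are always kept, as in the node selection examples of this document.

```python
# Language codes as they appear in the attribute names of the draft schema
# (hin-cat, brx-cat, kok-cat, mal-cat, kas-cat). Only some pairs are listed.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
}

# Attributes shown for every language: the English labels, the tag and Hindi.
ALWAYS_SHOWN = ("cat", "subcat", "tag", "hin-cat")

# One node of the common schema; the label values are placeholders.
NOUN = {
    "cat": "noun",
    "hin-cat": "HINDI-LABEL",
    "brx-cat": "BODO-LABEL",
    "mal-cat": "MALAYALAM-LABEL",
    "kas-cat": "KASHMIRI-LABEL",
    "tag": "N",
}


def select_nodes(node: dict, script: str, language: str) -> dict:
    """Keep the desired attributes of a node and hide the remaining ones."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError for an unlisted pair
    wanted = set(ALWAYS_SHOWN) | {f"{code}-cat"}
    return {name: value for name, value in node.items() if name in wanted}
```

For example, select_nodes(NOUN, "Devanagari", "Bodo") keeps the English, Hindi and Bodo labels and the tag, and drops the Malayalam and Kashmiri labels.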
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    ……
  End if
  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    ……
  End if
  If language is Konkani then
    Call (POS Schema)
    Display (English, Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    ……
  End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………
    End if
End if
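The selection logic of this algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: the table SCRIPT_LANGUAGES and the function select_nodes are hypothetical names, only the scripts and languages named in the algorithm are listed, and the attribute names follow the pattern used in the examples (cat, hin-cat, brx-cat, tag).

```python
# Hypothetical lookup table: script -> {language name: ISO 639-3 code}.
# Only the pairs named in the algorithm above are included.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language):
    """Return the label attributes to display; all others are hidden."""
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None or language not in languages:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    code = languages[language]
    shown = ["cat", "hin-cat"]  # English and Hindi nodes are always displayed
    if code != "hin":
        shown.append(f"{code}-cat")  # node of the selected language
    shown.append("tag")
    return shown


print(select_nodes("Devanagari", "Bodo"))
print(select_nodes("Devanagari", "Hindi"))
```

Keeping the script-to-language table as data, rather than as nested If blocks, means a further language can be added with one table entry.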
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
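The tagged sentences above pair each word with its annotation. A minimal Python sketch of reading such text is given here. It assumes that a backslash separates a word from its tag, which is an assumption because the separator is not visible in the text above. The function name parse_tagged and the romanized sample tokens are hypothetical and are not taken from the corpus.

```python
import re


def parse_tagged(sentence, sep="\\"):
    """Split 'word<sep>TAG' tokens into (word, [tag levels]) pairs."""
    pairs = []
    for token in sentence.split():
        word, found, tag = token.rpartition(sep)
        if not found or not word or not tag:
            raise ValueError(f"token has no word/tag separator: {token!r}")
        # Tag levels appear joined by one or two underscores (N_NN, N__NN).
        pairs.append((word, re.split(r"_+", tag)))
    return pairs


# Hypothetical romanized sample, invented for illustration only.
sample = r"raam\N_NNP ghar\N_NN gayaa\V__VM .\RD_PUNC"
for word, levels in parse_tagged(sample):
    print(word, levels)
```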
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No. | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
ीन चार
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuation
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख), मट (सट) |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
POS for Urdu
Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (اسم - ism) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (نکره - nakeraa) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (معرفہ - m‘aarefa) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (حاصل مصدر - haasil-e-masdar) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (ظرف - zarf) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (ضمير - zamiir) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (ضمير شخصی - zamiir-e-shakhsii) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used for both singular and plural
2.2 | Reflexive (ضمير معکوسی - zamiir-e-m‘aakoosii) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (ضمير موصولہ - zamiir-e-mausoolaa) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (ضمير راجع - zamiir-e-raaje‘) | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (ضمير استفہاميہ - zamiir-e-istafhaamiyaa) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (ضمير اشاره - zamiir-e-ishaaraa) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (اشارے - ishaare) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (ضمير اشاره موصولہ - zamiir-e-ishaaraa mausoolaa) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (ضمير اشاره استفہاميہ - zamiir-e-ishaaraa istafhaamiyaa) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (فعل - f‘el) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (محدود - mahdood) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (غيرمحدود - ghair mahdood) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (مصدر - masdar) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (حاصل مصدر - haasil-e-masdar) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (فعل امدادی - f‘el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (صفت - sifat) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (متعلق فعل - mut‘alliq-e-f‘el) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (جارموخر - jaar-e-moakkhar) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (عطف - ‘atf) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (حرف وصل - harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (تابع کننده - taab‘e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (اقتباسی - iqtabaasii) | UT | CC__CCS__UT | | Not required
9 | Particles (حاليہ - haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (ڈيفالٹ - Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (درجہ بند - darja band) | CL | RP__CL | | Not required
9.3 | Interjection (فجائيہ - fajaa’iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (حرف تاکيد - harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (حرف نہی - harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (کميت نما - kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (عام - ‘aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (اعداد مطلق - a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (ترتيبی اعداد - tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (باقی مانده - baaqi-maandaa) | RD | RD | |
11.1 | Foreign word (بديسی لفظ - bidesii lafz) | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol (علامت - ‘alaamat) | SYM | RD__SYM | $ & ( ) | Such symbols are not used in Urdu. They are written out, e.g. (dollar) ڈالر, (pound) پاونڈ etc.
11.3 | Punctuation (اوقاف - auqaaf) | PUNC | RD__PUNC | | Only for punctuation
11.4 | Unknown (نامعلوم - naa-m‘aaloom) | UNK | RD__UNK | |
11.5 | Echowords (گونج دار الفاظ - goonjdar lafz) | ECH | RD__ECH | دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e) |
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
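The rule that higher-level tags are stored automatically can be illustrated with a small Python sketch. The parent table below is transcribed from the Label and Annotation Convention columns of the Urdu table. The names PARENT and annotation are hypothetical, and labels without an entry are treated as top-level categories, which is an assumption of this sketch.

```python
# Child label -> parent label, following the Annotation Convention column.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation(label):
    """Expand a lowest-level label into its full hierarchical annotation."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])  # climb one level up
    return "__".join(reversed(chain))


print(annotation("VF"))   # V__VM__VF
print(annotation("NNP"))  # N__NNP
print(annotation("JJ"))   # JJ
```

With such a table an annotator chooses only the lowest-level label, and the stored form is derived from it rather than typed by hand.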
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of the document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of the document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated either as part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
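As an illustration of the language, direction and translatability mark-up listed above, the following Python sketch builds one labelled element with the standard library. The element name label and the helper make_label are hypothetical. The ITS 1.0 namespace URI and the local attributes its:dir and its:translate are taken from the ITS 1.0 Recommendation, and xml:lang is given a BCP 47 value here.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Build a <label> element carrying xml:lang, its:dir and its:translate."""
    element = ET.Element("label")  # hypothetical host-vocabulary element
    element.set(f"{{{XML_NS}}}lang", lang)
    element.set(f"{{{ITS_NS}}}dir", direction)
    element.set(f"{{{ITS_NS}}}translate", translate)
    element.text = text
    return element


# Urdu label for "noun", written right to left.
urdu = make_label("اسم", "ur", direction="rtl")
print(ET.tostring(urdu, encoding="unicode"))
```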
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only partly: a tag gives only a name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
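A tool consuming this metadata might read it roughly as in the Python sketch below. It is an illustration only: RECORD is a reduced, well-formed copy of the language section above with the namespace omitted, and the function name summarize and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reduced, well-formed copy of the language section (namespace omitted).
RECORD = """<languageSection>
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
  <descriptionSection>
    <definitionClass>
      <definition xml:lang="en"></definition>
      <source>ISO 12620:1999</source>
    </definitionClass>
  </descriptionSection>
</languageSection>"""


def summarize(xml_text):
    """Collect the main metadata fields of one language section."""
    root = ET.fromstring(xml_text)
    return {
        "language": root.findtext("language"),
        "status": root.findtext("registrationStatus"),
        "created": root.findtext("creation/creationDate"),
        "source": root.findtext("descriptionSection/definitionClass/source"),
    }


print(summarize(RECORD))
```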
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English need a one-to-one mapping to each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
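Switching branches 'off' can be pictured as dropping the label attributes of all languages that were not selected. The Python sketch below shows this on one simplified node. The element name pos, the sample string COMMON and the function language_view are hypothetical, and the attribute values are shortened sample labels.

```python
import xml.etree.ElementTree as ET

# One simplified node of the common schema carrying labels for several languages.
COMMON = '<pos cat="noun" hin-cat="संज्ञा" kok-cat="नाम" mal-cat="നാമം" tag="N"/>'


def language_view(element, lang_code):
    """Copy an element, keeping only English, Hindi and selected-language labels."""
    keep = {"hin-cat", f"{lang_code}-cat"}
    view = ET.Element(element.tag)
    for name, value in element.attrib.items():
        # Language labels end in "-cat"; every other attribute is always kept.
        if not name.endswith("-cat") or name in keep:
            view.set(name, value)
    return view


malayalam = language_view(ET.fromstring(COMMON), "mal")
print(sorted(malayalam.attrib))
```

The common node is left unchanged, so views for different languages can be derived from the same schema.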
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name=type subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name=type subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name=type subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name=type subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name=type subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name=cat POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name=type subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name=type subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name=type subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name=cat POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name=type subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name=type subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name=subtype subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name=subtype subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name=subtype subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name=subtype subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name=cat POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name=cat POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name=cat POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name=cat POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name=type subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name=type subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name=subtype subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name=cat POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name=type subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name=type subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name=type subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name=type subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name=type subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name=cat POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി i" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name=type subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name=type subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name=type subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
 mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब"
 mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी"
 guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
 mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
 mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
 guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ"
 mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर"
 guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब"
 mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा"
 guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………………
    End if
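The selection steps above (pick the branch for the script and language, display the English, Hindi and target-language nodes, hide the rest) can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary record for a node, the names SCRIPT_OF, ALWAYS_SHOWN and select_nodes, and the choice to always keep the tag attribute are hypothetical and are not part of the draft schema. The labels are copied from the examples above.

```python
# Script of each language, as grouped in the algorithm above.
SCRIPT_OF = {
    "hin": "Devanagari", "brx": "Devanagari", "kok": "Devanagari",
    "mal": "Malayalam", "kas": "Perso-Arabic",
    "asm": "Bangla", "guj": "Gujarati",
}

# Attributes that are always displayed: English labels, Hindi label, POS tag.
# (Assumption: the tag is kept because every example above retains it.)
ALWAYS_SHOWN = ("cat", "subcat", "hin-cat", "tag")


def select_nodes(node, lang_code):
    """Return the attributes to display for lang_code; the others are hidden."""
    if lang_code not in SCRIPT_OF:
        raise ValueError(f"no branch for language code {lang_code!r}")
    wanted = set(ALWAYS_SHOWN) | {f"{lang_code}-cat"}
    return {name: label for name, label in node.items() if name in wanted}


# Hypothetical record for the Noun element, with labels from the examples.
noun = {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा",
        "kok-cat": "नाम", "mal-cat": "നാമം", "tag": "N"}

print(select_nodes(noun, "brx"))
```

For Hindi the target language coincides with the always-shown Hindi label, so only the English label, the Hindi label and the tag remain, which matches the Hindi example above.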
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp/
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403/
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
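For use in software, the language tags can be held in a simple lookup table. The sketch below pairs each of the 22 languages listed in this annexure with its ISO 639-3 code. The names ISO_639_3 and language_tag are illustrative assumptions, not part of the standard.

```python
# Language name -> ISO 639-3 code for the 22 languages of this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 tag for a language name (case-insensitive)."""
    return ISO_639_3[name.strip().title()]


print(language_tag("konkani"))
```

These three-letter codes are the prefixes used in the schema attribute names, for example hin-cat, brx-cat and kok-cat.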
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
mlsquoaarefa(( )Rashmi(
)Ravi(روی
13 Verbal
حاصل ( ndashمصدر
haasil-e-masdar(
NNV N__NNV جلن)jalan(
)calan(چلن
)bahaao(بہاؤ
بناوٹ )banaavat(
May be considered for Urdu- Hindi too
14 Nloc
) zarf-ظرف(
NST N__NST اوپر)upar(
)niice(نيچے
)aage(آگے
)piiche(پيچهے
2 Pronoun
)zamiir-ضمير(
PR PR يہ)yih(
)voh(وه
)jo(جو
21 Personal
ضمير (-شخصی
zamiir-e-shakhsii(
PRP PR__PRP وه)voh(
)tum(تم
)maim(ميں
In Urdu, unlike Hindi, voh is used for both singular and plural.
22 Reflexive
ضمير )-معکوسیzamiir-e-
mlsquoaakoosii)
PRF PR__PRF اپنا)apnaa(
)khud(خود
اپنے آپ
)apne aap(
23 Relative
ضمير )-موصولہzamiir-e-mausoolaa(
PRL PR__PRL جو)jo(
)jab(جب )jis(جس
)jahaM(جہاں
24 Reciprocal
-ضمير راجع)zamiir-e-raajelsquo)
PRC PR__PRC باہم)baaham( درميان
)darmiyaan(
)aapas(آپس
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
-بديسی لفظ(bidesii lafz(
RDF RD__RDF A word written in script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( )
& $
Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for punctuation
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
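Storing the higher-level tags automatically amounts to walking from the selected tag up to its top-level category. The Python sketch below shows one way to do this, assuming a child-to-parent table built from the hierarchy in the tag table above (for example VF under VM under V). The names PARENT and full_tag are hypothetical, and the double underscore separator follows the hierarchical labels used in the table, such as V__VM__VF.

```python
# Child tag -> parent tag, taken from the hierarchy in the tag table above.
PARENT = {
    "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_tag(tag):
    """Expand a lowest-level tag into its hierarchical label."""
    chain = [tag]
    while chain[-1] in PARENT:          # climb until a top-level category
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))   # top-level category comes first


print(full_tag("VF"))
```

An annotator would then enter only VF, and the stored label would carry V and VM as well. Tags with no entry in the table, such as JJ, are returned unchanged.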
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang): defined for the root element of your document and for any element where a change of language may occur.
Defining mark-up to specify text direction (its:dir): defined for the root element of your document and for any element that has text content.
Indicating which elements and attributes should be translated (its:translateRule): elements to indicate which elements have non-translatable content.
Providing information related to text segmentation (its:withinTextRule): elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text.
Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
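As an illustration of how xml:lang, its:dir and xml:id behave in practice, the Python sketch below parses a small document and reports the language and text direction in effect for each identified element. The element names corpus and sentence, the sample content and the function name effective_settings are hypothetical. The namespace URIs are those of the XML and ITS 1.0 specifications, and the sketch assumes that both attributes are inherited from the parent element, with left-to-right as the default direction.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace

# Hypothetical sample: a Hindi corpus containing one Urdu sentence.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its"
        xml:lang="hi" its:dir="ltr">
  <sentence xml:id="s1">...</sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl">...</sentence>
</corpus>"""


def effective_settings(xml_text):
    """Map each xml:id to the (language, direction) in effect for it."""
    result = {}

    def walk(elem, lang, direction):
        # A local attribute overrides the value inherited from the parent.
        lang = elem.get(f"{{{XML_NS}}}lang", lang)
        direction = elem.get(f"{{{ITS_NS}}}dir", direction)
        ident = elem.get(f"{{{XML_NS}}}id")
        if ident is not None:
            result[ident] = (lang, direction)
        for child in elem:
            walk(child, lang, direction)

    walk(ET.fromstring(xml_text), None, "ltr")
    return result


print(effective_settings(SAMPLE))
```

Declaring the language and direction once on the root element and overriding them only where they change keeps the mark-up small, which is the practice the first two items above describe.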
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag gives just a name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata () ltxml version=10gt ltdatasm-categorySelection xmlns=httpwwwisocatorgnsdcif dcif-version=10gt ltglobalInformationgtltglobalInformationgt
ltlanguageSectiongt
ltlanguagegtenltlanguagegt
ltidentifiergt ltidentifiergt ltversiongt100ltversiongt ltregistrationStatusgtstandardltregistrationStatusgt registered as a standard ltorigingtISO 126201999
ltauthorgtltauthorgt
ltdomaingtltdomaingt
ltorigingt
ltcreationgt ltcreationDategt1999-01-01ltcreationDategt
ltcreationgt
ltdescriptionSectiongt
47
CopyrightTDIL
ltdefinitionClassgt ltdefinition xmllang=engtltdefinitiongt ltsourcegtISO 126201999ltsourcegt
ltdefinitionClassgt
ltdescriptionSectiongt
ltlanguageSectiongt
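As a rough illustration of how such a metadata record could be consumed, the sketch below parses a small, well-formed record with Python's standard xml.etree.ElementTree module. The record is a simplified, hypothetical instance modelled on the draft metadata above (the element names and the namespace come from the draft; the helper name read_metadata and the flattened `origin` element are assumptions made for this example).

```python
import xml.etree.ElementTree as ET

# Namespace used by the draft metadata record.
DCIF_NS = "http://www.isocat.org/ns/dcif"

# Hypothetical, simplified record modelled on the draft metadata.
RECORD = f"""
<dataCategorySelection xmlns="{DCIF_NS}" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</dataCategorySelection>
"""


def read_metadata(xml_text: str) -> dict:
    """Collect a few metadata fields from a record shaped like RECORD."""
    root = ET.fromstring(xml_text.strip())
    ns = {"d": DCIF_NS}
    section = root.find("d:languageSection", ns)

    def text_of(path: str):
        # Return the stripped text of the first match, or None if absent.
        node = section.find(path, ns)
        return (node.text or "").strip() if node is not None else None

    return {
        "language": text_of("d:language"),
        "status": text_of("d:registrationStatus"),
        "created": text_of("d:creation/d:creationDate"),
        "source": text_of("d:descriptionSection/d:definitionClass/d:source"),
    }


metadata = read_metadata(RECORD)
print(metadata)
```

The namespace map is passed explicitly to every lookup because the record declares a default namespace; without it, the unprefixed element names would not match.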
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS schema for English need a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
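The idea of switching branches 'off' can be sketched in Python as follows. This is only an illustration under stated assumptions: the Node class, the select_nodes function, the "eng" code for English and the two sample entries are hypothetical and are not part of the draft schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the common POS tree: a tag plus labels per language code."""
    tag: str
    labels: dict
    children: list = field(default_factory=list)


def select_nodes(node: Node, language: str, always=("eng",)) -> Node:
    """Return a copy of the tree that keeps labels only for the chosen
    language and the always-visible languages; all others are hidden."""
    keep = set(always) | {language}
    return Node(
        tag=node.tag,
        labels={code: text for code, text in node.labels.items() if code in keep},
        children=[select_nodes(child, language, always) for child in node.children],
    )


# Hypothetical two-entry excerpt of the common schema (deliberately partial).
SCHEMA = Node("POS", {}, [
    Node("N", {"eng": "Noun", "hin": "संज्ञा", "mal": "നാമം", "kok": "नाम"}),
    Node("PR", {"eng": "Pronoun", "hin": "सर्वनाम", "kok": "सर्वनाम"}),
])

malayalam_view = select_nodes(SCHEMA, "mal")
print(malayalam_view.children[0].labels)
```

Returning a filtered copy, rather than deleting labels in place, keeps the common schema intact so that a view for another language can be derived from the same tree.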
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
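One way to work with the mapping tables programmatically is to key them by POS tag and store one label per language code, with "NA" cells represented as missing values. The Python sketch below is a hypothetical illustration: the MAPPING excerpt is deliberately partial, and the names label_for and missing_labels and the "eng" code are assumptions, not part of the standard.

```python
# Hypothetical, deliberately partial excerpt of the mapping tables.
# None stands for an "NA" cell (no label defined for that language).
MAPPING = {
    "N":   {"eng": "Noun",    "hin": "संज्ञा", "mal": "നാമം", "kok": "नाम"},
    "NNV": {"eng": "Verbal",  "mal": None},
    "PR":  {"eng": "Pronoun", "hin": "सर्वनाम", "kok": "सर्वनाम"},
}


def label_for(tag: str, language: str, table=MAPPING) -> str:
    """Return the label for `tag` in `language`, falling back to English."""
    entry = table[tag]
    return entry.get(language) or entry["eng"]


def missing_labels(languages, table=MAPPING):
    """List (tag, language) pairs that break the one-to-one mapping."""
    return [
        (tag, lang)
        for tag, entry in table.items()
        for lang in languages
        if not entry.get(lang)
    ]


print(label_for("N", "mal"))
print(missing_labels(["mal"]))
```

A check such as missing_labels makes the "NA" cells explicit, which may help when deciding whether a tag should be hidden for a language or filled in later.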
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
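The nested If / Else If structure above amounts to a lookup from script to language, followed by a filter on the attribute names of each node. A minimal Python sketch under that reading is given below; the table SCRIPT_LANGUAGES covers only the seven languages used in the examples, the function names are hypothetical, and the sample node uses illustrative label values.

```python
# Script -> {language name -> attribute prefix}, as used in the examples above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attribute names that carry the English label and the tag itself.
COMMON_ATTRIBUTES = ("cat", "subcat", "tag")


def visible_attributes(script: str, language: str) -> set:
    """Attribute names to display: English, Hindi and the selected language.
    Raises KeyError if the script/language pair is not in the table."""
    code = SCRIPT_LANGUAGES[script][language]
    return set(COMMON_ATTRIBUTES) | {"hin-cat", f"{code}-cat"}


def filter_node(attributes: dict, script: str, language: str) -> dict:
    """Display the desired attributes of one node and hide the rest."""
    wanted = visible_attributes(script, language)
    return {name: value for name, value in attributes.items() if name in wanted}


# Illustrative node; the label values are sample data for this sketch.
noun = {"cat": "noun", "hin-cat": "संज्ञा", "mal-cat": "നാമം",
        "kok-cat": "नाम", "tag": "N"}

print(filter_node(noun, "Malayalam", "Malayalam"))
print(filter_node(noun, "Devanagari", "Hindi"))
```

Keeping the script and language pairs in one table means a new language can be added as a table entry rather than as another branch of the conditional.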
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
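The language tags in this annexure can be held in a simple lookup table so that tools select a language by its ISO 639-3 code. The following is a minimal sketch only; the names `ISO_639_3` and `language_code` are illustrative and are not part of the standard.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_code(name):
    """Return the ISO 639-3 code for a language name (case-insensitive)."""
    return ISO_639_3[name.strip().title()]


if __name__ == "__main__":
    print(language_code("malayalam"))
```

An unknown name raises `KeyError`, which is deliberate here: a tagging tool should not silently fall back to another language.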
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
25 Wh-word
ضمير )-استفہاميہzamiir-e-istafhaamiyaa)
PRQ PR__PRQ کون)kaun(
)kab(کب
)kahaaM(کہاں
3 Demonstrative
-ضمير اشاره)zamiir-e-ishaaraa)
DM DM يہ)yih(
)voh(وه
)inn(ان
)unn(ان
31 Deictic
-اشارے(ishaare(
DMD DM__DMD يہ)yih(
)voh(وه
32 Relative
ضمير اشاره )ہموصول -
zamiir-e-ishaaraa
mausoolaa)
DMR DM__DMR جو)jo(
) jis(جس
33 Wh-word
ضمير اشاره (-استفہاميہ
zamiir-e-ishaaraa
istafhaamiyaa(
DMQ DM__DMQ کون)kaun(
)kis(کس
)kitnaa(کتنا
According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb
)flsquoel-فعل(
V V گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
41 Main VM V__VM گرا)giraa(
)gayaa(گيا
)sonaa(سونا
)haMstaa(ہنستا
411 Finite
-محدود(mahdoo
d(
VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign word
-بديسی لفظ(bidesii lafz(
RDF RD__RDF A word written in script other than the script of the original text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ & ( ) & $
Such symbols are not used in Urdu. They are written as ڈالر (dollar), پاونڈ (pound), etc.
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
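The rule above, that the annotator picks only the lowest-level tag and the higher levels are stored automatically, can be sketched as a walk up a parent table. This is a minimal illustration, assuming the double-underscore label format shown in the table (for example V__VM__VF); the `PARENT` table covers only part of the tagset and the function name `full_label` is hypothetical.

```python
# Parent of each tag in the type hierarchy; top-level categories map to None.
# Only a subset of the tagset from the table above is listed.
PARENT = {
    "V": None, "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CC": None, "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "QT": None, "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RD": None, "PUNC": "RD",
}


def full_label(tag):
    """Expand a lowest-level tag into its full hierarchical label."""
    path = []
    while tag is not None:
        path.append(tag)
        tag = PARENT[tag]  # KeyError for a tag outside the table
    return "__".join(reversed(path))


if __name__ == "__main__":
    for tag in ("VF", "UT", "PUNC"):
        print(tag, "->", full_label(tag))
```

Storing only the parent of each tag keeps the hierarchy in one place, so adding a subtype means adding a single entry.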
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.
Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.
Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.
Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated as either part of their parents or as a nested but independent run of text.
Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp]
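Some of the attributes listed above can be seen together in a small generated document. The sketch below uses Python's standard ElementTree module; the element names `text` and `w`, the identifier `w1` and the note wording are assumptions made for illustration and do not come from the POS schema. Note that xml:lang takes BCP 47 language tags, so Urdu is written `ur` here rather than the ISO 639-3 code `urd`.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def build_sample():
    """Build a tiny document carrying xml:lang, its:dir, xml:id and its:locNote."""
    root = ET.Element("text", {
        f"{{{XML_NS}}}lang": "en",      # language of the document as a whole
        f"{{{ITS_NS}}}version": "1.0",
        f"{{{ITS_NS}}}dir": "ltr",      # default text direction
    })
    word = ET.SubElement(root, "w", {
        f"{{{XML_NS}}}id": "w1",        # unique identifier for this unit
        f"{{{XML_NS}}}lang": "ur",      # language changes here
        f"{{{ITS_NS}}}dir": "rtl",      # Urdu runs right to left
        f"{{{ITS_NS}}}locNote": "Auxiliary verb, tag V__VAUX",
    })
    word.text = "ہے"
    return root


if __name__ == "__main__":
    print(ET.tostring(build_sample(), encoding="unicode"))
```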
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means: a tag conveys only its name, so it has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
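A metadata record of this shape can be read back with any XML parser before the corpus is processed. The following is a minimal sketch, assuming the record is well-formed XML in the namespace shown; `SAMPLE` is a trimmed, hypothetical copy of the listing and `read_metadata` is an illustrative helper, not part of the standard.

```python
import xml.etree.ElementTree as ET

DCIF_NS = "http://www.isocat.org/ns/dcif"  # namespace as given in the record

# Hypothetical, trimmed-down copy of the record; element names follow the listing.
SAMPLE = """<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</datasm-categorySelection>"""


def read_metadata(xml_text):
    """Extract a few descriptive fields from a metadata record."""
    root = ET.fromstring(xml_text)

    def field(name):
        # Search anywhere below the root for the namespaced element.
        return (root.findtext(f".//{{{DCIF_NS}}}{name}") or "").strip()

    return {
        "language": field("language"),
        "status": field("registrationStatus"),
        "origin": field("origin"),
        "creation_date": field("creationDate"),
    }


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```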
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
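Switching a branch 'off' for one language can be sketched as a walk over the tag tree that skips any node marked as unused, together with everything below it. The example is illustrative only: the tree holds just the Conjunction and Particles branches, and the Urdu entries reflect the "Not required" remarks for Quotative (UT) and Classifier (CL) in the Urdu table; the names `TREE`, `BRANCH_OFF` and `visible_tags` are hypothetical.

```python
# A fragment of the common tag tree: each tag maps to its sub-tags.
TREE = {
    "CC": {"CCD": {}, "CCS": {"UT": {}}},
    "RP": {"RPD": {}, "CL": {}, "INJ": {}, "INTF": {}, "NEG": {}},
}

# Branches switched off per language (ISO 639-3 code); illustrative subset.
BRANCH_OFF = {"urd": {"UT", "CL"}}


def visible_tags(tree, off):
    """Return the tags to display, hiding every branch listed in `off`."""
    shown = []
    for tag, children in tree.items():
        if tag in off:
            continue  # hide this node and its whole subtree
        shown.append(tag)
        shown.extend(visible_tags(children, off))
    return shown


if __name__ == "__main__":
    print(visible_tags(TREE, BRANCH_OFF.get("urd", set())))
```

A language with no entry in `BRANCH_OFF` sees the complete tree, which matches the idea of one common schema with language-specific views.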
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
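The draft schema above stores each label as one attribute per language (hin-cat, mal-cat, kok-cat and so on), and some of these attributes are empty. A simple completeness check can list the gaps that break the one-to-one mapping. This is a sketch under stated assumptions: `LABELS` holds only two entries, copied as printed in the draft schema, and `missing_labels` is a hypothetical helper.

```python
# Per-language labels for two schema entries, copied as printed in the draft
# schema above; an empty string means no label was supplied for that language.
LABELS = {
    "Noun": {"mal": "നാമം", "kok": "नाम"},
    "Deictic": {"mal": "തക സചകം", "kok": ""},
}


def missing_labels(labels, languages):
    """Return (english_label, language) pairs that have no mapped label."""
    gaps = []
    for english, by_lang in labels.items():
        for lang in languages:
            if not by_lang.get(lang, "").strip():
                gaps.append((english, lang))
    return gaps


if __name__ == "__main__":
    print(missing_labels(LABELS, ["mal", "kok"]))
```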
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
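The selection steps in this section can be sketched in Python. This is only an illustration under stated assumptions: the function name select_nodes, the dictionary SCRIPT_LANGUAGES and the placeholder labels are hypothetical and are not part of the draft schema. The script-to-language grouping is copied from the pseudocode, and a node is modelled as a plain dictionary of its attributes.

```python
# Script-to-language grouping transcribed from the pseudocode in this section.
# Language codes follow ISO 639-3, matching the "<code>-cat" attribute names.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(node, script, lang):
    """Return the attributes to display; everything else is hidden."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{lang!r} is not listed under script {script!r}")
    # English and Hindi labels are always displayed, plus the chosen language.
    wanted = {"cat", "subcat", "tag", "hin-cat", f"{lang}-cat"}
    return {key: value for key, value in node.items() if key in wanted}


# Hypothetical node with placeholder labels (not real translations).
noun = {
    "cat": "noun",
    "hin-cat": "<hindi label>",
    "brx-cat": "<bodo label>",
    "mal-cat": "<malayalam label>",
    "tag": "N",
}

print(select_nodes(noun, "Devanagari", "brx"))
```

Selecting Hindi keeps only the English and Hindi labels, because "hin-cat" is already in the always-displayed set.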
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
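Tagged text of this kind can be read back into word and tag pairs with a few lines of Python. The sketch below rests on assumptions that the text of this section does not confirm: that each token is written as word, separator, tag; that the separator is a backslash; and that hierarchy levels inside a tag are joined by one or two underscores (both forms occur in the samples). The function name parse_tagged and the romanized sample words are hypothetical.

```python
import re


def parse_tagged(sentence, sep="\\"):
    """Split a tagged sentence into (word, [tag levels]) pairs."""
    tokens = []
    for item in sentence.split():
        word, found, tag = item.rpartition(sep)
        if not found:
            # No separator: keep the token as an untagged word.
            tokens.append((item, []))
            continue
        # "N_NN" and "N__NN" both split into ["N", "NN"].
        levels = [part for part in re.split(r"_+", tag) if part]
        tokens.append((word, levels))
    return tokens


# Hypothetical romanized sample, not taken from the corpus.
sample = "ghar\\N_NN meM\\PSP hai\\V__VAUX .\\RD_PUNC"
for word, levels in parse_tagged(sample):
    print(word, levels)
```

The last element of each level list is the lowest-level tag, which is the one an annotator assigns.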
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No | Language Name | Language Tag (ISO 639-3)
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?
4 Verb (فعل, f‘el) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1 Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1.1 Finite (محدود, mahdood) | VF | V__VM__VF | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 Nonfinite (غيرمحدود, ghair mahdood) | VNF | V__VM__VNF | -do-
4.1.3 Infinitive (مصدر, masdar) | VINF | V__VM__VINF | -do-
4.1.4 Gerund (حاصل مصدر, haasil-e-masdar) | VNG | V__VM__VNG | -do-
4.2 Auxiliary (فعل امدادی, f‘el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa)
5 Adjective (صفت, sifat) | JJ | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa)
6 Adverb (متعلق فعل, mut‘alliq-e-f‘el) | RB | تيز (tez), جلد (jald)
7 Postposition (جارموخر, jaar-e-moakkhar) | PSP | سے (se), نے (ne), کو (ko), ميں (meiM)
8 Conjunction (عطف, ‘atf) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki)
8.1 Co-ordinator (حرف وصل, harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki)
8.2 Subordinator (تابع کننده, taab‘e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to)
8.2.1 Quotative (اقتباسی, iqtabaasii) | UT | CC__CCS__UT | Not required
9 Particles (حاليہ, haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii)
9.1 Default (ڈيفالٹ, Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii)
9.2 Classifier (درجہ بند, darja band) | CL | RP__CL | Not required
9.3 Interjection (فجائيہ, fajaa’iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah)
9.4 Intensifier (حرف تاکيد, harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)
9.5 Negation (حرف نہی, harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM)
10 Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General (عام, ‘aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (ترتيبی اعداد, tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals (باقی مانده, baaqi-maandaa) | RD | RD
11.1 Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 Symbol (علامت, ‘alaamat) | SYM | RD__SYM | $ & ( ) | & $: such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | Only for punctuation
11.4 Unknown (نامعلوم, naa-m‘aaloom) | UNK | RD__UNK
11.5 Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
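Storing the higher-level tags automatically amounts to walking from the selected tag up to its top-level category. A minimal Python sketch follows; the parent links are transcribed from the Urdu table above (verb, conjunction, particle, quantifier and residual branches only), while the names PARENT and full_annotation are hypothetical.

```python
# Parent of each tag, transcribed from the table above.
PARENT = {
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(tag, sep="__"):
    """Expand a lowest-level tag into its full hierarchy, top level first."""
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


print(full_annotation("VF"))
print(full_annotation("UT"))
print(full_annotation("JJ"))
```

Tags with no parent in the table, such as JJ or PSP, are returned unchanged.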
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes it easier for both schema users and processors to recognize the concepts represented. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
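To make these attributes concrete, the following Python sketch parses a small document that carries xml:lang, xml:id, its:dir and the local its:translate attribute, then reads them back. The sample document, its element names (doc, label, code) and the helper describe are hypothetical; only the two namespace URIs and the attribute names come from the XML and ITS 1.0 specifications. The example uses the local its:translate attribute rather than the global its:translateRule element, to keep it short.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document using the attributes listed above.
doc = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="l1" xml:lang="ur" its:dir="rtl">فعل</label>
  <code its:translate="no">V__VM</code>
</doc>"""

root = ET.fromstring(doc)


def describe(elem):
    """Collect the internationalization attributes set directly on elem."""
    return {
        "lang": elem.get(f"{{{XML_NS}}}lang"),
        "dir": elem.get(f"{{{ITS_NS}}}dir"),
        "translate": elem.get(f"{{{ITS_NS}}}translate"),
        "id": elem.get(f"{{{XML_NS}}}id"),
    }


for child in root:
    print(child.tag, describe(child))
```

ElementTree does not propagate xml:lang or its:dir to child elements, so a real processor would also need to look up inherited values from ancestors.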
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag conveys nothing beyond its name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<dataCategorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</dataCategorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name=type subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name=type subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
<xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
...
End if
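The selection logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a schema node is modelled as a plain dictionary of its label attributes, the function name `select_nodes` and the table `SCRIPT_LANGUAGES` are hypothetical, and the script-to-language pairs are limited to those named in the algorithm.

```python
# Script -> language codes, as listed in the algorithm above (ISO 639-3 codes).
# Assumption: only the pairs named in the pseudocode are included here.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(node, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest.

    `node` is a hypothetical dict of label attributes such as
    {"cat": ..., "hin-cat": ..., "brx-cat": ..., "tag": ...}.
    """
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {language!r} is not listed for script {script!r}")
    # English label ("cat" or "subcat"), Hindi label, target-language label, tag.
    keep = {"cat", "subcat", "hin-cat", f"{language}-cat", "tag"}
    return {key: value for key, value in node.items() if key in keep}


if __name__ == "__main__":
    # Label strings copied as they appear in the extracted examples above.
    noun = {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा", "mal-cat": "നാമം", "tag": "N"}
    print(select_nodes(noun, "Devanagari", "brx"))
```

For Hindi the target-language label coincides with the Hindi label, so only the English and Hindi nodes remain, which matches the first branch of the algorithm.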
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
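As a rough aid for reading the samples above, the following Python sketch separates each token into its word and its tag levels. It rests on assumptions that are not part of the standard: the tag is taken to be the trailing run of upper-case ASCII labels joined by `_`, `__` or `-` (all three appear in the extracted text), and the names `split_token` and `parse_sentence` are hypothetical. Samples where the tag precedes the word or is separated from it by a space (as in the Urdu and Oriya samples) are not recovered by this heuristic.

```python
import re

# Assumption: the tag is a trailing run of upper-case ASCII labels
# joined by "_" or "-"; everything before it is the word.
TAGGED = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Return (word, [tag levels]) or None if the token carries no tag."""
    match = TAGGED.match(token)
    if match is None:
        return None
    levels = re.split(r"[_-]+", match.group("tag"))
    return match.group("word"), levels


def parse_sentence(line):
    """Split a whitespace-separated sample line into (word, levels) pairs.

    Tokens without a recognisable tag (sentence numbers, bare words,
    stray punctuation) are skipped.
    """
    pairs = []
    for token in line.split():
        parsed = split_token(token)
        if parsed is not None:
            pairs.append(parsed)
    return pairs


if __name__ == "__main__":
    for word, levels in parse_sentence("दशरनN_NN RD_PUNC"):
        print(repr(word), levels)
```

The last element of each level list is the lowest-level tag, which is the one an annotator actually selects.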
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tag according to ISO 639-3
1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
412 Nonfinite
غيرمحدو(air gh-د
mahdood(
VNF V__VM__VNF -- do--
413 Infinitive
-مصدر(masdar(
VINF V__VM__VINF -- do--
414 Gerund
حاصل (-مصدر
haasil-e- masdar(
VNG V__VM__VNG -- do--
42 Auxiliary
-فعل امدادی(flsquoel-e-imdaadi(
VAUX V__VAUX ہے)hai(
)rahaa(رہا
)huaa(ہوا
5 Adjective
)sifat-صفت(
JJ دلکش)dilkash( )safed(سفيد
)siyaah(سياه
)cauRaa(چوڑا
)uuMcaa(اونچا
6 Adverb
-متعلق فعل(mutlsquoalliq-e-
flsquoel(
RB تيز)tez(
jald((جلد
7 Postposition
-jaar-جارموخر(e-moakkhar(
PSP سے)se( نے )ne( کو )ko(
)meiM(ميں
8 Conjunction
)atflsquo-عطف(
CC CC اور)aur(
)agar(اگر
کيوں کہ )kyoMki(
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
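Storing the higher-level tags automatically amounts to walking up the type hierarchy from the tag the annotator chose. A minimal Python sketch follows; the parent table is read off the hierarchical labels used in this document (for example V__VM__VNF and CC__CCS__UT) and covers only tags that appear here, and the names `PARENT` and `expand_tag` are hypothetical.

```python
# Parent of each tag in the type hierarchy, inferred from the hierarchical
# labels shown in this document. Top-level tags (N, V, JJ, RB, PSP, ...)
# have no entry. Assumption: the table is illustrative, not exhaustive.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N",
    "PRP": "PR", "PRF": "PR",
    "DMD": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def expand_tag(lowest, sep="__"):
    """Return the full hierarchical label for a lowest-level tag."""
    chain = [lowest]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # The chain was built bottom-up; the label is written top-down.
    return sep.join(reversed(chain))


if __name__ == "__main__":
    for tag in ("VNF", "UT", "JJ", "PUNC"):
        print(tag, "->", expand_tag(tag))
```

The separator is a parameter because the tagset table writes the levels with a double underscore while some of the tagged samples use a single one.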
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes it easier for both schema users and processors to recognize the concepts they represent. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur)
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content)
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content)
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier)
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)
[For more details: http://www.w3.org/TR/xml-i18n-bp]
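To make the attributes listed above concrete, here is a small Python sketch that attaches `xml:lang`, `its:dir` and the local `its:translate` attribute to an element using the standard-library ElementTree module. The element name `label` and the helper `make_label` are hypothetical and not part of the POS schema; the ITS namespace URI is the one defined by ITS 1.0, and `xml:lang` is given a BCP 47 tag (`ur` for Urdu) as XML expects.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Serialize ITS attributes with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Build a hypothetical <label> element carrying internationalization mark-up."""
    element = ET.Element("label")
    element.set(f"{{{XML_NS}}}lang", lang)           # natural language of the content
    element.set(f"{{{ITS_NS}}}dir", direction)       # text direction: "ltr" or "rtl"
    element.set(f"{{{ITS_NS}}}translate", translate) # "yes" or "no"
    element.text = text
    return element


if __name__ == "__main__":
    # An Urdu label is written right to left and should not be re-translated.
    urdu_noun = make_label("اسم", lang="ur", direction="rtl", translate="no")
    print(ET.tostring(urdu_noun, encoding="unicode"))
```

Marking the direction explicitly matters for this standard because the Perso-Arabic labels sit alongside left-to-right scripts in the same schema.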
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to some extent, what the data means. Only to some extent, because a tag gives just a name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 126201999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation> <creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
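A record of this shape can also be generated programmatically instead of being typed by hand. The sketch below is only an illustration: it reuses the element names that appear in the listing, the function name build_metadata is hypothetical, and the version string "1.0.0" is an assumed reading of the value printed as 100.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang


def build_metadata(language="en", version="1.0.0",
                   origin="ISO 12620:1999", created="1999-01-01"):
    """Build a languageSection element with the fields used in the listing."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "version").text = version  # assumed format
    ET.SubElement(section, "registrationStatus").text = "standard"
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = created
    description = ET.SubElement(section, "descriptionSection")
    definition_class = ET.SubElement(description, "definitionClass")
    ET.SubElement(definition_class, "definition", {XML_LANG: language})
    ET.SubElement(definition_class, "source").text = origin
    return section


meta = build_metadata()
print(ET.tostring(meta, encoding="unicode"))
```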
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.
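One way to picture switching a branch 'off' is to copy the common tree while keeping only the labels of the wanted languages. The sketch below is a hedged illustration of that idea, not the mechanism prescribed by this standard; the Node class, the project function and the language codes used as dictionary keys are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                      # POS tag, e.g. "N"
    labels: dict                  # language code -> label text
    children: list = field(default_factory=list)


def project(node, keep=("en",)):
    """Copy the tree, keeping only the labels whose language code is in keep."""
    return Node(
        tag=node.tag,
        labels={code: text for code, text in node.labels.items() if code in keep},
        children=[project(child, keep) for child in node.children],
    )


# A tiny hypothetical fragment of the common tree.
NOUN = Node("N", {"en": "Noun", "mal": "നാമം", "kok": "नाम"}, [
    Node("NN", {"en": "common"}),
    Node("NNP", {"en": "proper"}),
])

mal_view = project(NOUN, keep=("en", "mal"))
print(mal_view.labels)
```

The original tree is left untouched, so the same common schema can be projected again for any other language.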
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
------------------------------ Noun Block ------------------------------
<xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
------------------------------ Pronoun Block ------------------------------
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
------------------------------ Demonstrative Block ------------------------------
<xs:element name="cat POS" cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
------------------------------ Verb Block ------------------------------
<xs:element name="cat POS" cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------ Adjective Block ------------------------------
<xs:element name="cat POS" cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
------------------------------ Adverb Block ------------------------------
<xs:element name="cat POS" cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
------------------------------ Post Position Block ------------------------------
<xs:element name="cat POS" cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------ Conjunction Block ------------------------------
<xs:element name="cat POS" cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------ Particles Block ------------------------------
<xs:element name="cat POS" cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------ Quantifiers Block ------------------------------
<xs:element name="cat POS" cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി i" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------ Residuals Block ------------------------------
<xs:element name="cat POS" cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" ki="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
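The tag values in the draft schema form a small hierarchy of categories, types and subtypes, which makes it possible to check a tag mechanically. The sketch below copies the tag="..." values from the blocks above. The underscore-joined form (for example N_NN or V_VM_VNF) is taken from the tagged samples in section 15, and the placement of the subtypes under VM and CCS is an assumption based on the order of the schema lines. The names TAGSET, SUBTYPES and is_valid are hypothetical.

```python
# Categories and their types, copied from the tag values in the draft schema.
TAGSET = {
    "N": {"NN", "NNP", "NNV", "NST"},
    "PR": {"PRP", "PRF", "PRC", "PRL", "PRQ"},
    "DM": {"DMD", "DMR", "DMQ"},
    "V": {"VAUX", "VM"},
    "JJ": set(),
    "RB": set(),
    "PSP": set(),
    "CC": {"CCD", "CCS"},
    "RP": {"RPD", "CL", "INJ", "NEG", "INTF"},
    "QT": {"QTF", "QTC", "QTO"},
    "RD": {"RDF", "SYM", "UNK", "PUNC", "ECH"},
}

# Third-level subtypes (name="subtype" in the schema), keyed by the type
# they are assumed to belong to.
SUBTYPES = {"VM": {"VF", "VINF", "VNG", "VNF"}, "CCS": {"UT"}}


def is_valid(tag):
    """Check a tag such as 'N_NN' or 'V_VM_VNF' against the hierarchy."""
    parts = tag.split("_")
    if parts[0] not in TAGSET:
        return False
    if len(parts) == 1:
        return True
    if parts[1] not in TAGSET[parts[0]]:
        return False
    if len(parts) == 2:
        return True
    return len(parts) == 3 and parts[2] in SUBTYPES.get(parts[1], set())


for candidate in ("N_NN", "V_VM_VNF", "CC_CCS_UT", "N_VM"):
    print(candidate, is_valid(candidate))
```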
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
S.No English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat POS" cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
………………
End if
If language is Bodo then
Call (POS Schema)
Display (English, Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
………………
End if
If language is Konkani then
Call (POS Schema)
Display (English, Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat POS" cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
………………
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat POS" cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
………………
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat POS" cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
………………
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat POS" cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
………………
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat POS" cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
………………
End if
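The nested "If script ... If language ..." selection above can be expressed as a lookup followed by a filter. The following is a minimal sketch under stated assumptions: each schema node is modelled as a flat dictionary of attributes, the script-to-language table covers only the seven languages used in this draft, the labels are illustrative, and the names SCRIPT_LANGUAGES and select_nodes are hypothetical.

```python
# Script -> {language name: language code}, limited to the cases listed above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(nodes, script, language):
    """Display the English, Hindi and selected-language labels; hide the rest."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"no branch for script {script!r} and language {language!r}")
    visible = {"cat", "subcat", "tag", "hin-cat", f"{code}-cat"}
    return [{key: value for key, value in node.items() if key in visible}
            for node in nodes]


# Illustrative nodes; the labels are sample values, not normative ones.
NODES = [
    {"cat": "noun", "tag": "N", "hin-cat": "संज्ञा", "kok-cat": "नाम", "mal-cat": "നാമം"},
    {"cat": "pronoun", "tag": "PR", "hin-cat": "सर्वनाम", "kok-cat": "सर्वनाम", "mal-cat": "സർവനാമം"},
]

kok = select_nodes(NODES, "Devanagari", "Konkani")
hin = select_nodes(NODES, "Devanagari", "Hindi")
print(kok[0])
```

A script and language pair that has no branch in the table raises ValueError rather than silently returning every node.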
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
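Tagged text of this kind can be split back into word and tag pairs. The sketch below assumes that each token is written as word, separator, tag and that the separator is a backslash; the separator is not visible in the samples as printed here, so this is an assumption, and the romanised sample sentence and the function names are hypothetical.

```python
from collections import Counter


def parse_tagged(sentence, sep="\\"):
    """Split 'word<sep>TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition(sep)
        if not word or not tag:
            raise ValueError(f"token without a word or a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


def top_level_counts(pairs):
    """Count tokens per top-level category (the part before the first '_')."""
    return Counter(tag.split("_")[0] for _, tag in pairs)


# Hypothetical romanised sample; the separator is assumed to be a backslash.
sample = r"darshan\N_NN se\PSP milta\V_VM hai\V_VAUX .\RD_PUNC"
pairs = parse_tagged(sample)
print(pairs)
print(top_level_counts(pairs))
```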
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
81 Co-ordinator
-حرف وصل(harf-e-vasl(
CCD CC__CCD اور)aur(
)voh(وه
)yaa(يا
)ki(کہ
)balki(بلکہ
82 Subordinator
-تابع کننده(taablsquoe
kunindaa(
CCS CC__CCS اگر)agar(
کيوں کہ )kyoMki(
)to(تو
821 Quotative
-اقتباسی(iqtabaas
ii(
UT CC__CCS__UT Not required
9 Particles
)haaliyaa-حاليہ(
RP RP تو)to(
)hii(ہی
)bhii(بهی
91 Default
-ڈيفالٹ)Default)
RPD RP__RPD تو)to(
)hii(ہی
)bhii(بهی
92 Classifier
-درجہ بند(darja band(
CL RP__CL Not required
93 Interjection
-فجائيہ(fajaarsquoiyaa(
INJ RP__INJ اے))e
)o(او
)are(ارے
)jii(جی
)ahaa(اہا
)vaah(واه
94 Intensifier INTF RP__INTF بہت)bahut(
-حرف تاکيد(harf-e-taakiid(
)behad(بے حد
)albattaa(البتہ )zaroor(ضرور
خبردار )khabardaar(
95 Negation
-حرف نہی(harf-e-
nahii(
NEG RP__NEG نہ)na(
)nahiiM(نہيں
10 Quantifiers
-کميت نما(kamiiyat
numaa(
QT QT چند)cand(
متعدد
)mutarsquoaddad(
)qaliil(قليل
)kasiir(کثير
101 General
)aamlsquo -عام(
QTF QT__QTF تهوڑا)thoRaa(
)bahut(بہت )kuch(کچه
102 Cardinals
-اعداد مطلق(alsquoadaad -
e-mutlaq(
QTC QT__QTC ايک)Ek(
)do(دو
)tiin(تين
103 Ordinals
-ترتيبی اعداد(tartiibii
alsquoadaad(
QTO QT__QTO اول)avval(
)doam(دوم
)pahalaa(پہال دوسرا
)duusaraa(
11 Residuals
baaqi-باقی مانده(maandaa(
RD RD
111 Foreign RDF RD__RDF A word
word
-بديسی لفظ(bidesii
lafz(
written in
script other
than the script
of the original
text
112 Symbol
-عالمت(lsquoalaamat(
SYM RD__SYM $ amp ( )
amp $
Such symbols are not used in Urdu They are written
(dollar) ڈالر (pound)پاونڈetc
113 Punctuation
-اوقاف(auqaaf(
PUNC RD__PUNC Only for
Punctuations
114 Unknown
naa-نامعلوم(mlsquoaaloom(
UNK RD__UNK
115 Echowords
گونج دار (-الفاظ
goonjdar lafz(
ECH RD__ECH )ول) -دل
)dil-) vil
ويار) -پيار(
)pyaar-) vyaar
وائے)-چائے(
)caalsquoe-) vaalsquoe
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
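The rule above can be sketched in a few lines of Python. This is only an illustration: it assumes the hierarchy levels are joined with a double underscore, as in table entries such as CC__CCS__UT, and the function name expand_tag is hypothetical rather than part of the standard.

```python
def expand_tag(lowest_tag, sep="__"):
    """Return every level of the hierarchy implied by a lowest-level tag.

    Assumption: levels are joined by `sep` (double underscore in the tag table).
    """
    parts = lowest_tag.split(sep)
    # Rebuild each prefix of the tag: top level first, lowest level last.
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


# An annotator picks only the lowest-level tag; the parents are derived.
levels = expand_tag("CC__CCS__UT")
print(levels)
```

A top-level tag such as RD has no parents, so it expands to a list holding only itself.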
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian languages interoperable, extensible and web-enabled, the framework above adopts the W3C XML Internationalization best-practice guidelines and the ISO metadata standard.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in a new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
- Natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.
- Text direction (its:dir): defined for the root element of the document and for any element that has text content.
- Translatability (its:translateRule): elements that indicate which elements and attributes should be translated and which have non-translatable content.
- Text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.
- Unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.
- Notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.
[For more details: http://www.w3.org/TR/xml-i18n-bp]
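As a rough illustration of the attributes listed above, the following Python sketch builds one labelled element carrying xml:lang, its:dir and its:translate. The element name label and the helper make_label are hypothetical, and the language value urd simply follows the code table in the Annexure; none of this is prescribed by the draft schema.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace

# Ask ElementTree to serialize the ITS namespace with the usual "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Build a hypothetical <label> element carrying ITS-style markup."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)            # natural language labelling
    label.set(f"{{{ITS_NS}}}dir", direction)        # text direction
    label.set(f"{{{ITS_NS}}}translate", translate)  # translatability flag
    label.text = text
    return label


# Urdu label for "Noun", taken from the mapping table; Urdu is written right to left.
urdu = make_label("اسم", "urd", direction="rtl")
serialized = ET.tostring(urdu, encoding="unicode")
print(serialized)
```

Setting its:dir per element rather than once per document is what lets a single multilingual schema mix left-to-right and right-to-left labels.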
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used for an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag conveys just its name, which has meaning only to someone who already understands what the element or attribute represents.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in English in the POS Schema must map one to one onto labels in each Indian language. The XML schema needs a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
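The flow above (select script, select language, display the desired nodes, hide the rest) can be sketched as a small tree-pruning routine. This is a minimal sketch under stated assumptions: the Node class, the SCRIPT_LANGUAGES table and the function select_nodes are hypothetical names, the script-to-language table covers only a few of the 22 languages, and hiding is modelled as dropping labels rather than as any particular XML transformation.

```python
from dataclasses import dataclass, field

# Illustrative subset only: which language codes are written in which script.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
}


@dataclass
class Node:
    tag: str
    labels: dict  # language code -> label; "eng" holds the English label
    children: list = field(default_factory=list)


def select_nodes(node, script, language):
    """Return a copy of the tree that keeps only English and `language` labels."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    keep = {"eng", language}
    return Node(
        tag=node.tag,
        labels={code: text for code, text in node.labels.items() if code in keep},
        children=[select_nodes(child, script, language) for child in node.children],
    )


# Noun branch with labels taken from the draft schema; the child carries English only.
noun = Node("N", {"eng": "noun", "mal": "നാമം", "kok": "नाम"},
            [Node("NN", {"eng": "common"})])
selected = select_nodes(noun, "Malayalam", "mal")
print(selected.labels)
```

Returning a pruned copy leaves the common multilingual tree untouched, so the same source can be reused for every language.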
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
  kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा"
  mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम"
  guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा"
  mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम"
  guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
  kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा"
  mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক"
  kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ"
  mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम"
  guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा"
  mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক"
  kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा"
  mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক"
  kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो"
  mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক"
  kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
  mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
  kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा"
  mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা"
  kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा"
  mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক"
  kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा"
  mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat=""
  guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
  mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
  kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा"
  mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়"
  kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
  kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा"
  mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া"
  kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा"
  mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া"
  kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा"
  mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা"
  kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा"
  mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা"
  kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
  kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम"
  guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा"
  mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা"
  kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
  mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण"
  guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
  mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
  kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
  mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর"
  kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
  mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय"
  guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर"
  mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক"
  kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
  mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
  kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
  mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
  kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
  mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
  guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
  mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
  guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा"
  mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
  kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा"
  mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক"
  kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
  mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক"
  kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
  mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय"
  guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
  mal-cat="സംഖാവാചി i" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
  kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
  mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
  kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
  mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক"
  kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
  mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
  kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
  mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब"
  mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ"
  kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
  mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
  mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
  guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ"
  mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
  kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब"
  mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ"
  kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ വിേശഷണം இணை இணைபபுச சொல कयावशशण
7 Post Position परसगर పరసరగ അനേയാഗം சாரபு இணைபபுச சொல सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക സമചയം முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം இனபபிரிபபு ஒடடு अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय
Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय
Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय
Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत
11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி तर
Unknown अा అజఞత ഇതരദം NA अनवळखी
Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर
Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
…
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
…
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
…
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
…
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
…
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
…
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
…
End if
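The selection logic above can be sketched in Python. This is only an illustration under stated assumptions: the names SCRIPT_LANGUAGES, STRUCTURAL_KEYS, visible_keys and select_nodes are hypothetical, a node is modelled as a plain dictionary of its attributes, and only the script and language pairs named in the algorithm are covered.

```python
# Hypothetical grouping of languages by script, limited to the cases in the
# algorithm above; values are the ISO 639-3 prefixes used in the label names.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that are not language-specific labels and are always kept
# (cat/subcat carry the English label).
STRUCTURAL_KEYS = {"name", "cat", "subcat", "tag"}


def visible_keys(script, language):
    """Return the attribute names to display for a script/language pair."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"No rule for language {language!r} in script {script!r}")
    # English and Hindi labels are always displayed; the selected language
    # adds its own label (for Hindi this is the same key).
    return STRUCTURAL_KEYS | {"hin-cat", f"{languages[language]}-cat"}


def select_nodes(node, script, language):
    """Display the desired nodes and hide the remaining ones."""
    keep = visible_keys(script, language)
    return {key: value for key, value in node.items() if key in keep}
```

For example, calling select_nodes on a noun node with the arguments "Devanagari" and "Bodo" keeps the English, Hindi and Bodo labels and drops the Malayalam, Kashmiri and other labels.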
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarat University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
حرف تاکيد (harf-e-taakiid) | بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)
9.5 Negation | حرف نہی (harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM)
10 Quantifiers | کميت نما (kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General | عام (‘aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals | اعداد مطلق (a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals | ترتيبی اعداد (tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals | باقی مانده (baaqi-maandaa) | RD | RD
11.1 Foreign word | بديسی لفظ (bidesii lafz) | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 Symbol | علامت (‘alaamat) | SYM | RD__SYM | $ & ( ) & $. Such symbols are not used in Urdu; they are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation | اوقاف (auqaaf) | PUNC | RD__PUNC | Only for punctuation
11.4 Unknown | نامعلوم (naa-m‘aaloom) | UNK | RD__UNK
11.5 Echowords | گونج دار الفاظ (goonjdar lafz) | ECH | RD__ECH | دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e)
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
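One way to derive the higher-level tags automatically is to read them off the annotation label itself. The sketch below assumes that hierarchy levels are joined by underscores, as in labels such as V_VM_VF and RD__PUNC used in this document; the function name expand_tag is hypothetical, and the sketch normalizes the double underscore to a single one.

```python
def expand_tag(tag, separator="_"):
    """Return every level of the hierarchy implied by a lowest-level tag.

    Empty pieces produced by a doubled separator (as in "RD__PUNC") are
    dropped, so the result always uses a single separator.
    """
    parts = [part for part in tag.split(separator) if part]
    return [separator.join(parts[:depth]) for depth in range(1, len(parts) + 1)]
```

With this convention, expand_tag("V_VM_VF") yields the category V, the type V_VM and the subtype V_VM_VF, so an annotator only needs to choose the last of these.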
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
- Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
- Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
- Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
- Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
- Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
- Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
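As an illustration of these attributes, the following Python sketch builds a single label element that carries xml:lang, its:dir, the local its:translate attribute and an optional xml:id. The element name label and the helper names make_label and serialize are hypothetical and are not part of the draft schema; only the two namespace URIs are taken from the XML and ITS 1.0 specifications.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id

# Serialize ITS attributes with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction, translate="yes", identifier=None):
    """Build a hypothetical <label> element carrying the attributes above."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)            # natural language of the content
    label.set(f"{{{ITS_NS}}}dir", direction)        # text direction: "ltr" or "rtl"
    label.set(f"{{{ITS_NS}}}translate", translate)  # "yes" or "no"
    if identifier is not None:
        label.set(f"{{{XML_NS}}}id", identifier)    # unique identifier
    label.text = text
    return label


def serialize(element):
    """Return the element as a Unicode string."""
    return ET.tostring(element, encoding="unicode")
```

A right-to-left Urdu label, for instance, would be created with make_label("اوقاف", "ur", "rtl", identifier="punc") and then passed to serialize.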
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only the tag name, so it has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation> <creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>…</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
……………………………………………
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
……………………………………………
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
……………………………………………
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
……………………………………………
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
……………………………………………
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
……………………………………………
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
……………………………………………
End if
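The nested script and language checks above can be sketched in Python. This is only an illustration and not part of the draft standard: the `SCRIPT_LANGUAGES` table, the `select_nodes` function and the placeholder label strings are assumptions introduced here. Only the script names, the language codes and the tags are taken from the algorithm and schema.

```python
# Hypothetical sketch of the node-selection algorithm in section 14.
# Script -> language codes, as listed in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}


def select_nodes(script, language, nodes):
    """Keep the tag plus English, Hindi and target-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English and Hindi are always displayed; dict.fromkeys drops the
    # duplicate entry when the target language is Hindi itself.
    shown = list(dict.fromkeys(["en", "hin", language]))
    return [
        {"tag": node["tag"], **{code: node["labels"][code] for code in shown}}
        for node in nodes
    ]


# Placeholder labels stand in for the real category names of the schema.
NODES = [
    {"tag": "N", "labels": {"en": "Noun", "hin": "hin-noun",
                            "brx": "brx-noun", "mal": "mal-noun"}},
    {"tag": "NN", "labels": {"en": "common", "hin": "hin-common",
                             "brx": "brx-common", "mal": "mal-common"}},
]

bodo_view = select_nodes("Devanagari", "brx", NODES)
hindi_view = select_nodes("Devanagari", "hin", NODES)
print(bodo_view)
print(hindi_view)
```

Raising an error for a script and language pair that the algorithm does not list is a design choice of this sketch; the draft itself does not say what should happen in that case.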
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
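Tagged sentences of this kind can be read back into token and tag pairs with a few lines of Python. The sketch below is an illustration under stated assumptions: it assumes each token is joined to its tag by a backslash (the separator is not visible in the text above), and the romanised sample tokens, `parse_tagged` and `top_level_counts` are hypothetical names introduced here. Only the tag labels follow the examples above.

```python
from collections import Counter

SEPARATOR = "\\"  # assumed token/tag separator; not confirmed by the text above


def parse_tagged(line):
    """Split a tagged sentence into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        token, sep, tag = item.rpartition(SEPARATOR)
        if not sep:
            raise ValueError(f"no tag found in {item!r}")
        pairs.append((token, tag))
    return pairs


def top_level_counts(pairs):
    """Count tokens per top-level category (the part before the first underscore)."""
    return Counter(tag.split("_")[0] for _, tag in pairs)


# Romanised placeholder tokens; the tags follow the style of the examples above.
sample = r"saptapuri\N_NNP ke\PSP darshan\N_NN se\PSP milta\V_VM hai\V_VAUX .\RD_PUNC"
pairs = parse_tagged(sample)
counts = top_level_counts(pairs)
print(pairs)
print(counts)
```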
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
word (بديسی لفظ, bidesii lafz): written in script other than the script of the original text
11.2 Symbol (علامت, ‘alaamat): SYM, RD__SYM. Examples: $ & ( ). Such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (اوقاف, auqaaf): PUNC, RD__PUNC. Only for punctuations.
11.4 Unknown (نامعلوم, naa-m‘aaloom): UNK, RD__UNK.
11.5 Echowords (گونج دار الفاظ, goonjdar lafz): ECH, RD__ECH. Examples: دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e).
The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
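Storing the higher-level tags automatically can be sketched as follows. This is a minimal illustration that assumes the hierarchy is encoded in the tag string itself, with levels joined by underscores as in RD__SYM or V_VM_VF; `expand_tag` and `annotate` are hypothetical helper names and not part of the standard.

```python
def expand_tag(lowest_tag, separator="_"):
    """Return every level of the hierarchy, from the top level down to the given tag.

    Runs of separators (as in "RD__ECH") are treated as a single separator,
    and the returned levels are joined with a single separator.
    """
    parts = [part for part in lowest_tag.split(separator) if part]
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]


def annotate(token, lowest_tag):
    """Store the higher-level tags alongside the tag the annotator selected."""
    return {"token": token, "tag": lowest_tag, "levels": expand_tag(lowest_tag)}


record = annotate("dil-vil", "RD__ECH")
print(record)
print(expand_tag("V_VM_VF"))
```

The annotator supplies only the lowest-level tag; the parent levels are derived, so they cannot drift out of sync with it.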
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.
7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology for easily creating XML that is internationalized and can be localized effectively.
ITS for Schema developers
Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]
Main Attributes
Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp]
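How a processor might read some of these attributes can be sketched with Python's standard library. The `entry`, `label` and `tag` element names below are a made-up host vocabulary used only for illustration, and `resolve` is a hypothetical helper. The namespace URIs and the `xml:lang`, `xml:id`, `its:dir` and `its:translate` attributes are those defined by XML and ITS 1.0. The sketch assumes the usual inheritance of language, direction and translatability from parent to child elements.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace behind the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical host vocabulary carrying local ITS markup.
DOC = """<entry xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" xml:id="pos-n">
  <label>Noun</label>
  <label xml:lang="ur" its:dir="rtl">اسم</label>
  <tag its:translate="no">N</tag>
</entry>"""


def resolve(root):
    """Return (element name, language, direction, translatable) for each element."""
    rows = []

    def walk(element, lang, direction, translate):
        # Each value is taken from the element itself, else inherited from the parent.
        lang = element.get(f"{{{XML_NS}}}lang", lang)
        direction = element.get(f"{{{ITS_NS}}}dir", direction)
        translate = element.get(f"{{{ITS_NS}}}translate", translate)
        rows.append((element.tag, lang, direction, translate == "yes"))
        for child in element:
            walk(child, lang, direction, translate)

    # Defaults at the root: no language, left-to-right, translatable.
    walk(root, None, "ltr", "yes")
    return rows


rows = resolve(ET.fromstring(DOC))
for row in rows:
    print(row)
```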
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only its name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
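Switching branches of the tree 'off' can be illustrated with a small tree-pruning sketch. The `pos`, `category` and `label` element names, the placeholder label text and the `switch_off` function are assumptions made for this example and do not reproduce the draft schema. Keeping the English and Hindi labels alongside the selected language is also an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical common tree: one <category> per tag, one <label> branch per language.
SCHEMA = """<pos>
  <category tag="N">
    <label lang="en">Noun</label>
    <label lang="hin">hin-noun</label>
    <label lang="brx">brx-noun</label>
    <label lang="mal">mal-noun</label>
  </category>
  <category tag="PR">
    <label lang="en">Pronoun</label>
    <label lang="hin">hin-pronoun</label>
    <label lang="brx">brx-pronoun</label>
    <label lang="mal">mal-pronoun</label>
  </category>
</pos>"""


def switch_off(tree, keep):
    """Remove every <label> branch whose language is not in `keep`."""
    for category in tree.iter("category"):
        for label in list(category):  # iterate over a copy while removing
            if label.get("lang") not in keep:
                category.remove(label)
    return tree


common = ET.fromstring(SCHEMA)
malayalam = switch_off(ET.fromstring(SCHEMA), keep={"en", "hin", "mal"})
langs = [[label.get("lang") for label in category] for category in malayalam]
print(langs)
```

The common tree is parsed afresh for each language, so the full schema stays available for the next selection.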
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
ltxml version=10 encoding=UTF-8gt
ltxsschema xmlnsxs=httpwwww3org2001XMLSchemagt
ltfile Descgt
lttitleStmtgt
lttitlegtPOS tag in multilingual languagelttitlegt
ltscriptgt ltscriptgt
ltlanguagegtmultilingualltlanguagegt
ltlabel languagegthelliphelliphelliphelliphellipltlabel languagegt
lttypegtmultimodallttypegt
[Languages taken Hindi Bodo Malayalam Kashmiri Assamese Konkani Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To provide this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Table 1 - Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2 - Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Table 3 - Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
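The selection logic of this section can be sketched in Python. This is only an illustration under stated assumptions: the dictionary layout, the function name `select_nodes` and the placeholder label strings are hypothetical and are not part of the draft standard. The script-to-language grouping is taken from the algorithm itself, and the language codes follow ISO 639-3.

```python
# Scripts and the ISO 639-3 codes of the languages the algorithm lists under each.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}

# Hypothetical schema node: placeholder strings stand in for the real labels.
NOUN_NODE = {
    "cat": "noun",
    "tag": "N",
    "labels": {code: f"{code}-label" for code in
               ("hin", "brx", "mal", "kas", "asm", "kok", "guj")},
}


def select_nodes(node, script, language):
    """Return the attributes to display; all other labels stay hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English (cat, tag) and Hindi are always shown, plus the chosen language.
    shown_codes = {"hin", language}
    visible = {"cat": node["cat"], "tag": node["tag"]}
    for code, label in node["labels"].items():
        if code in shown_codes:
            visible[f"{code}-cat"] = label
    return visible


print(select_nodes(NOUN_NODE, "Devanagari", "brx"))
```

For Hindi the result contains only the English and Hindi entries, which matches the "Display (English and Hindi Nodes)" step; for every other language the chosen language's label is added and the remaining labels are left out.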
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
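The tagged sentences above can be read back into (word, tag levels) pairs with a short Python sketch. It assumes that each whitespace-separated token is a word immediately followed by an uppercase ASCII tag, with tag levels joined by an underscore, a double underscore or a hyphen, as seen in the samples (N_NN, N__NN, N-NNP). The function names are hypothetical, and the sketch does not handle samples where the tag precedes the word or is separated from it by a space.

```python
import re

# A token is: any (possibly empty) word, then tag levels made of uppercase
# ASCII letters joined by "_" or "-" (one or more separator characters).
TOKEN_RE = re.compile(r"(.*?)([A-Z]+(?:[_-]+[A-Z]+)*)")


def split_token(token):
    """Split 'wordTAG_SUBTAG' into (word, [TAG, SUBTAG])."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"no trailing tag found in {token!r}")
    word, tag = match.groups()
    return word, re.split(r"[_-]+", tag)


def parse_sentence(line):
    """Parse one whitespace-separated tagged sentence."""
    return [split_token(token) for token in line.split()]


for word, levels in parse_sentence("दशरनN_NN सPSP मलाV_VM RD_PUNC"):
    print(repr(word), levels)
```

A token that consists of a tag only, such as RD_PUNC where the punctuation mark itself is missing, yields an empty word, so such cases remain visible to the caller.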
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Assamese       asm
2    Bangla         ben
3    Bodo           brx
4    Dogri          doi
5    Gujarati       guj
6    Hindi          hin
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
7 XML INTERNATIONALIZATION BEST PRACTICES
To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.
71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)
ITS is a technology to easily create XML which is internationalized and can be localized effectively
ITS for Schema developers
Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]
Main Attributes
Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur)
Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content)
Indicating which elements and attributes should be translated (its:translateRule elements indicate which elements have non-translatable content)
Providing information related to text segmentation (its:withinTextRule elements indicate which elements should be treated as either part of their parents or as a nested but independent run of text)
Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier)
Defining mark-up for notes to localizers (its:locNote allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
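To make the language, direction and identifier attributes concrete, the following is a minimal sketch that reads them from a small XML fragment with Python's standard xml.etree.ElementTree module. The element names doc and p, and the sample values, are hypothetical and serve only as an illustration; the namespace URIs are the ones defined for the xml: prefix and for ITS 1.0.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical host document: 'doc' and 'p' are illustrative element names.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its"
     xml:lang="en" its:dir="ltr" xml:id="d1">
  <p xml:lang="ur" its:dir="rtl" xml:id="p1">...</p>
</doc>"""


def describe(element):
    """Return (language, direction, id) declared directly on an element."""
    return (
        element.get(f"{{{XML_NS}}}lang"),
        element.get(f"{{{ITS_NS}}}dir"),
        element.get(f"{{{XML_NS}}}id"),
    )


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for node in root.iter():
        print(node.tag, describe(node))
```

The sketch only reads attributes set directly on each element. It does not implement inheritance of xml:lang or the global rule elements such as its:translateRule.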
8 XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag gives just its name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 126201999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
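As an illustration of how such a metadata record might be consumed, the sketch below extracts a few fields with Python's standard xml.etree.ElementTree module. The record is a simplified, hypothetical one modelled on the fragment above: the root element name record and the function name summarize are assumptions made for this example and do not come from ISO 12620 or from this standard.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used when searching elements in the default namespace.
DCIF_NS = {"d": "http://www.isocat.org/ns/dcif"}

# Hypothetical, simplified record; the root name 'record' is an assumption.
RECORD = """<record xmlns="http://www.isocat.org/ns/dcif">
  <languageSection>
    <language>en</language>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</record>"""


def summarize(xml_text):
    """Collect selected metadata fields into a dictionary (None if absent)."""
    root = ET.fromstring(xml_text)
    fields = ("language", "version", "registrationStatus", "origin", "creationDate")
    return {
        name: root.findtext(f".//d:{name}", namespaces=DCIF_NS)
        for name in fields
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```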
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS schema for English need a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
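One way to picture switching a branch 'off' is to drop the language-specific label attributes that are not wanted. The sketch below is a hedged illustration only: it assumes label attributes named with the pattern code-cat (as in hin-cat or brx-cat), uses placeholder values instead of real labels, and the function name switch_off is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical node: attribute names follow the '<code>-cat' pattern;
# the values are placeholders, not real translations.
NODE = '<element cat="noun" hin-cat="HIN" brx-cat="BRX" mal-cat="MAL" tag="N"/>'


def switch_off(element, keep):
    """Remove every '<code>-cat' attribute whose language code is not in keep."""
    for name in list(element.attrib):
        if name.endswith("-cat") and name[:-4] not in keep:
            del element.attrib[name]
    return element


if __name__ == "__main__":
    node = switch_off(ET.fromstring(NODE), keep={"hin", "brx"})
    print(ET.tostring(node, encoding="unicode"))
```

The English label (cat) and the tag itself are never removed, so the filtered node still identifies the same category.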
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...............</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element>
</xs:schema>
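The draft schema arranges tags in levels: a category (for example Verb, tag V), a type (Main Verb, tag VM) and a subtype (Finite, tag VF). The tagged samples in this document write such paths joined by underscores, as in V_VM_VF. The following sketch models that hierarchy in Python; the class TagNode, the function full_tags and the underscore-joining rule as coded here are assumptions inferred from those samples, not definitions taken from the standard.

```python
from dataclasses import dataclass, field


@dataclass
class TagNode:
    label: str                      # English label, e.g. "Verb"
    tag: str                        # tag at this level, e.g. "V"
    children: list = field(default_factory=list)

    def add(self, label, tag):
        """Attach a child node and return it so deeper levels can be added."""
        child = TagNode(label, tag)
        self.children.append(child)
        return child


def full_tags(node, prefix=()):
    """Yield (label, hierarchical tag) for a node and all of its descendants."""
    path = prefix + (node.tag,)
    yield node.label, "_".join(path)
    for child in node.children:
        yield from full_tags(child, path)


if __name__ == "__main__":
    verb = TagNode("Verb", "V")
    verb.add("Auxiliary Verb", "VAUX")
    main = verb.add("Main Verb", "VM")
    main.add("Finite", "VF")
    for label, tag in full_tags(verb):
        print(label, tag)
```

A category without types, such as Adjective (JJ), is simply a node with no children and keeps its single-level tag.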
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English Hindi and Malayalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
………………
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
………………
End if
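The selection logic of this algorithm can be sketched in Python as follows. This is only an illustrative sketch: the function name visible_attributes and the table SCRIPT_LANGUAGES are hypothetical, the table covers only the script and language branches spelled out in this section, and the returned names follow the attribute naming of the examples (cat for English, hin-cat for Hindi, and so on).

```python
# Script -> {language name: ISO 639-3 code}. Hypothetical table limited to
# the branches written out in the algorithm; other scripts would be added
# in the same way.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def visible_attributes(script, language):
    """Return the label attributes to display; all others are hidden."""
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None:
        raise ValueError(f"unsupported script: {script}")
    if language not in languages:
        raise ValueError(f"{language} is not listed under script {script}")
    # English (cat) and Hindi (hin-cat) nodes are displayed in every branch.
    attributes = ["cat", "hin-cat"]
    target = f"{languages[language]}-cat"
    if target not in attributes:
        attributes.append(target)
    return attributes


print(visible_attributes("Devanagari", "Hindi"))       # ['cat', 'hin-cat']
print(visible_attributes("Perso-Arabic", "Kashmiri"))  # ['cat', 'hin-cat', 'kas-cat']
```

Keeping the script-to-language table as data means a new branch is one more dictionary entry rather than another nested If block.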
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
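In the tagged sentences above, each word carries a tag whose levels are joined by underscores (for example N_NN or V_VM_VNF). A minimal Python sketch of reading such a line is given below. It rests on an assumption that is not visible in the text above: that a word and its tag are separated by a backslash. The function names parse_token and parse_sentence and the sample words w1, w2, w3 are hypothetical placeholders.

```python
def parse_token(token, sep="\\"):
    """Split one 'word<sep>TAG' token into (word, [tag levels])."""
    word, _, tag = token.rpartition(sep)
    if not word or not tag:
        raise ValueError(f"token has no word/tag separator: {token!r}")
    # Hierarchical tags such as V_VM_VNF become ['V', 'VM', 'VNF'].
    return word, tag.split("_")


def parse_sentence(line, sep="\\"):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(token, sep) for token in line.split()]


# Placeholder words; the separator is the assumption stated above.
sample = "w1\\N_NNP w2\\PSP w3\\V_VM_VNF .\\RD_PUNC"
for word, levels in parse_sentence(sample):
    print(word, levels)
```

Splitting the tag into levels makes it easy to compare annotations at the top category only (N, V, RD) when languages differ in how far down the hierarchy they tag.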
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo   Language Name   Language Tag (ISO 639-3)
1     Hindi           hin
2     Assamese        asm
3     Bangla          ben
4     Bodo            brx
5     Dogri           doi
6     Gujarati        guj
7     Kannada         kan
8     Kashmiri        kas
9     Konkani         kok
10    Maithili        mai
11    Malayalam       mal
12    Manipuri        mni
13    Marathi         mar
14    Nepali          nep
15    Oriya           ori
16    Punjabi         pan
17    Sanskrit        san
18    Santhali        sat
19    Sindhi          snd
20    Tamil           tam
21    Telugu          tel
22    Urdu            urd
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
9 METADATA ON POS
Metadata
Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.
XML Metadata
XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag supplies just its name, which has meaning only to someone who already understands what the element or attribute means.
METADATA AS PER ISO 12620:1999
Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
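Switching branches 'off' can be pictured with a small Python sketch. It is an illustration only: the dictionary layout, the function name prune_node and the placeholder label strings are assumptions made for this example, and the choice of which branches stay visible (English, Hindi and Malayalam here) is likewise illustrative.

```python
def prune_node(node, visible):
    """Return a copy of node that keeps 'tag' plus the attributes in visible."""
    return {
        name: value
        for name, value in node.items()
        if name == "tag" or name in visible
    }


# One node of the common schema; label values are placeholders, not the
# actual labels from the mapping tables.
NOUN = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}

malayalam_view = prune_node(NOUN, {"cat", "hin-cat", "mal-cat"})
print(list(malayalam_view))  # ['cat', 'hin-cat', 'mal-cat', 'tag']
```

The common node is left untouched, so the same schema can be pruned again for any other language.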
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
    kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा"
    mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক"
    kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा"
    mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক"
    kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name=type subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
    kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम"
    guj-cat="કવચક" tag="NNV">
<xs:attribute name=type subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा"
    mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক"
    kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ"
    mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम"
    guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा"
    mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক"
    kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा"
    mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক"
    kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name=type subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो"
    mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক"
    kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name=type subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
    mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
    kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name=type subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा"
    mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা"
    kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name=cat POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा"
    mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক"
    kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name=type subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा"
    mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat=""
    guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name=type subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
    mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
    kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name=type subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा"
    mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়"
    kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name=cat POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
    kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name=type subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा"
    mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া"
    kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name=type subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा"
    mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া"
    kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name=subtype subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा"
    mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা"
    kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name=subtype subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा"
    mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা"
    kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name=subtype subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
    kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम"
    guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name=subtype subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा"
    mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা"
    kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name=cat POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
    mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण"
    guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name=cat POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
    mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
    kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name=cat POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
    mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय"
    guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name=cat POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
    mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय"
    guj-cat="સ કજકક" tag="CC">
<xs:attribute name=type subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर"
    mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক"
    kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name=type subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
    mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
    kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name=subtype subcat="Quotative" hin-cat="उ-वाचत"
    mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
    kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name=cat POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
    mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
    guj-cat="િાપત" tag="RP">
<xs:attribute name=type subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
    mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
    guj-cat="સવ" tag="RPD">
<xs:attribute name=type subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा"
    mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
    kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name=type subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा"
    mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক"
    kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name=type subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
    mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক"
    kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name=type subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
    mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat=""
    kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name=cat POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
    mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
    kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name=type subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
    mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
    kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name=type subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
    mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক"
    kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name=type subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
    mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
    kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name=cat POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
    mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ"
    tag="RD">
<xs:attribute name=type subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब"
    mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ"
    kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name=type subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
    mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर"
    guj-cat="સકત" tag="SYM">
<xs:attribute name=type subcat="Unknown" hin-cat="अा" brx-cat="मथय"
    mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
    guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name=type subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ"
    mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
    kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name=type subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब"
    mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ"
    kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …
    End if
End if
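The selection logic above can be sketched in Python. This is only an illustrative sketch, not part of the draft standard: the names SCRIPT_LANGUAGES and select_label_attributes are hypothetical, the script-to-language grouping covers only the branches shown in the algorithm, and the attribute names (cat, subcat, hin-cat, brx-cat and so on) are taken from the examples.

```python
# Hypothetical sketch of the Section 14 selection; names are illustrative only.
# Only the script/language branches shown in the algorithm are listed here.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_label_attributes(script, language):
    """Return the label attributes to display for a script/language pair."""
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None or language not in languages:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    # English (cat, subcat) and Hindi labels are displayed in every branch.
    shown = ["cat", "subcat", "hin-cat"]
    code = languages[language]
    if code != "hin":
        # Add the label attribute of the selected language.
        shown.append(f"{code}-cat")
    return shown
```

Every label attribute not returned by the function corresponds to the "Hide (remaining nodes)" step.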
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
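The tags in the samples above are hierarchical: a tag such as V_VM_VF reads as category V, type VM, subtype VF. A minimal Python sketch of splitting such tags is given here. The function names are hypothetical, and the sketch works on tag strings only, because the separator between word and tag is not recoverable from this copy. Doubled underscores, as in the Oriya sample (N__NN), are tolerated.

```python
from collections import Counter


def split_tag(tag):
    """Split a hierarchical tag such as 'V_VM_VF' into its levels."""
    # Empty parts are dropped so that 'N__NN' is read like 'N_NN'.
    levels = [part for part in tag.split("_") if part]
    if not levels:
        raise ValueError("empty tag")
    return levels


def top_level_counts(tags):
    """Count tags by their top-level category (N, V, RD, ...)."""
    return Counter(split_tag(tag)[0] for tag in tags)
```

For example, split_tag("V_VM_VF") gives the three levels V, VM and VF, and top_level_counts can be used to compare category distributions across the language samples.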
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo   Language Name   Language Tag according to ISO 639-3
1     Hindi           hin
2     Assamese        asm
3     Bangla          ben
4     Bodo            brx
5     Dogri           doi
6     Gujarati        guj
7     Kannada         kan
8     Kashmiri        kas
9     Konkani         kok
10    Maithili        mai
11    Malayalam       mal
12    Manipuri        mni
13    Marathi         mar
14    Nepali          nep
15    Oriya           ori
16    Punjabi         pan
17    Sanskrit        san
18    Santhali        sat
19    Sindhi          snd
20    Tamil           tam
21    Telugu          tel
22    Urdu            urd
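The language tags can be kept in a small lookup table from which the schema's label attribute names (hin-cat, brx-cat and so on) are derived. The following Python sketch uses the ISO 639-3 codes for the 22 languages; the names LANGUAGE_TAGS, language_tag and label_attribute are hypothetical and not defined by the standard.

```python
# ISO 639-3 codes for the 22 languages of the annexure (illustrative table).
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 tag for a language name (case-insensitive)."""
    return LANGUAGE_TAGS[name.strip().title()]


def label_attribute(name):
    """Return the schema label attribute for a language, e.g. 'brx-cat'."""
    return f"{language_tag(name)}-cat"
```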
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
10 ONE TO ONE MAPPING LABELS IN POS SCHEMA
To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.
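Switching branches 'off' can be illustrated with Python's standard xml.etree.ElementTree module. This is a sketch under stated assumptions: the SAMPLE document below is a simplified, well-formed stand-in for the draft schema (which is not itself valid XML), its label values are placeholders, and prune_labels is a hypothetical helper, not part of the standard.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for one schema node; values are placeholders.
SAMPLE = (
    '<element cat="noun" hin-cat="H1" brx-cat="B1" mal-cat="M1" tag="N">'
    '<attribute subcat="common" hin-cat="H2" brx-cat="B2" mal-cat="M2" tag="NN"/>'
    '</element>'
)


def prune_labels(element, keep_codes):
    """Drop every '<code>-cat' attribute whose language code is not kept."""
    for node in element.iter():
        for name in list(node.attrib):
            if name.endswith("-cat") and name[:-4] not in keep_codes:
                del node.attrib[name]
    return element


# Keep the Hindi and Malayalam labels; the Bodo branch is switched off.
pruned = prune_labels(ET.fromstring(SAMPLE), {"hin", "mal"})
```

The English labels (cat, subcat) and the tag attribute are left untouched, since only attributes ending in -cat are treated as language branches.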
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
</xs:attribute>
</xs:element> </xs:schema>
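The tag values in the draft schema form a three-level hierarchy (category, type, subtype), which can be checked mechanically. The Python sketch below collects the tag values that appear in the schema blocks. TAGSET and is_valid_tag are hypothetical names. Placing the subtypes VF, VINF, VNG and VNF under VM, and UT under CCS, is an assumption based on the order of the schema entries and on tags such as V_VM_VF and CC_CCS_UT used in the tagged samples.

```python
# Category -> type -> allowed subtypes, read off the draft schema blocks.
# Nesting of subtypes under VM and CCS is an assumption (see lead-in).
TAGSET = {
    "N": {"NN": [], "NNP": [], "NNV": [], "NST": []},
    "PR": {"PRP": [], "PRF": [], "PRC": [], "PRL": [], "PRQ": []},
    "DM": {"DMD": [], "DMR": [], "DMQ": []},
    "V": {"VAUX": [], "VM": ["VF", "VINF", "VNG", "VNF"]},
    "JJ": {},
    "RB": {},
    "PSP": {},
    "CC": {"CCD": [], "CCS": ["UT"]},
    "RP": {"RPD": [], "CL": [], "INJ": [], "NEG": [], "INTF": []},
    "QT": {"QTF": [], "QTC": [], "QTO": []},
    "RD": {"RDF": [], "SYM": [], "UNK": [], "PUNC": [], "ECH": []},
}


def is_valid_tag(tag):
    """Check a tag such as 'N_NNP' or 'V_VM_VF' against the hierarchy."""
    parts = tag.split("_")
    if not 1 <= len(parts) <= 3 or parts[0] not in TAGSET:
        return False
    if len(parts) == 1:
        return True
    types = TAGSET[parts[0]]
    if parts[1] not in types:
        return False
    return len(parts) == 2 or parts[2] in types[parts[1]]
```

A check of this kind flags tags that mix levels from different categories, such as N_VM, before a tagged corpus is accepted.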
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………
    End if
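The script and language checks in the algorithm above can be sketched in Python. This is only an illustrative sketch: the script-to-language pairs are the ones named in the algorithm, while the dictionary layout and the function name `select_nodes` are assumptions made for the example and are not part of the draft standard.

```python
# Script -> languages, as named in the selection algorithm.
# The data structure and function name are illustrative assumptions.
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}


def select_nodes(script, language):
    """Return the label sets to display; all other nodes are hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    displayed = ["English", "Hindi"]
    if language != "Hindi":
        # English and Hindi labels are always shown, plus the chosen language.
        displayed.append(language)
    return displayed


print(select_nodes("Devanagari", "Bodo"))
```

Keeping the script-to-language table as data means that a further script or language could be added without changing the selection logic.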
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
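The tags in the samples above are hierarchical: a top-level category, followed by type and subtype labels joined with underscores (for example `V_VM_VNF`). A minimal Python sketch of splitting such tags into their levels follows. The function names are hypothetical, and tolerating the doubled underscore seen in the Oriya sample (`N__NN`) is an assumption about how that variation should be handled.

```python
from collections import Counter


def split_tag(tag):
    """Split a hierarchical tag such as 'V_VM_VNF' into its levels."""
    # Empty pieces are dropped so that 'N__NN' yields the same levels as 'N_NN'.
    return [level for level in tag.split("_") if level]


def top_level_counts(tags):
    """Count how often each top-level category occurs in a list of tags."""
    return Counter(split_tag(tag)[0] for tag in tags)


# Tags taken from the tagged samples in this section.
sample = ["N_NNP", "PSP", "N_NN", "V_VM_VNF", "V_VAUX", "N__NN", "RD_PUNC"]
print(split_tag("V_VM_VNF"))
print(top_level_counts(sample))
```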
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
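The language tags in this annexure are what the draft schema uses to name its per-language label attributes (`hin-cat`, `brx-cat`, `mal-cat` and so on). A small Python lookup, given purely as a sketch, pairs each language with its ISO 639-3 code. The dictionary name and the helper `label_attribute` are assumptions made for illustration.

```python
# ISO 639-3 codes for the 22 languages listed in this annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Build the schema attribute name for a language, e.g. 'hin-cat'."""
    return f"{ISO_639_3[language]}-cat"


print(label_attribute("Bodo"))
```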
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End
The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
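Switching branches "off" can be pictured as filtering the per-language labels carried by one node. The Python sketch below uses a simplified, hypothetical `<pos>` element with placeholder label values (`HIN`, `BRX`, `MAL`) rather than the draft schema itself. It only illustrates keeping the English, Hindi and selected-language labels while hiding the rest.

```python
import xml.etree.ElementTree as ET


def keep_language(element, lang_code):
    """Return a copy of element carrying only English, Hindi and lang_code labels."""
    keep = {"cat", "tag", "hin-cat", f"{lang_code}-cat"}
    filtered = ET.Element(element.tag)
    for name, value in element.attrib.items():
        if name in keep:
            filtered.set(name, value)
    return filtered


# Hypothetical node: placeholder strings stand in for the real labels.
noun = ET.fromstring(
    '<pos cat="noun" hin-cat="HIN" brx-cat="BRX" mal-cat="MAL" tag="N"/>'
)
bodo_view = keep_language(noun, "brx")
print(sorted(bodo_view.attrib))
```

The original node is left unchanged, so the same common schema can be filtered again for a different language.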
11 POS SCHEMA BLOCK DIAGRAM
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
Pos schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
    kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा"
    mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক"
    kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा"
    mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক"
    kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
    kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम"
    guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा"
    mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক"
    kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ"
    mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम"
    guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा"
    mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক"
    kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा"
    mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক"
    kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो"
    mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক"
    kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
    mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
    kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा"
    mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা"
    kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">
----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा"
    mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক"
    kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा"
    mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat=""
    guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
    mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
    kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा"
    mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়"
    kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">
-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
    kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा"
    mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া"
    kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा"
    mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া"
    kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा"
    mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা"
    kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा"
    mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা"
    kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
    kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम"
    guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा"
    mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা"
    kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
    mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण"
    guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
    mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
    kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
    mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর"
    kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
    mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय"
    guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर"
    mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক"
    kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
    mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
    kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
    mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
    kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
    mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
    guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
    mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
    guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा"
    mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
    kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत"
    brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت"
    asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
    mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক"
    kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
    mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat=""
    kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
    mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
    kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
    mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
    kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
    mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক"
    kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
    mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
    kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
55
CopyrightTDIL
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
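The category, type and subtype tags declared in the draft schema form a small hierarchy (for example V, then VM, then VF). The following is a minimal sketch, not part of the standard, of how that hierarchy could be held in memory and used to check a label such as N_NNP. The tag values are transcribed from the schema; the names TAGSET, SUBTYPES and is_valid_label are assumptions made for illustration.

```python
# Top-level category tags and the type tags declared under each of them,
# transcribed from the tag="..." values in the draft schema.
TAGSET = {
    "N": ["NN", "NNP", "NNV", "NST"],
    "PR": ["PRP", "PRF", "PRC", "PRL", "PRQ"],
    "DM": ["DMD", "DMR", "DMQ"],
    "V": ["VAUX", "VM"],
    "JJ": [],
    "RB": [],
    "PSP": [],
    "CC": ["CCD", "CCS"],
    "RP": ["RPD", "CL", "INJ", "NEG", "INTF"],
    "QT": ["QTF", "QTC", "QTO"],
    "RD": ["RDF", "SYM", "UNK", "PUNC", "ECH"],
}

# Third-level tags (declared with name=subtype in the schema).
SUBTYPES = {
    "VM": ["VF", "VINF", "VNG", "VNF"],
    "CCS": ["UT"],
}


def is_valid_label(label: str) -> bool:
    """Return True if a label such as 'V_VM_VF' follows the hierarchy above."""
    parts = label.split("_")
    if parts[0] not in TAGSET:
        return False
    if len(parts) == 1:
        return True
    if parts[1] not in TAGSET[parts[0]]:
        return False
    if len(parts) == 2:
        return True
    # At most three levels are used: category, type, subtype.
    return len(parts) == 3 and parts[2] in SUBTYPES.get(parts[1], [])


if __name__ == "__main__":
    for label in ["N_NNP", "V_VM_VNF", "CC_CCS_UT", "JJ", "N_VF"]:
        print(label, is_valid_label(label))
```

Keeping the hierarchy as plain dictionaries makes it easy to compare against the schema by eye; a real implementation would more likely read the tags from the schema file itself.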
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if
Else If script is Malayalam (orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …
    End if
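The selection steps above (check the script, check the language, show the English, Hindi and selected-language nodes, hide the rest) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a schema node is modelled as a dictionary of its attributes, the labels are placeholders rather than the real strings, and the names SCRIPT_LANGUAGES and select_nodes do not come from the standard.

```python
from typing import Dict

# Script-to-language grouping taken from the algorithm in this section.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}


def select_nodes(node: Dict[str, str], script: str, lang: str) -> Dict[str, str]:
    """Keep the English, Hindi and `lang` labels of a node and hide the rest."""
    if lang not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    # English label (cat/subcat), the tag, Hindi, and the selected language.
    keep = {"cat", "subcat", "tag", "hin-cat", f"{lang}-cat"}
    return {name: value for name, value in node.items() if name in keep}


if __name__ == "__main__":
    # Placeholder labels; the real values are the strings in the POS schema.
    noun = {
        "cat": "noun",
        "hin-cat": "<Hindi label>",
        "brx-cat": "<Bodo label>",
        "mal-cat": "<Malayalam label>",
        "guj-cat": "<Gujarati label>",
        "tag": "N",
    }
    print(select_nodes(noun, "Devanagari", "brx"))
```

For Hindi the selected-language attribute and the Hindi attribute coincide, so only the English and Hindi labels remain, which matches the "Display (English and Hindi Nodes)" step.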
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
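The labels in the tagged sentences of this section are written with slightly different separators (N_NNP, N__NN, N-NNP). A small sketch, assuming only that the levels of a label are separated by one or more underscores or hyphens, shows how such labels could be brought to one form before comparison. The function names are hypothetical.

```python
import re
from typing import Tuple


def split_label(label: str) -> Tuple[str, ...]:
    """Split a label such as 'V_VM_VNF' into its levels."""
    return tuple(part for part in re.split(r"[_-]+", label.strip()) if part)


def normalise_label(label: str) -> str:
    """Rewrite a label with single underscores between levels."""
    return "_".join(split_label(label))


if __name__ == "__main__":
    for raw in ["N_NNP", "N__NN", "N-NNP", "V_VM_VNF", "JJ"]:
        print(raw, "->", normalise_label(raw), split_label(raw))
```

Normalising first means that labels taken from different language samples can be counted or compared without treating N__NN and N_NN as different tags.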
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
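The language tags of this annexure are also the prefixes of the label attributes in the POS schema (hin-cat, brx-cat, mal-cat and so on). A minimal sketch of that relationship follows; the dictionary pairs each language name with its ISO 639-3 code, and the helper names language_tag and attribute_name are assumptions for illustration.

```python
# Language names from the annexure paired with their ISO 639-3 codes.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name: str) -> str:
    """Return the ISO 639-3 tag for a language name (case-insensitive)."""
    return LANGUAGE_TAGS[name.strip().title()]


def attribute_name(name: str) -> str:
    """Return the schema attribute that carries the label for this language."""
    return f"{language_tag(name)}-cat"


if __name__ == "__main__":
    for language in ["Hindi", "Bodo", "Kashmiri"]:
        print(language, language_tag(language), attribute_name(language))
```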
CONTRIBUTORS
1 Ms. Swaran Lata, Department of Information Technology, New Delhi
2 Prof. Girish Nath Jha, JNU, New Delhi
3 Dr. Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof. Uma Maheswara Rao G, University of Hyderabad
7 Dr. Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof. Pushpak Bhattacharyya, IIT-Bombay
11 Prof. Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr. Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr. Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
POS schema ()
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<fileDesc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>…</label language>
<type>multimodal</type>
[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]
--------------------------------------Noun Block--------------------------------------
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo
kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-
cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-
cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt
ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा
दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-
catrdquoકવચકrdquo tag=rdquoNNVgt
ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन
दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo
kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt
-------------------------------------Pronoun Block-----------------------------------
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-
cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-
catrdquoસવરાાrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo
kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo
kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt
ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-
cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo
asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-
cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ
दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক
সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.
Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Table 3. Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
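As an illustration of how the one-to-one mapping in the tables above might be consumed by software, the following is a minimal Python sketch, not part of the draft standard. The dictionary layout and the function names `label_for`, `tag_for` and `is_one_to_one` are assumptions made for this example. The three rows are copied as they appear in Table 3 and in the schema listing, and the language keys are ISO 639-3 codes.

```python
# Hypothetical excerpt of the mapping tables: POS tag -> labels per language.
# Native labels are copied as printed in this document, not re-verified.
MAPPING = {
    "N": {"eng": "Noun", "mal": "നാമം", "kok": "नाम"},
    "PSP": {"eng": "Post Position", "mal": "അനേയാഗം", "kok": "सबद अवयय"},
    "CC": {"eng": "Conjunction", "mal": "സമചയം", "kok": "जोड अवयय"},
}


def label_for(tag, lang):
    """Return the label of `tag` in language `lang`, or "NA" when the table has no entry."""
    return MAPPING[tag].get(lang, "NA")


def tag_for(label, lang):
    """Reverse lookup: return the tag whose label in `lang` equals `label`, else None."""
    for tag, labels in MAPPING.items():
        if labels.get(lang) == label:
            return tag
    return None


def is_one_to_one(lang):
    """True when no two tags share the same label in `lang` within this excerpt."""
    labels = [row[lang] for row in MAPPING.values() if lang in row]
    return len(labels) == len(set(labels))


if __name__ == "__main__":
    print(label_for("CC", "eng"))
    print(tag_for(label_for("PSP", "kok"), "kok"))
    print(is_one_to_one("mal"))
```

A check such as `is_one_to_one` is useful when loading the full tables, because a reverse lookup from label to tag is only unambiguous when every label in a language is unique.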
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
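The selection logic of this section can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not a reference implementation: each schema node is modelled as a plain dictionary of its label attributes, the table `SCRIPT_LANGUAGES` and the functions `visible_attributes` and `select_nodes` are hypothetical names, and the `Display (Metadata)` and `Call (POS Schema)` steps are left out.

```python
# Hypothetical table of the script/language pairs listed in the algorithm,
# mapped to the ISO 639-3 prefix used by the "<code>-cat" attributes.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that stay visible for every language: the English labels,
# the Hindi label and the tag itself.
ALWAYS_VISIBLE = ("cat", "subcat", "hin-cat", "tag")


def visible_attributes(script, language):
    """Return the attribute names to display for a script/language pair."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    names = list(ALWAYS_VISIBLE)
    if code != "hin":
        names.append(f"{code}-cat")
    return names


def select_nodes(node, script, language):
    """Keep the English, Hindi and selected-language labels; hide the remaining ones."""
    keep = set(visible_attributes(script, language))
    return {name: value for name, value in node.items() if name in keep}


if __name__ == "__main__":
    # Placeholder label values; a real node would carry the labels from the schema.
    noun = {"cat": "noun", "hin-cat": "HIN", "brx-cat": "BRX", "mal-cat": "MAL", "tag": "N"}
    print(select_nodes(noun, "Devanagari", "Bodo"))
```

Keeping the script-to-language table as data means that adding a language is a one-line change, with no need for a further `If` branch.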
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
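In the samples above each word is followed directly by its tag, and the tag levels are joined by underscores (for example `V_VM_VF`: category, type, subtype). A minimal Python sketch of splitting such tokens is given below. It assumes that no separator survives between word and tag, that tags consist only of ASCII capital letters and single underscores, and that the words themselves contain no ASCII capitals. The function name `parse_token` is hypothetical.

```python
import re

# A tag is a run of ASCII capitals, optionally followed by "_"-joined further levels.
TOKEN = re.compile(r"(.*?)([A-Z]+(?:_[A-Z]+)*)")


def parse_token(token):
    """Split a tagged token into (word, [tag levels]).

    Returns (token, []) when no trailing tag is found.
    """
    match = TOKEN.fullmatch(token)
    if match is None:
        return token, []
    word, tag = match.groups()
    return word, tag.split("_")


def parse_sentence(line):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(token) for token in line.split()]


if __name__ == "__main__":
    for word, levels in parse_sentence("दशरनN_NN हV_VAUX RD_PUNC"):
        print(repr(word), levels)
```

Tokens whose punctuation mark was lost in extraction, such as a bare `RD_PUNC`, come back with an empty word; tags written with doubled underscores or hyphens, as in some samples above, are not recognised by this pattern and would need normalising first.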
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tag according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत"
brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت"
asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत"
brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا"
asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत"
brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت"
asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय"
guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय"
guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند"
asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद"
brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ"
asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च"
brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद"
brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ"
asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
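The blocks above place every tag at one of three levels: a top-level category (an xs:element), a type, or a subtype (an xs:attribute). As a minimal sketch, and not part of the draft standard, that hierarchy can be held in a lookup table. The names `TagInfo`, `TAGSET` and `describe` are hypothetical; the tags and English labels are copied from the schema listing, with V and VAUX taken from the opening lines of the Verb block.

```python
from typing import NamedTuple


class TagInfo(NamedTuple):
    category: str  # top-level tag this entry belongs to, e.g. "V"
    level: str     # "category", "type" or "subtype", as in the schema
    label: str     # English label (the cat / subcat value)


# Hypothetical lookup table; tags and labels copied from the schema blocks.
TAGSET = {
    "V": TagInfo("V", "category", "Verb"),
    "VAUX": TagInfo("V", "type", "Auxiliary Verb"),
    "VM": TagInfo("V", "type", "Main Verb"),
    "VF": TagInfo("V", "subtype", "Finite"),
    "VINF": TagInfo("V", "subtype", "Infinitive"),
    "VNG": TagInfo("V", "subtype", "Gerund"),
    "VNF": TagInfo("V", "subtype", "Non-Finite"),
    "JJ": TagInfo("JJ", "category", "Adjective"),
    "RB": TagInfo("RB", "category", "Adverb"),
    "PSP": TagInfo("PSP", "category", "Post Position"),
    "CC": TagInfo("CC", "category", "Conjunction"),
    "CCD": TagInfo("CC", "type", "Co-ordinator"),
    "CCS": TagInfo("CC", "type", "Subordinator"),
    "UT": TagInfo("CC", "subtype", "Quotative"),
    "RP": TagInfo("RP", "category", "Particles"),
    "RPD": TagInfo("RP", "type", "Default"),
    "CL": TagInfo("RP", "type", "Classifier"),
    "INJ": TagInfo("RP", "type", "Interjection"),
    "NEG": TagInfo("RP", "type", "Negation"),
    "INTF": TagInfo("RP", "type", "Intensifier"),
    "QT": TagInfo("QT", "category", "Quantifiers"),
    "QTF": TagInfo("QT", "type", "General"),
    "QTC": TagInfo("QT", "type", "Cardinals"),
    "QTO": TagInfo("QT", "type", "Ordinals"),
    "RD": TagInfo("RD", "category", "Residuals"),
    "RDF": TagInfo("RD", "type", "Foreign word"),
    "SYM": TagInfo("RD", "type", "Symbol"),
    "UNK": TagInfo("RD", "type", "Unknown"),
    "PUNC": TagInfo("RD", "type", "Punctuation"),
    "ECH": TagInfo("RD", "type", "Echowords"),
}


def describe(tag: str) -> str:
    """Return 'Category > Label' for a type or subtype, or just the label for a category."""
    info = TAGSET[tag]
    if info.level == "category":
        return info.label
    return f"{TAGSET[info.category].label} > {info.label}"


print(describe("VF"))
print(describe("JJ"))
```

A flat table keyed by tag keeps lookups simple; a nested structure would mirror the XML more closely but makes it harder to resolve a tag that appears on its own in annotated text.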
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.
Table 1 Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2 Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Table 3 Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
            <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
            <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
            <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
            <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………………………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
            <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
            <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
            <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
            <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………………………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
            <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
            <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
            <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
            <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………………………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
            <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
            <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
            <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
            <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………………………
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
            <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
            <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
            <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
            <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………………………
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
            <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
            <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
            <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
            <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………………………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
            <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
            <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
            <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
            <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………………………………
    End if
End if
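The selection logic above can be sketched in Python: given a script and a language, keep the English label, the Hindi label, the label for the chosen language and the tag, and hide every other label. This is an illustrative sketch under assumptions, not a reference implementation. `SCRIPT_LANGUAGES`, `visible_keys` and `select_nodes` are hypothetical names, each schema node is assumed to be a plain dictionary of its attributes, and the sample labels are copied as they appear in the examples above.

```python
# Script -> language -> language code, limited to the branches the algorithm lists.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# English label (cat or subcat), Hindi label and tag are shown in every branch.
ALWAYS_SHOWN = ("cat", "subcat", "hin-cat", "tag")


def visible_keys(script: str, language: str) -> set:
    """Return the attribute names that stay visible for this script and language."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(
            f"no branch for script={script!r}, language={language!r}"
        ) from None
    return set(ALWAYS_SHOWN) | {f"{code}-cat"}


def select_nodes(node: dict, script: str, language: str) -> dict:
    """Keep only the visible attributes of one schema node; hide the rest."""
    keep = visible_keys(script, language)
    return {key: value for key, value in node.items() if key in keep}


# Example node built from the labels shown in the examples above.
noun = {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा", "mal-cat": "നാമം", "tag": "N"}
print(select_nodes(noun, "Devanagari", "Bodo"))
```

Driving the branches from a table rather than nested conditionals means a new script or language is added by extending `SCRIPT_LANGUAGES` only.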
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
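In the tagged sentences above, each word is followed by a hierarchical tag whose levels are joined by underscores, for example N_NNP or V_VM_VF. The sketch below separates a word from its trailing tag and splits the tag into its levels. It rests on assumptions: the separator between word and tag is not visible in this text, so the tag is taken to be the run of uppercase ASCII letters and underscores at the end of the token, and words are assumed not to end in uppercase ASCII letters. `split_token` is a hypothetical helper, not part of the standard.

```python
import re

# A tag is one or more uppercase groups joined by underscores at the end of the token.
# Repeated underscores (as in N__NN) are tolerated and treated as one separator.
_TAG_AT_END = re.compile(r"^(.*?)([A-Z]+(?:_+[A-Z]+)*)$")


def split_token(token: str):
    """Split 'wordTAG' into (word, [tag levels]); the word may be empty."""
    match = _TAG_AT_END.match(token)
    if match is None:
        raise ValueError(f"no POS tag found at the end of {token!r}")
    word, tag = match.groups()
    levels = [part for part in tag.split("_") if part]
    return word, levels


word, levels = split_token("दशरनN_NN")
print(word, levels)
```

A token that consists of a tag only, such as RD_PUNC where the punctuation mark is missing from the text, comes back with an empty word.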
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
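The language tags are what tie this annexure to the schema: each label attribute is named with the language code followed by "-cat" (hin-cat, brx-cat, mal-cat and so on). A minimal sketch of that lookup follows. `LANGUAGE_TAGS` and `label_attribute` are hypothetical names, and the codes are the ISO 639-3 codes for the 22 languages listed.

```python
# Language name -> ISO 639-3 code, for the 22 languages in this annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language: str) -> str:
    """Return the schema attribute that carries this language's label, e.g. 'hin-cat'."""
    return f"{LANGUAGE_TAGS[language]}-cat"


print(label_attribute("Hindi"))
print(label_attribute("Kashmiri"))
```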
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
51
CopyrightTDIL
----------------------------------Demonstrative Block------------------------------
ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन
दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-
cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt
ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-
cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-
catrdquoઉલદશરકrdquo tag=rdquoDMDgt
ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-
cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo
asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt
ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम
सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-
cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt
-------------------------------------Verb Block---------------------------------------
ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo
kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt
ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-
cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-
cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt
ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब
थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-
cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt
ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-
cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo
kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt
52
CopyrightTDIL
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन"
brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা"
kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत"
brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া"
kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम"
brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা"
kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">
------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण"
guj-cat="િવશષણ" tag="JJ">
---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">
-----------------------------------Post Position Block-------------------------------
<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর"
kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">
------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय"
guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत"
brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت"
asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा"
mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत"
brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت"
asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক"
kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat=""
kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند"
asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद"
brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ"
asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च"
brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद"
brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ"
asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
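The blocks above pair each category tag with its type and subtype tags. As a rough illustration only, the sketch below records that hierarchy in Python for the tags visible in this part of the schema. The names `TAG_HIERARCHY` and `is_valid_tag` are hypothetical and not part of the standard. The `V` and `VAUX` entries are taken from the tagged samples in Section 15 rather than from the blocks above, and joining levels with an underscore (as in `V_VM_VF`) is likewise assumed from those samples.

```python
# Category tag -> type tags -> subtype tags, transcribed from the schema blocks above.
# Assumption: "V" and "VAUX" come from the tagged samples (e.g. V_VAUX), since the
# opening of the Verb block is not shown in this part of the schema.
TAG_HIERARCHY = {
    "V": {"VM": ["VF", "VINF", "VNG", "VNF"], "VAUX": []},
    "JJ": {},
    "RB": {},
    "PSP": {},
    "CC": {"CCD": [], "CCS": ["UT"]},
    "RP": {"RPD": [], "CL": [], "INJ": [], "NEG": [], "INTF": []},
    "QT": {"QTF": [], "QTC": [], "QTO": []},
    "RD": {"RDF": [], "SYM": [], "UNK": [], "PUNC": [], "ECH": []},
}


def is_valid_tag(label: str) -> bool:
    """Check a label such as 'V_VM_VF' against TAG_HIERARCHY, one level at a time."""
    parts = label.split("_")
    if parts[0] not in TAG_HIERARCHY:
        return False
    if len(parts) == 1:
        return True
    types = TAG_HIERARCHY[parts[0]]
    if parts[1] not in types:
        return False
    if len(parts) == 2:
        return True
    # A third level is only valid if it is a listed subtype of the chosen type.
    return len(parts) == 3 and parts[2] in types[parts[1]]
```

A nested dictionary keeps the three levels (category, type, subtype) explicit, so a label can be checked one level at a time.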
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
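The selection logic above can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation: a schema node is modelled as a plain dictionary of its attributes, and the names `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical. Only the script and language pairs listed in the algorithm are included, and the label strings in the usage example are placeholders.

```python
# Script -> languages handled by the algorithm above, as ISO 639-3 codes.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}


def select_nodes(node: dict, script: str, language: str) -> dict:
    """Keep the English, Hindi and target-language labels of a node; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English labels live in "cat"/"subcat"; language labels in "<code>-cat".
    keep = {"hin-cat", f"{language}-cat"}
    return {
        key: value
        for key, value in node.items()
        if not key.endswith("-cat") or key in keep
    }


# Usage with placeholder labels (the real labels are given in the schema).
noun = {
    "name": "cat", "cat": "noun", "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>", "kok-cat": "<Konkani label>", "tag": "N",
}
bodo_view = select_nodes(noun, "Devanagari", "brx")
```

For Hindi, the target-language attribute is `hin-cat` itself, so only the English and Hindi labels remain, matching the first branch of the algorithm.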
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
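The tagged samples above attach a hierarchical label to each word. A minimal Python sketch for splitting such tokens into a word and its tag levels is given below. It assumes the layout seen in this copy, where the tag follows the word directly with no visible separator and words contain no Latin letters; the separator used in the original corpus files may differ. The names `TOKEN` and `parse_tagged` are hypothetical, and variants such as `N-NNP` or `N__NN` are not handled.

```python
import re

# A token is a run of non-Latin characters (the word, possibly empty) followed by
# an upper-case tag whose levels are joined with single underscores.
TOKEN = re.compile(r"^(?P<word>[^A-Za-z]*)(?P<tag>[A-Z]+(?:_[A-Z]+)*)$")


def parse_tagged(sentence: str) -> list[tuple[str, list[str]]]:
    """Split a tagged sentence into (word, [tag levels]) pairs."""
    pairs = []
    for token in sentence.split():
        match = TOKEN.match(token)
        if match is None:
            continue  # skip sentence numbers and fragments without a tag
        pairs.append((match.group("word"), match.group("tag").split("_")))
    return pairs


# Usage on a few tokens copied from the first Hindi sample.
sample = parse_tagged("1 दशरनN_NN सPSP हV_VM RD_PUNC")
```

Tokens such as `RD_PUNC`, whose punctuation mark is not visible in this copy, come back with an empty word.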
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
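The language tags in this annexure are what the schema uses to name its per-language label attributes (for example `hin-cat` or `mal-cat`). The Python sketch below shows one way such a lookup could be written. The names `ISO_639_3` and `label_attribute` are hypothetical, and the `<code>-cat` naming pattern is inferred from the schema examples rather than stated as a rule by the standard.

```python
# Language name (as spelled in this annexure) -> ISO 639-3 code.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language: str) -> str:
    """Return the schema attribute that carries labels for the given language."""
    return f"{ISO_639_3[language]}-cat"


hindi_attribute = label_attribute("Hindi")
```

An unknown language name raises `KeyError`, which makes a misspelled name visible early instead of producing an attribute that does not exist.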
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
52
CopyrightTDIL
ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-
cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo
kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt
ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-
cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-
cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt
ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo
brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-
cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt
------------------------------------Adjective Block----------------------------------
ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-
cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-
catrdquoિવશષણrdquo tag=rdquoJJrdquogt
---------------------------------------Adverb Block----------------------------------
ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान
थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo
kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt
-----------------------------------Post Position Block-------------------------------
ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन
महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-
cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt
------------------------------------Conjunction Block-------------------------------
ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo
mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-
catrdquoસ કજકકrdquo tag=rdquoCCrdquogt
53
CopyrightTDIL
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">
------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">
------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element> </xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
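The one-to-one property of the mapping tables can be checked mechanically. The following is a minimal sketch and not part of the standard: the `LABELS` dictionary, the `eng` key used for the English column, and the helper name `invert` are assumptions made for illustration, and only three rows (Noun, Pronoun, Verb) are included, with labels written in standard orthography.

```python
# Illustrative subset of the mapping table: tag -> {language code -> label}.
# "hin" and "kok" are the ISO 639-3 codes used in the schema; "eng" is an
# assumed key for the English column.
LABELS = {
    "N": {"eng": "Noun", "hin": "संज्ञा", "kok": "नाम"},
    "PR": {"eng": "Pronoun", "hin": "सर्वनाम", "kok": "सर्वनाम"},
    "V": {"eng": "Verb", "hin": "क्रिया", "kok": "क्रियापद"},
}


def invert(lang):
    """Build label -> tag for one language, failing if two tags share a label."""
    inverse = {}
    for tag, labels in LABELS.items():
        label = labels[lang]
        if label in inverse:
            raise ValueError(
                f"{lang}: label {label!r} maps to both {inverse[label]} and {tag}"
            )
        inverse[label] = tag
    return inverse


if __name__ == "__main__":
    for lang in ("eng", "hin", "kok"):
        print(lang, invert(lang))
```

The same label may appear under two languages (here the Hindi and Konkani words for Pronoun coincide); the check only requires that labels are unique within a single language.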
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
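The selection steps above can be sketched in Python. This is one illustrative reading of the algorithm, not a reference implementation: the well-formed XML fragment, the element names `category` and `pos`, the `eng-cat` attribute standing in for the English label, and the names `SCRIPT_LANGUAGES` and `select_nodes` are all assumptions, and the labels are written in standard orthography.

```python
import xml.etree.ElementTree as ET

# Script -> languages handled under that script, as listed in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}

# Hypothetical well-formed rendering of two schema nodes.
FRAGMENT = """
<pos>
  <category tag="N" eng-cat="Noun" hin-cat="संज्ञा" kok-cat="नाम" guj-cat="સંજ્ઞા"/>
  <category tag="PR" eng-cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सर्वनाम" guj-cat="સર્વનામ"/>
</pos>
"""


def select_nodes(root, script, lang):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if lang not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    keep = {"tag", "eng-cat", "hin-cat", f"{lang}-cat"}
    visible = []
    for node in root.iter():
        if "tag" not in node.attrib:
            continue  # skip wrapper elements that carry no POS tag
        visible.append({k: v for k, v in node.attrib.items() if k in keep})
    return visible


if __name__ == "__main__":
    root = ET.fromstring(FRAGMENT)
    for node in select_nodes(root, "Devanagari", "kok"):
        print(node)
```

Selecting Hindi itself yields only the English and Hindi labels, which matches the first branch of the algorithm.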
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
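The tagged sentences above pair each word with a hierarchical tag such as N_NN or V_VM_VNF. A minimal sketch of reading such tokens follows. It assumes that the tag is the trailing run of ASCII capital letters joined by underscores or hyphens, optionally preceded by a backslash separator, and that tokens are separated by whitespace; the function names are hypothetical and the sample tokens are copied as they appear in the Hindi example.

```python
import re

# word, optional backslash separator, then a tag such as N_NN, PSP or V_VM_VNF.
TOKEN = re.compile(r"^(?P<word>.*?)\\?(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def parse_token(token):
    """Split one token into (word, tag); the word may be empty."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"no POS tag found in {token!r}")
    return match.group("word"), match.group("tag")


def parse_sentence(line):
    """Parse a whitespace-separated tagged sentence into (word, tag) pairs."""
    return [parse_token(token) for token in line.split()]


def top_level(tag):
    """Return the top-level category of a hierarchical tag, e.g. V for V_VM_VNF."""
    return re.split(r"[_-]+", tag)[0]


if __name__ == "__main__":
    for word, tag in parse_sentence("सपरयN_NNP तPSP दशरनN_NN RD_PUNC"):
        print(repr(word), tag, top_level(tag))
```

A token that consists of a tag alone, such as RD_PUNC where the punctuation mark itself is absent, is returned with an empty word rather than rejected.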
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No  Language Name  Language Tag according to ISO 639-3
1     Hindi          hin
2     Assamese       asm
3     Bangla         ben
4     Bodo           brx
5     Dogri          doi
6     Gujarati       guj
7     Kannada        kan
8     Kashmiri       kas
9     Konkani        kok
10    Maithili       mai
11    Malayalam      mal
12    Manipuri       mni
13    Marathi        mar
14    Nepali         nep
15    Oriya          ori
16    Punjabi        pan
17    Sanskrit       san
18    Santhali       sat
19    Sindhi         snd
20    Tamil          tam
21    Telugu         tel
22    Urdu           urd
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
53
CopyrightTDIL
ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-
cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-
cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt
ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो
महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-
cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt
ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-
cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo
kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt
------------------------------------Particles Block------------------------------------
ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-
cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-
catrdquoિાપતrdquo tag=rdquoRPrdquogt
ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo
mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-
catrdquoસવ rdquo tag=rdquoRPDgt
ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ
दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-
cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt
ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-
cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-
cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt
ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ
दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार
अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt
54
CopyrightTDIL
ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन
दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार
अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt
------------------------------------Quantifiers Block--------------------------------
ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा
दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-
cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt
ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo
mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-
cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt
ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब
बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-
cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt
ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार
बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক
শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt
------------------------------------Residuals Block----------------------------------
ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-
cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt
ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-
cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-
cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt
ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-
cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt
55
CopyrightTDIL
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
56
CopyrightTDIL
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES To incorporate such facility in the xml Schema the common one to one mapping table for the labels has been developed as presented in the Table 1 Table 2 and Table 3
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
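A minimal sketch of how the mapping tables above might be held in code, assuming a plain dictionary keyed by POS tag. The structure, the `LABELS` name and the `label_for` helper are illustrative assumptions rather than part of the standard; the tags come from the schema, and the few labels shown are copied from the tables as they appear here.

```python
# Hypothetical in-memory form of the mapping tables:
# POS tag -> {ISO 639-3 language code: label}.
LABELS = {
    "N": {"eng": "Noun", "mal": "നാമം", "kok": "नाम"},
    "V": {"eng": "Verb"},  # further language labels would be filled in from the tables
}


def label_for(tag, lang, fallback="eng"):
    """Return the label for `tag` in `lang`, or the fallback label if none is recorded."""
    row = LABELS[tag]
    return row.get(lang, row[fallback])
```

Falling back to the English label is one possible way to treat the cells marked NA in the tables; the standard itself does not prescribe this.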
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English, Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English, Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
...
End if
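The selection steps above can be sketched as follows. This is a rough illustration under stated assumptions: each schema node is modelled as a dictionary of attribute names to values, the script-to-language grouping mirrors the branches listed in this section, and the names `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical.

```python
# Script -> language codes, following the branches of the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(node, script, lang):
    """Split one node's attributes into (shown, hidden) for a script and language."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"no branch for script {script!r} and language {lang!r}")
    # English (cat/subcat), Hindi and the selected language are displayed,
    # together with the tag; every other language label is hidden.
    visible = {"cat", "subcat", "tag", "hin-cat", f"{lang}-cat"}
    shown = {k: v for k, v in node.items() if k in visible}
    hidden = {k: v for k, v in node.items() if k not in visible}
    return shown, hidden


# Example node with labels copied as they appear in this document.
node = {
    "subcat": "Proper",
    "hin-cat": "वयवाचत",
    "brx-cat": "म दिनथथा",
    "mal-cat": "സംജാ നാമം",
    "tag": "NNP",
}
shown, hidden = select_nodes(node, "Devanagari", "brx")
```

For Hindi the visible set collapses to the English and Hindi attributes, which matches the "Display (English and Hindi Nodes)" step of the first branch.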
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
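A small sketch of how tokens in the samples above could be separated into word and tag levels. In this copy of the text the word and its tag run together with no delimiter, so the sketch assumes that the tag is the trailing run of ASCII capitals and underscores; that rule, and the names `TOKEN` and `split_token`, are assumptions for illustration and not part of the standard.

```python
import re

# Word (possibly empty) followed by a tag made of ASCII capitals and underscores.
TOKEN = re.compile(r"(.*?)([A-Z][A-Z_]*)")


def split_token(token):
    """Split a fused token into (word, [tag levels])."""
    match = TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"no tag found in {token!r}")
    word, tag = match.groups()
    # Tag levels are separated by one or more underscores, e.g. V_VM_VNF.
    return word, re.split(r"_+", tag)


word = "ਦਰਸ਼ਨ"  # a word taken from the Punjabi sample
noun = split_token(word + "N_NN")
punct = split_token("RD_PUNC")
```

A token such as `RD_PUNC` whose punctuation mark did not survive extraction comes back with an empty word, which makes such gaps easy to spot.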
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
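For illustration, the language tags can be kept as a simple lookup that also yields the attribute name used for a language's labels in the schema (for example `mal-cat`). The dictionary below pairs each language with its ISO 639-3 code; the names `LANGUAGE_TAGS` and `attribute_name` are hypothetical.

```python
# Language name -> ISO 639-3 code for the 22 languages listed in this annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def attribute_name(language):
    """Return the schema attribute that holds labels for `language`, e.g. 'brx-cat'."""
    return f"{LANGUAGE_TAGS[language]}-cat"
```

Deriving the attribute name from the code keeps the schema attributes and the language table in step if further languages are added.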
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
55
CopyrightTDIL
ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo
mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-
catrdquoઅણ શબદકrdquo tag=rdquoUNKgt
ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-
cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত
িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt
ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-
cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-
cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt
ltxsattributegt
ltxselementgt ltxsschemagt
56
CopyrightTDIL
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES To incorporate such facility in the xml Schema the common one to one mapping table for the labels has been developed as presented in the Table 1 Table 2 and Table 3
Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
57
CopyrightTDIL
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
58
CopyrightTDIL
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
59
CopyrightTDIL
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
60
CopyrightTDIL
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
61
CopyrightTDIL
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
62
CopyrightTDIL
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" tag="PRF">
...
End if
If language is Bodo then
Call (POS Schema)
Display (English, Hindi and Bodo Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" brx-cat="गाव दिनथथा" tag="PRF">
...
End if
If language is Konkani then
Call (POS Schema)
Display (English, Hindi and Konkani Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...
End if
Else If script is Malayalam (Orthographic variation) then
If language is Malayalam then
Call (POS Schema)
Display (English, Hindi and Malayalam Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English, Hindi and Kashmiri Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
...
End if
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English, Hindi and Assamese Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
...
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
...
End if
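The selection rule in the algorithm above (always show the English and Hindi labels, add the labels of the selected language, hide everything else) can be sketched in Python. This is only an illustration under stated assumptions: the names LABELS, SCRIPT_OF and select_nodes are hypothetical and not part of the standard, and the label table is a two-row excerpt rather than the full schema.

```python
# Hypothetical two-row excerpt of the one-to-one mapping table, keyed by tag
# and then by ISO 639-3 language code ("eng" is assumed for English).
LABELS = {
    "N": {"eng": "Noun", "hin": "संज्ञा", "kok": "नाम", "mal": "നാമം", "kas": "ناوت"},
    "NN": {"eng": "common", "hin": "जातिवाचक", "kas": "عام"},
}

# Script of each language, following the branches of the algorithm.
SCRIPT_OF = {
    "hin": "Devanagari",
    "brx": "Devanagari",
    "kok": "Devanagari",
    "mal": "Malayalam",
    "kas": "Perso-Arabic",
    "asm": "Bangla",
    "guj": "Gujarati",
}


def select_nodes(language):
    """Return the labels to display for `language`; all others are hidden."""
    if language not in SCRIPT_OF:
        raise ValueError(f"unsupported language code: {language}")
    # English and Hindi nodes are always displayed; a third language is
    # added only when the selected language is not Hindi itself.
    visible = ["eng", "hin"]
    if language != "hin":
        visible.append(language)
    return {
        tag: {code: labels[code] for code in visible if code in labels}
        for tag, labels in LABELS.items()
    }


if __name__ == "__main__":
    for tag, labels in select_nodes("kok").items():
        print(tag, labels)
```

A label that is missing for the selected language (marked NA in the mapping tables) is simply omitted from the result in this sketch.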
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
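In the tagged sentences above each word is written directly against its tag (for example a word followed by N_NN), and the levels of a hierarchical tag are joined by underscores. A minimal Python sketch of reading such lines follows. It assumes that the tag is the trailing run of uppercase ASCII letters and underscores in a token; the function names are hypothetical, and irregular forms that occur in the samples (a doubled underscore, or a hyphen inside a tag) are not handled.

```python
import re

# Assumption: a token is "<word><TAG>", where TAG is a trailing run of
# uppercase ASCII groups joined by single underscores (e.g. N_NN, V_VM_VNF).
TOKEN = re.compile(r"^(.*?)([A-Z]+(?:_[A-Z]+)*)$")


def split_token(token):
    """Split one token into (word, tag); tag is None if no tag is found."""
    match = TOKEN.match(token)
    if match is None:
        return token, None
    return match.group(1), match.group(2)


def tag_levels(tag):
    """Break a hierarchical tag into its levels, e.g. V_VM_VNF -> V, VM, VNF."""
    return tag.split("_")


def parse_sentence(line):
    """Parse a whitespace-separated tagged sentence into (word, tag) pairs."""
    return [split_token(token) for token in line.split()]


if __name__ == "__main__":
    for word, tag in parse_sentence("बड़ाJJ महतवN_NN RD_PUNC"):
        print(repr(word), tag)
```

A token that consists of a tag only, such as RD_PUNC standing for a punctuation mark whose character was lost, yields an empty word in this sketch.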
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No.  Language Name  Language Tag according to ISO 639-3
1   Hindi      hin
2   Assamese   asm
3   Bangla     ben
4   Bodo       brx
5   Dogri      doi
6   Gujarati   guj
7   Kannada    kan
8   Kashmiri   kas
9   Konkani    kok
10  Maithili   mai
11  Malayalam  mal
12  Manipuri   mni
13  Marathi    mar
14  Nepali     nep
15  Oriya      ori
16  Punjabi    pan
17  Sanskrit   san
18  Santhali   sat
19  Sindhi     snd
20  Tamil      tam
21  Telugu     tel
22  Urdu       urd
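For use in software, the language tag table can be held as a simple lookup. The Python sketch below pairs each language name from this annexure with its ISO 639-3 code; the names LANGUAGE_TAGS and language_tag are hypothetical and chosen only for illustration.

```python
# Language names as listed in this annexure, paired with ISO 639-3 codes.
LANGUAGE_TAGS = {
    "Hindi": "hin",
    "Assamese": "asm",
    "Bangla": "ben",
    "Bodo": "brx",
    "Dogri": "doi",
    "Gujarati": "guj",
    "Kannada": "kan",
    "Kashmiri": "kas",
    "Konkani": "kok",
    "Maithili": "mai",
    "Malayalam": "mal",
    "Manipuri": "mni",
    "Marathi": "mar",
    "Nepali": "nep",
    "Oriya": "ori",
    "Punjabi": "pan",
    "Sanskrit": "san",
    "Santhali": "sat",
    "Sindhi": "snd",
    "Tamil": "tam",
    "Telugu": "tel",
    "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 tag for a language name, ignoring case and padding."""
    return LANGUAGE_TAGS[name.strip().title()]


if __name__ == "__main__":
    print(language_tag("hindi"), language_tag(" Konkani "))
```

These are the same codes that prefix the attribute names in the schema examples, such as hin-cat, brx-cat and kok-cat.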
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarat University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
<xs:attribute name="type" subcat="Unknown" hin-cat="अज्ञात" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="विरामादि-चिह्न" brx-cat="थाद’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="प्रतिध्वनि-शब्द" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element>
</xs:schema>
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.
Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
S.No.  English  Hindi  Punjabi  Urdu  Gujarati  Oriya  Bengali
1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক
Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক
Verbal कयामलत तद
ਿਕਿਰਆਮਲਕ حاصل مصدر
કવચક କରୟାବାଚକ
িয়াালক
Nloc दश-ताल साप
ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ
ানবাচক
2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা
Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی
ષવચક ବୟକତବାଚକ বযিিবাচক
Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی
પિતિતિતત ଆତମବାଚକ আতবাচক
Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع
પરસપરવચચ ପାରସପାରକ বযিতহাা
Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ
પ રવચક ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत
सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ
ଂକେତବାଚକ িনেদরশক
Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক
Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول
સપક ସଂବନଧବାଚକ সবাচক
Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ
પવચચ ପରଶନବାଚକ বাচক
Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া
Auxiliary Verb
सहायत कया
ਸਹਾਇਕ
ਿਕਿਰਆ
امدادی فعل સહકર
ક
ସହାୟକ କରୟା
েগৗণ িয়া
Main Verb
मखय कया ਮਖ ਿਕਿਰਆ فعل الزم
ખ ମଖୟ କରୟା াখয িয়াপদ
Finite परम ਕਾਲਕੀ لفع محدود
ણર ପରମତ সাািপকা
Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No.  English  Hindi  Assamese  Bodo  Kashmiri  Kashmiri (Hindi)  Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
60
CopyrightTDIL
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
61
CopyrightTDIL
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
62
CopyrightTDIL
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
63
CopyrightTDIL
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
64
CopyrightTDIL
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
65
CopyrightTDIL
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
66
CopyrightTDIL
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malyalam (Orthographic variation) then
If language is Malyalam then
Call (POS Schema)
Display (English Hindi and Malyalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
67
CopyrightTDIL
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
68
CopyrightTDIL
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali
SNo | English | Hindi | Punjabi | Urdu | Gujarati | Oriya | Bengali
1 | Noun | सा | ਨਵ | اسم | સજ | ସଂଞା | িবেশষয
 | common | जावाचत | ਆਮ | نکره | િતવચક | ଜାତବାଚକ | জািতবাচক
 | Proper | वयवाचत | ਖਾਸ | معرفہ | વયતવચક | ବୟକତବାଚକ | বযিিবাচক
 | Verbal | कयामलत तद | ਿਕਿਰਆਮਲਕ | حاصل مصدر | કવચક | କରୟାବାଚକ | িয়াালক
 | Nloc | दश-ताल साप | ਸਿਥਤੀ ਸਚਕ | ظرف | સાવચક | ଦେଶ-କାଳ ସାପେକଷ | ানবাচক
2 | Pronoun | सवरनाम | ਪੜਨਵ | ضمير | સવરાા | ସରବନାମ | সবরনাা
 | Personal | वयवाचत | ਪਰਖਵਾਚੀ | ضمير شخصی | ષવચક | ବୟକତବାଚକ | বযিিবাচক
 | Reflexive | नजवाचत | ਿਨਜਵਾਚੀ | ضمير معکوسی | પિતિતિતત | ଆତମବାଚକ | আতবাচক
 | Reciprocal | पारसपरत | ਪਰਸਪਰੀ | ضمير راجع | પરસપરવચચ | ପାରସପାରକ | বযিতহাা
 | Relative | सबन- वाचत | ਸਬਧਵਾਚੀ | ضمير موصولہ | સપક | ସଂବନଧବାଚକ | সবাচক
 | Wh-words | पवाचत | ਪਸ਼ਨਵਾਚੀ | ضمير استفہاميہ | પ રવચક | ପରଶନବାଚକ | বাচক
 | Indefinite | अनयवाचत | NA | NA | અિાિત સવરાા | NA | অিনেদরশয
3 | Demonstrative | नयवाचत सतवाचत | ਸਕਤਵਾਚੀ | ےاشار | દશરકક | ନଶଚୟବାଚକ ସଂକେତବାଚକ | িনেদরশক
 | Deictic | नदशी | ਪਤਖ ਪਮਾਣਵਾਚੀ | هاشار | ઉલદશરક | | তয িনেদরশক
 | Relative | सबनवाचत | ਸਬਧਵਾਚੀ | هاشار موصول | સપક | ସଂବନଧବାଚକ | সবাচক
 | Wh-words | पवाचत | ਪਸ਼ਨਵਾਚੀ | هاشار استفہاميہ | પવચચ | ପରଶନବାଚକ | বাচক
 | Indefinite | अनयवाचत | NA | NA | અિાિત સવરાા | NA | অিনেদরশয
4 | Verb | कया | ਿਕਿਰਆ | فعل | આખત | କରୟା | িয়া
 | Auxiliary Verb | सहायत कया | ਸਹਾਇਕ ਿਕਿਰਆ | امدادی فعل | સહકર ક | ସହାୟକ କରୟା | েগৗণ িয়া
 | Main Verb | मखय कया | ਮਖ ਿਕਿਰਆ | فعل الزم | ખ | ମଖୟ କରୟା | াখয িয়াপদ
 | Finite | परम | ਕਾਲਕੀ | لفع محدود | ણર | ପରମତ | সাািপকা
 | Infinitive | कयाथरत सा | ਅਿਮਤ | مصدر | હતવ ર | ଅନନତ | অপণর িয়া
 | Gerund | कयावाचत | ਿਕਿਰਆਵਾਚੀ | حاصل مصدر | વતરાાનદદત | କରୟାବାଚକ | েযাজক িয়া
 | Non-Finite | गर-परम | ਅਕਾਲਕੀ | فعل غير محدود | અણર | ଅପରମତ | অসাািপকা
 | Participle Noun | तद परत नाम | NA | NA | NA | NA | িয়াজাত িবেশষয
5 | Adjective | वशषण | ਿਵਸ਼ਸ਼ਣ | صفت | િવશષણ | ବଶେଷଣ | িবেশষণ
6 | Adverb | कया-वशषण | ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ | متعلق فعل | કિવશષણ | କରୟା-ବଶେଷଣ | িয়া-িবেশষণ
7 | Post Position | परसगर | ਸਬਧਕ | جار موخر | અગક | ପରସରଗ | পাসগর
8 | Conjunction | योजत | ਯਜਕ | حرف عطف | સ કજકક | ସଂଯୋଜକ | সকেযাগালক
 | Co-ordinator | समनवयत | ਸਮਾਨ ਯਜਕ | حرف وصل | સહકદશરક | ସମନ ୟକ | সায়ক
 | Subordinator | अनीनसथ | ਅਧੀਨ ਯਜਕ | حرف تابع کننده | ગૌણકદશરક | | শতর সকেযাজক
 | Quotative | उ-वाचत | ਕਥਨਵਾਚੀ | حرف اقتباسی | NA | ଉକତବାଚକ | উিিবাচক
9 | Particles | अवयय | ਿਨਪਾਤ | حرف حاليہپابند | િાપત | ଅବୟୟ ନପାତ | অবযয়
 | Default | वयकम | ਤਰਟੀਵਾਚਕ | حرف ڈيفالٹ | સવ | ବୟତକରମ | সাধাাণ অবযয়
 | Classifier | वगनतारत | ਵਰਗੀਿਕਤ | حرف درجہ بند | NA | ବରଗୀକାରକ | বগরবাচক
 | Interjection | वसमयादबोनत | ਿਵਸਮਕ | حرف فجائيہ | િવસાઆદ તકધક | ବସମୟ ବୋଧକ | িবয়ািদেবাধক
 | Negation | नतारातमत | ਨਹਵਾਚੀ | حرف نہی | ાકરદશરક | ନଷେଧାତମକ | নঞতরক
 | Intensifier | ीवत | ਤੀਬਰਤਾਵਾਚੀ | تاکيدحرف | ાતરચક | ତୀବରତାବାଚକ | তীতােবাধক
10 | Quantifiers | सखयावाची | ਸਿਖਆਵਾਚੀ | کميت نما | પરાણરચકક | ସଂଖୟାବାଚୀ | পিাাাণবাচক
 | General | सामानय | ਸਧਾਰਨ | عمومی عام | સાદ | ସାମାନୟ | সাধাাণ
 | Cardinals | गणनासचत | ਿਗਣਤੀਸਚਕ | اعداد مطلق | સખવચક | ଗଣନାସଚକ | সকখযাবাচক
 | Ordinals | कमसचत | ਕਮਸਚਕ | ترتيبی اعداد | કાવચક | କରମସଚକ | াবাচক
11 | Residuals | अवशष | ਬਾਕੀ | باقی مانده | શષ | ଅବଶେଷ | অবিশ পদ
 | Foreign word | वदशी शबद | ਿਵਦਸ਼ੀ ਸ਼ਬਦ | بيرونی لفظ | પરદશચ શબદક | ବଦେଶୀ ଶବଦ | িবেদশী শ
 | Symbol | पीत | ਸਕਤ | عالمت | સકત | ପରତୀକ | তীক
 | Unknown | अा | ਅਿਗਆਤ | نامعلوم | અણ શબદક | ଅଞାତ | অ াত
 | Punctuation | वरामाद-च | ਿਵਸ਼ਰਾਮ ਿਚਨ | اوقاف | િવરાિચહક | ବରାମ ଚହନ | যিতিচ
 | Echowords | पवन-शबद | ਪਿਤਧਨੀ ਸ਼ਬਦ | گونج دار الفاظ | અરણાતાક | ପରତଧ ନୀ | অনকাা শ
Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
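The one-to-one mapping in the tables above can be pictured as a lookup from a tag to its label in each language. The following is a minimal Python sketch under stated assumptions: the names `LABELS` and `label_for` are hypothetical and not part of the draft standard, the language keys are the ISO 639-3 codes of Annexure-1 plus `eng` for English, only two rows are filled in, and a cell marked NA is stored as `None` and falls back to the English label.

```python
from typing import Dict, Optional

# Hypothetical in-memory form of the mapping tables:
# tag -> {ISO 639-3 language code -> label}. None stands for a cell marked NA.
LABELS: Dict[str, Dict[str, Optional[str]]] = {
    "N": {"eng": "Noun", "mal": "നാമം", "kok": "नाम"},
    "NNV": {"eng": "Verbal", "mal": None},
}


def label_for(tag: str, lang: str) -> str:
    """Return the label of `tag` in `lang`, falling back to English."""
    row = LABELS[tag]
    label = row.get(lang)  # missing language and NA both yield None
    return label if label is not None else row["eng"]
```

A call such as `label_for("N", "kok")` returns the Konkani label, while a language with no entry for that tag gets the English label instead.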
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
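The selection steps above (check the script, check the language, display the English, Hindi and selected-language nodes, hide the rest) can be sketched in Python. This is an illustration under stated assumptions, not the reference implementation: `SCRIPT_LANGUAGES`, `visible_languages` and `select_nodes` are hypothetical names, a node is modelled as a plain dictionary keyed by ISO 639-3 code, and the label strings are placeholders.

```python
# Script -> languages, as enumerated in the algorithm above
# (codes follow Annexure-1; "eng" is assumed for English).
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

ALWAYS_SHOWN = ["eng", "hin"]  # English and Hindi nodes are displayed in every branch


def visible_languages(script, lang):
    """Return the language codes whose nodes are displayed for this selection."""
    if lang not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{lang!r} is not listed for script {script!r}")
    shown = list(ALWAYS_SHOWN)
    if lang not in shown:
        shown.append(lang)
    return shown


def select_nodes(node, script, lang):
    """Keep the tag plus the displayed labels; every other label is hidden."""
    shown = visible_languages(script, lang)
    return {key: value for key, value in node.items() if key == "tag" or key in shown}
```

For example, selecting Konkani under Devanagari keeps the `eng`, `hin` and `kok` labels of a node and drops the others. Display of metadata, which the algorithm lists for the Hindi branch, is left out of this sketch.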
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणरV_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP
নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
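In the sample sentences above each token appears as a word followed directly by its tag, and tags such as V_VM_VNF carry their hierarchy in underscore-separated levels. A minimal Python sketch for separating the two is given below. It assumes that tags consist only of uppercase ASCII letters joined by single underscores and that words contain no trailing uppercase ASCII letters; `split_token` and `parse_sentence` are hypothetical helper names.

```python
import re

# A token is a (possibly empty) word followed by a tag such as N_NN or V_VM_VNF.
TOKEN = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:_[A-Z]+)*)$")


def split_token(token):
    """Split a word-plus-tag token into (word, [tag levels])."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"no tag found in {token!r}")
    return match.group("word"), match.group("tag").split("_")


def parse_sentence(line):
    """Parse one sample line, skipping a bare sentence number if present."""
    return [split_token(tok) for tok in line.split() if not tok.isdigit()]
```

A bare tag such as RD_PUNC yields an empty word, which corresponds to a punctuation mark that does not appear in the printed sample. Tokens written with a hyphen or a doubled underscore in the tag do not follow the assumed pattern and would need to be normalised first.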
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
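For programmatic use, the language tag table can be held as a simple lookup. The Python sketch below is illustrative only: `LANGUAGE_TAGS` and `name_for` are hypothetical names, and each language is paired with its ISO 639-3 code.

```python
# Language name -> ISO 639-3 code for the 22 languages listed in Annexure-1.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Reverse index for resolving a code, such as the "kok" in kok-cat, to its language.
_NAMES_BY_TAG = {tag: name for name, tag in LANGUAGE_TAGS.items()}


def name_for(tag):
    """Return the language name for an ISO 639-3 code, or None if unknown."""
    return _NAMES_BY_TAG.get(tag)
```

The attribute prefixes used in the schema examples (hin-cat, brx-cat, kok-cat and so on) resolve through `name_for` after stripping the `-cat` suffix.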
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarat University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
57
CopyrightTDIL
Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر
વતરાાનદદત କରୟାବାଚକ েযাজক িয়া
Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود
અણર ଅପରମତ অসাািপকা
Participle Noun
तद परत नाम NA NA NA NA িয়াজাত িবেশষয
5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ
6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ
7 Post Position
परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর
8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক
Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ
সায়ক
Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده
ગૌણકદશરક শতর সকেযাজক
Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی
NA ଉକତବାଚକ উিিবাচক
9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند
િાપત ଅବୟୟ ନପାତ
অবযয়
Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ
સવ ବୟତକରମ সাধাাণ অবযয়
Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند
NA ବରଗୀକାରକ বগরবাচক
Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ
તકધક
ବସମୟ ବୋଧକ িবয়ািদেবাধক
Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক
Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক
General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ
Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক
Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক
11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ
Foreign word
वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ
Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক
Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত
Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف
Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ
અરણાતાક ପରତଧ ନୀ অনকাা শ
58
CopyrightTDIL
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
59
CopyrightTDIL
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    …
  End if
  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    …
  End if
  If language is Konkani then
    Call (POS Schema)
    Display (English, Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    …
  End if
Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English, Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    …
  End if
Else If script is Perso-Arabic then
  If language is Kashmiri then
    Call (POS Schema)
    Display (English, Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
    …
  End if
Else If script is Bangla then
  If language is Assamese then
    Call (POS Schema)
    Display (English, Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    …
  End if
Else If script is Gujarati then
  If language is Gujarati then
    Call (POS Schema)
    Display (English, Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    …
  End if
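The selection logic of this section can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the standard: `SCRIPT_LANGUAGES`, `NODES` and `select_nodes` are hypothetical names, nodes are modelled as plain dictionaries rather than parsed XML, and the labels are copied as they appear in the examples.

```python
# Scripts and the languages listed under each one in the algorithm.
# Modelling them as a dict of sets is an assumption made for this sketch.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Two schema nodes as plain dictionaries (hypothetical layout); the labels
# are copied from the Bodo and Konkani examples.
NODES = [
    {"cat": "noun", "tag": "N",
     "hin-cat": "सा", "brx-cat": "ममा", "kok-cat": "नाम"},
    {"cat": "Pronoun", "tag": "PR",
     "hin-cat": "सवरनाम", "brx-cat": "मराइ", "kok-cat": "सवरनाम"},
]


def select_nodes(script, language, nodes):
    """Keep the English and Hindi labels plus those of `language`; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # For Hindi, "hin-cat" and f"{language}-cat" coincide, so only the
    # English and Hindi labels remain visible.
    visible = {"cat", "tag", "hin-cat", f"{language}-cat"}
    return [{key: value for key, value in node.items() if key in visible}
            for node in nodes]


if __name__ == "__main__":
    for node in select_nodes("Devanagari", "brx", NODES):
        print(node)
```

Filtering on attribute names keeps a single schema for all languages, which matches the "Display ... Hide (remaining nodes)" steps: nothing is deleted from the schema, only the view changes.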
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP
নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
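The tags in the examples above appear to be hierarchical, with levels joined by underscores (for instance `V_VM_VF`). As a rough sketch under that assumption, the helper below splits a tag into its levels and names the top-level category. The names `TOP_LEVEL`, `split_tag` and `describe` are hypothetical, and the category names are read off the tag set table and the tagged examples rather than taken from a normative list.

```python
# Top-level categories as they appear in the tag set table and the tagged
# examples; the dictionary itself is an illustrative assumption.
TOP_LEVEL = {
    "N": "Noun",
    "PR": "Pronoun",
    "DM": "Demonstrative",
    "V": "Verb",
    "JJ": "Adjective",
    "RB": "Adverb",
    "PSP": "Post Position",
    "CC": "Conjunction",
    "RP": "Particles",
    "QT": "Quantifiers",
    "RD": "Residuals",
}


def split_tag(tag):
    """Split a hierarchical tag into levels, ignoring doubled underscores."""
    return [level for level in tag.split("_") if level]


def describe(tag):
    """Return (top-level category name, list of levels) for a tag."""
    levels = split_tag(tag)
    category = TOP_LEVEL.get(levels[0], "Unknown") if levels else "Unknown"
    return category, levels


if __name__ == "__main__":
    for tag in ("N_NNP", "V_VM_VF", "N__NN", "PSP"):
        print(tag, describe(tag))
```

Dropping empty levels lets the same helper read the Oriya examples, where the separator shows up doubled (`N__NN`).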
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
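The language tags are what the schema examples use as attribute prefixes (`hin-cat`, `brx-cat`, `kok-cat` and so on). The sketch below shows one way to hold the table and derive the attribute name for a language. The names `ISO_639_3` and `label_attribute` are hypothetical, and the `<code>-cat` pattern is inferred from the examples in section 14 rather than stated as a rule.

```python
# Language name -> ISO 639-3 code for the 22 languages of this annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal",
    "Manipuri": "mni", "Marathi": "mar", "Nepali": "nep", "Oriya": "ori",
    "Punjabi": "pan", "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd",
    "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute that carries this language's label."""
    # Raises KeyError for a language that is not in the annexure.
    return f"{ISO_639_3[language_name]}-cat"


if __name__ == "__main__":
    print(label_attribute("Bodo"))
```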
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
58
CopyrightTDIL
Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri
(Hindi) Marathi
1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय
नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत
तद িয়াবাচক
हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित
नाम
Nloc दश-ताल साप
ানবাচক
थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव
दश कालवाचक
नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक
Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक
Reciprocal पारसपरत পাৰিৰক
गावज गाव सोमोनदो
बाहमी باہمیबोहमी
पारसपारिक
Relative सबन- वाचत সবাচক सोमोनदो दिनथथा
रोबावय सबधवाची رٲبتٲوۍ
Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत
3 Demonstrative नयवाच सतवाचत
িনেদরশেবাধক थावन दिनथथा
हावन ہاون پرناوتۍपरनावतय
दरशक
Deictic नदशी তয িনেদরশক
थ दिनथथा وٲنيٲوۍ वोनयोवय
Relative समबनन
वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच
सबधदरशक
Wh-words पवाचत েবাধক অবযয়
म सथ दिनथथा ک لفظ त-लफ़ परशनारथक
Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary
Verb सहायत कया
সহায়কাৰী িয়া
लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद
Main Verb मखय कया
াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद
Finite परम সাািপকা
जाफजा थाइजा ہشر ہاو हशर हाव
आखयात करियारप
59
CopyrightTDIL
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
60
CopyrightTDIL
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
61
CopyrightTDIL
Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
62
CopyrightTDIL
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
63
CopyrightTDIL
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
64
CopyrightTDIL
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
65
CopyrightTDIL
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
66
CopyrightTDIL
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malyalam (Orthographic variation) then
If language is Malyalam then
Call (POS Schema)
Display (English Hindi and Malyalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
67
CopyrightTDIL
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
68
CopyrightTDIL
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
69
CopyrightTDIL
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
70
CopyrightTDIL
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
71
CopyrightTDIL
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP
নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
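The labels attached to the words above are hierarchical: each underscore-separated part names one level of the tagset (for example V, then VM, then VNF). The following Python sketch shows one way such a label could be split into its levels and counted by top-level category. The function names are illustrative and not part of the standard, and tolerating a doubled underscore (as in the Oriya examples) is an assumption about how such input should be read.

```python
from collections import Counter


def split_tag(tag):
    """Split a hierarchical POS label into its levels, top level first."""
    # Empty parts are skipped so that a doubled underscore ("N__NN") is tolerated.
    return [part for part in tag.split("_") if part]


def top_level_counts(tags):
    """Count how often each top-level category occurs in a list of labels."""
    return Counter(split_tag(tag)[0] for tag in tags)


# Labels of Hindi sentence 1 above, in the order they appear.
hindi_1 = ["N_NNP", "PSP", "N_NN", "PSP", "V_VM", "V_VM", "N_NN", "RD_PUNC"]

print(split_tag("V_VM_VNF"))      # ['V', 'VM', 'VNF']
print(top_level_counts(hindi_1))  # Counter({'N': 3, 'PSP': 2, 'V': 2, 'RD': 1})
```

Working on the label alone keeps the sketch independent of how a word and its label are delimited in a given corpus file.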
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
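The language tags also serve as the prefix of the per-language label attributes used in the schema examples (hin-cat, kok-cat, mal-cat and so on). As a minimal sketch, the table can be held as a Python dictionary that pairs each language with its ISO 639-3 code. The names LANGUAGE_TAGS, language_tag and label_attribute are assumptions made for illustration only.

```python
# Language name -> ISO 639-3 code for the 22 languages of Annexure-1.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 code for a language name (case-insensitive)."""
    key = name.strip().title()
    if key not in LANGUAGE_TAGS:
        raise KeyError(f"language not listed in Annexure-1: {name!r}")
    return LANGUAGE_TAGS[key]


def label_attribute(name):
    """Name of the per-language label attribute, e.g. 'kok-cat' for Konkani."""
    return f"{language_tag(name)}-cat"


print(language_tag("hindi"))       # hin
print(label_attribute("Konkani"))  # kok-cat
```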
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत
Gerund कयावाचत িনিাতাতরক সক া
जाफबाय थानाय दिनथथा
काव کراوتہ ناوتनाव
विभकतिकषम कदतरप
Non-Finite गर-परम অসাািপকা
जाफङ थाइजा نا ہشر ہاو ना हशर हाव
आखयाततर करियारप
Participle Noun
तद परत नाम
NA NA NA NA NA
5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া
িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण
7 Post Position
परसगर অনসগর
सोदोब उन महरथ پوت جاے पो जाय
अतयसथान
8 Conjunction योजत সকেযাজক
दाजाब महरथ واڻون राटवन उभयानवयी अवयय
Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ
NA
Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA
Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान
उदगारवाचक
9 Particles अवयय আনষকিগক অবযয় महरथ
टोट वनतय अवयय ڻوڻہ ونتۍनिपात
Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক
সগর थ दिनथथा दाजाबदा
वरगहा NA ورگہا
Interjection वसमयादबोनत
িবয়েবাধক सोमोनानाय
दिनथथा
छट ژهڻت
छटथ
विसमयवाचक
Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक
Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक
10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक
General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन
थनद
गणनावाचक
Ordinals कमसचत াবাচক সকখযাবাচক শ
फार बसान نۍ گريند वनय وٴथनद
करमवाचक
11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष
Foreign word
वदशी शबद
িবেদশী শ
गबन हादरार सोदोब
غٲر ملکی لفظ
गोर मलत लफ़
विदशी शबद
Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন
थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह
Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय
लफ़
नादानकारी अभयसत
Languages: Telugu, Malayalam, Tamil, Konkani
SNo English Hindi Telugu Malayalam Tamil Konkani
1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப
பெயர जावाचत नाम
Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர
वयवाचत
नाम Verbal कयामलत
तद కరయమలకం NA தொழில
பெயர कयामळत नाम
Nloc दश-ताल साप
దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम
2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர
सवरनाम
Personal वयवाचत వయకతవచకం രഷ സര വനാമം
மூவிடபபெய परश सवरनाम
Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം
தறசுடடுப பதிலடுப
பெயர
आतमवाचत
सवरनाम
Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം
பரஸபர பதிலடுப
பெயர
सबद सवरनाम
Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം
இணைபபு பதிலடுப
பெயர
एतमत सवरनाम
Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം
வினாச சொல
पसनाथन सवरनाम
Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत
सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत
Deictic नदशी నరదషట തക സചകം சுடடு
பதிலடுப பெயர
दशरत उर
Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം
வினாச சொல सबद दशरत
Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം
வினை पसनाथन दशरत
Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை
வினை कयापद
Auxiliary Verb
सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद
Auxiliary Finite
(पणर पालवी
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    ………………………………………………
  End if
  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    ………………………………………………
  End if
  If language is Konkani then
    Call (POS Schema)
    Display (English, Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    ………………………………………………
  End if
Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English, Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    ………………………………………………
  End if
Else If script is Perso-Arabic then
  If language is Kashmiri then
    Call (POS Schema)
    Display (English, Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
    ………………………………………………
  End if
Else If script is Bangla then
  If language is Assamese then
    Call (POS Schema)
    Display (English, Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    ………………………………………………
  End if
Else If script is Gujarati then
  If language is Gujarati then
    Call (POS Schema)
    Display (English, Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    ………………………………………………
  End if
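The selection rule in this section can be read as follows: for a given script and language, show the English label, the Hindi label and the label of the selected language, and hide the labels of every other language. The Python sketch below illustrates that reading under stated assumptions. The script-to-language table covers only the cases listed above, a node is modelled as a plain dictionary of attributes rather than a parsed schema element, and the names SCRIPT_LANGUAGES, visible_labels and select_node are hypothetical. The Display (Metadata) step is left out.

```python
# Script -> language codes, limited to the cases the algorithm spells out.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}


def visible_labels(script, lang):
    """Attribute names to display for a script/language pair."""
    if lang not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"no rule for script={script!r}, language={lang!r}")
    labels = ["cat", "hin-cat"]        # English and Hindi are always shown
    if lang != "hin":
        labels.append(f"{lang}-cat")   # plus the selected language
    return labels


def select_node(node, script, lang):
    """Keep the displayed labels and the tag; hide every other attribute."""
    keep = set(visible_labels(script, lang)) | {"tag"}
    return {name: value for name, value in node.items() if name in keep}


# Hypothetical node; the label values are placeholders, not schema content.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "kok-cat": "<Konkani label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}

print(select_node(noun, "Devanagari", "kok"))
```

For Konkani the call keeps cat, hin-cat, kok-cat and tag and drops mal-cat. A table-driven lookup is used here so that adding a further script or language means adding one entry instead of another nested If block.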
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
71
CopyrightTDIL
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
72
CopyrightTDIL
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
73
CopyrightTDIL
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
74
CopyrightTDIL
16 REFERENCE 1 ISO 126201999 Terminology and other language and content resources mdash
Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements httpwwww3orgTR1999NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization httpwwww3orgTRxml-i18n-bp 4 Internationalization Tag Set (ITS) Version 10 httpwwww3orgTR2007REC-its-
20070403
5 ISO 639-3 Language Codes httpwwwsilorgiso639-3codesasp
75
CopyrightTDIL
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi asm 2 Assamese ben 3 Bangla brx 4 Bodo doi 5 Dogri guj 6 Gujarati hin 7 Kannada kan 8 Kashmiri kas 9 Konkani kok 10 Maithili mai 11 Malayalam mal 12 Manupuri mni 13 Marathi mar 14 Nepali nep 15 Oriya ori 16 Punjabi pan 17 Sanskrit san 18 Santhali sat 19 Sindhi snd 20 Tamil tam 21 Telugu tel 22 Urdu urd
76
CopyrightTDIL
CONTRIBUTERS
1 Ms Swaran Lata Department of Information Technology New Delhi 2 Prof Girish Nath Jha JNU New Delhi 3 Dr Somnath Chandra Department of Information Technology New Delhi 4 Dipti Misra Sharma LTRC IIIT-H 5 Somi Ram CDAC NOIDA 6 Prof Uma Maheswara Rao G University of Hyderabad 7 Dr Sobha L AU-KBC Chennai 8 Menak S 9 Kalika Bali Microsoft Bangalore 10 Prof Pushpak Bhattacharyya IIT-Bombay 11 Prof Malhar Kulkarni IIT-Bombay 12 Lata Popale IIT-Bombay 13 Kirtida Shah Gujarati University Ahemadabad 14 Mona Parakh LDCIL Mysore 15 Jyoti Pawar Goa University 16 Madhavi Sardesai Goa University 17 Ramnath 18 Aadil Kak University of Kashmir 19 Nazima University of Kashmir 20 Dr Richa LDCIL Mysore 21 Mazhar Mehdi Hussain JNU New Delhi 22 Mr Prashant Verma W3C India New Delhi 23 Swati Arora W3C India New Delhi
Languages: Telugu, Malayalam, Tamil, Konkani

SNo | English | Hindi | Telugu | Malayalam | Tamil | Konkani
1 | Noun | सा | సంజఞ | നാമം | பெயர | नाम
  | common | जावाचत | జతవచకం | സാമാന നാമം | பொதுப பெயர | जावाचत नाम
  | Proper | वयवाचत | వయకతవచకం | സംജാ നാമം | சிறபபுப பெயர | वयवाचत नाम
  | Verbal | कयामलत तद | కరయమలకం | NA | தொழில பெயர | कयामळत नाम
  | Nloc | दश-ताल साप | దశ-కల సపకషకం | ആധാരിക നാമം | இடப பெயர | थळ -ताळ-साप नाम
2 | Pronoun | सवरनाम | సరవనమం | സര വനാമം | பதிலடுப பெயர | सवरनाम
  | Personal | वयवाचत | వయకతవచకం | രഷ സര വനാമം | மூவிடபபெய | परश सवरनाम
  | Reflexive | नजवाचत | ఆతమరథకం | നിചവാചി സര വനാമം | தறசுடடுப பதிலடுப பெயர | आतमवाचत सवरनाम
  | Reciprocal | पारसपरत | పరసపరకం | സംബനവാചി സര വനാമം | பரஸபர பதிலடுப பெயர | सबद सवरनाम
  | Relative | सबन- वाचत | సంబంధ-వచకం | ാരസിക സര വനാമം | இணைபபு பதிலடுப பெயர | एतमत सवरनाम
  | Wh-words | पवाचत | పశర నవచకం | േചാദവാചി സര വനാമം | வினாச சொல | पसनाथन सवरनाम
  | Indefinite | अनयवाचत | NA | | சுடடு | अनि सवरनाम
3 | Demonstrative | नयवाचत सतवाचत | నరదశకవచకం | നിര േദശകം | நேரசசுடடு | दशरत
  | Deictic | नदशी | నరదషట | തക സചകം | சுடடு பதிலடுப பெயர | दशरत उर
  | Relative | सबनवाचत | సంబంధ-వచకం | സംബനവാചി നിര േദശകം | வினாச சொல | सबद दशरत
  | Wh-words | पवाचत | పశర నవచకం | േചാദവാചി നിര േദശകം | வினை | पसनाथन दशरत
  | Indefinite | अनयवाचत | NA | NA | துணை வினை | अनि सवरनाम
4 | Verb | कया | కరయ | കിയ | முதனமை வினை | कयापद
  | Auxiliary Verb | सहायत कया | సహయక కరయ | സഹായക കിയ | முறறு வினை | पालवी कयापद
  | Auxiliary Finite | | | | | (पणर पालवी कयापद)
  | Auxiliary Non Finite | | | | | (अपणर पालवी कयापद)
  | Main Verb | मखय कया | మఖయ కరయ | ധാന കിയ | குறை எசசம | मखल कयापद
  | Finite | परम | సమపక | ര ണ കിയ | வினைப பெயர | नी कयापद
  | Infinitive | कयाथरत सा | తమననరథకం | കിയാരം | வினை எசசம | सादारण रप
  | Gerund | कयावाचत | కరయవచకం | NA | பெயரடை | कयावाचत नाम
  | Non-Finite | गर-परम | అసమపక | അര ണ കിയ | வினையடை | अनी कयापद
  | Participle Noun | तद परत नाम | NA | NA | பினனுருபு | NA
5 | Adjective | वशषण | వశషణం | നാമ വിേശഷണം | இணைபபுச சொல | वशशण
6 | Adverb | कया-वशषण | కరయవశషణం | കിയാ വിേശഷണം | இணை இணைபபுச சொல | कयावशशण
7 | Post Position | परसगर | పరసరగ | അനേയാഗം | சாரபு இணைபபுச சொல | सबद अवयय
8 | Conjunction | योजत | సమచఛయం | സമചയം | நிரபபு இடைசசொல | जोड अवयय
  | Co-ordinator | समनवयत | సమనధకరణం | ഏേകാിത സമചയം | இடைசசொல | समानानीतरण जोड अवयय
  | Subordinator | अनीनसथ | వయధకరణం | ആശരസചക സമചയം | முனனிருபபு | आशी जोड अवयय
  | Quotative | उ-वाचत | అనుకరకం | ഉദാരണവാചി സമചയം | இனபபிரிபபு ஒடடு | अवरण -अथन उर
9 | Particles | अवयय | అవయయం | നിാദം | வியபபிடைச சொல | अवयय
  | Default | वयकम | వయతకరమం | സാമാനം | எதிரமறை | सरभरस अवयय
  | Classifier | वगनतारत | వరగకరకం | വര ഗകം | மிகுவிபபான | वगरत अवयय
  | Interjection | वसमयादबोनत | వసమయదబ ధకం | വാേകകം | அளவையடை | उमाळी अवयय
  | Negation | नतारातमत | నకరతమకం | നിേഷദം | பொது | नहयतार अवयय
  | Intensifier | ीवत | అతశయరథకం | തീവ നിാദം | எணணுப பெயர | ीवतार अवयय
10 | Quantifiers | सखयावाची | సంఖయవచకం | സംഖാവാചി | எணணு முறைப பெயர | सखयादशरत
  | General | सामानय | సమనయం | ൊതസംഖാവാചി | எஞசியவை | सामानय
  | Cardinals | गणनासचत | గణనసూచకం | അടിസാന സംഖാവാചി | அயல சொல | सखयावाचत
  | Ordinals | कमसचत | కరమసూచకం | കര മവാചി | குறியடு | कमवाचत
11 | Residuals | अवशष | అవశషం | അവശിഷദം | தெரியாதது | हर
  | Foreign word | वदशी शबद | వదశ శబదం | അനഭാഷാദം | நிறுததறகுறியடு | वदशी
  | Symbol | पीत | సంకతం | ചിഹം | இரடடைககிளவி | तर
  | Unknown | अा | అజఞత | ഇതരദം | NA | अनवळखी
  | Punctuation | वरामाद-च | వరమం | വിരാമ ചിഹം | NA | वरामतर
  | Echo-words | पवन-शबद | పరతధవన-శబంద | മാെറാലിവാക | NA | पडसाद उरा
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
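The selection logic above can be sketched in Python. This is only an illustration under stated assumptions: the names SCRIPT_LANGUAGES, NODES and select_nodes are hypothetical and not part of the draft standard, the two sample nodes use fully written-out labels chosen for the example, and the "Display (Metadata)" step of the Hindi branch is not modelled. The sketch keeps the English and Hindi labels of each node, adds the label of the selected language, and hides the rest.

```python
# Hypothetical sketch of the node-selection algorithm; names and data layout
# are assumptions made for illustration, not part of the draft standard.

# Script -> ISO 639-3 codes of the languages the algorithm lists under it.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# Two sample schema nodes (illustrative labels only).
NODES = [
    {"tag": "N", "cat": "noun", "hin-cat": "संज्ञा",
     "kok-cat": "नाम", "mal-cat": "നാമം"},
    {"tag": "PR", "cat": "Pronoun", "hin-cat": "सर्वनाम",
     "kok-cat": "सर्वनाम", "mal-cat": "സർവനാമം"},
]


def select_nodes(script, language, nodes=NODES):
    """Return each node with only the English, Hindi and selected-language labels."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not handled under script {script!r}")
    # English and Hindi labels are always displayed.
    visible = ["cat", "hin-cat"]
    if language != "hin":
        visible.append(language + "-cat")
    selected = []
    for node in nodes:
        shown = {"tag": node["tag"]}
        for key in visible:
            if key in node:
                shown[key] = node[key]
        selected.append(shown)  # every other label stays hidden
    return selected
```

For example, select_nodes("Devanagari", "kok") keeps the cat, hin-cat and kok-cat labels of each node and drops mal-cat.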
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
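The tags in the examples above are hierarchical: V_VM_VNF, for instance, names a verb (V), a main verb (VM) and a non-finite form (VNF). A minimal Python sketch of splitting such a tag into its levels follows. The function name split_tag is hypothetical, and treating "_", "__" and "-" as the same separator is an assumption based on the variants seen in the examples (N_NNP, N__NN, N-NNP), not something the standard states.

```python
import re


def split_tag(tag):
    """Split a hierarchical POS tag such as 'V_VM_VNF' into its levels."""
    # Assumption: '_', '__' and '-' all act as the level separator.
    return [part for part in re.split(r"[_-]+", tag.strip()) if part]


def top_level(tag):
    """Return the top-level category of a tag, e.g. 'V' for 'V_VM_VNF'."""
    return split_tag(tag)[0]
```

With this reading, split_tag("V_VM_VNF") gives the three levels V, VM and VNF, and a flat tag such as JJ is returned as a single level.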
16 REFERENCE
1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
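The language tags also serve as prefixes of the per-language label attributes used in the schema examples (hin-cat, kok-cat, mal-cat and so on). A small Python sketch of that lookup is given here. The names LANGUAGE_CODES and label_attribute are hypothetical, and the name-to-code pairs follow ISO 639-3.

```python
# Hypothetical lookup from language name to ISO 639-3 code.
LANGUAGE_CODES = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute that holds labels for the given language."""
    # Raises KeyError for a language that is not in the table.
    return LANGUAGE_CODES[language_name] + "-cat"
```

For example, label_attribute("Konkani") returns "kok-cat", the attribute name used in the Konkani examples of the algorithm.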
CONTRIBUTORS
1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
62
CopyrightTDIL
कयापद)
Auxiliary Non Finite
(अपणर पालवी कयापद)
Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी
कयापद Participle
Noun तद परत नाम NA NA பினனுருபு NA
5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச
சொல वशशण
6 Adverb कया-वशषण కరయవశషణం കിയാ
വിേശഷണം இணை
இணைபபுச சொல
कयावशशण
7 Post Position
परसगर పరసరగ അനേയാഗം
சாரபு இணைபபுச
சொல
सबद अवयय
8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல
जोड अवयय
Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം
இடைசசொல समानानीतरण जोड अवयय
Subordinator अनीनसथ వయధకరణం ആശരസചക
സമചയം
முனனிருபபு आशी जोड अवयय
Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം
இனபபிரிபபு ஒடடு
अवरण -अथन उर
9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல
अवयय
Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय
Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர
ीवतार अवयय
10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர
सखयादशरत
63
CopyrightTDIL
General सामानय సమనయం ൊതസംഖാവാചി
எஞசியவை सामानय
Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി
அயல சொல सखयावाचत
Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर
Foreign word
वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு
वदशी
Symbol पीत సంకతం ചിഹം இரடடைககிளவி
तर
Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
64
CopyrightTDIL
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
65
CopyrightTDIL
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
66
CopyrightTDIL
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malyalam (Orthographic variation) then
If language is Malyalam then
Call (POS Schema)
Display (English Hindi and Malyalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
67
CopyrightTDIL
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
68
CopyrightTDIL
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English, Hindi and Gujarati Nodes)
Hide (remaining nodes)
E.g.
<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
……………………………………………
End if
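The selection rule in the algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names SCRIPT_LANGUAGES, ALWAYS_SHOWN, visible_attributes and select_nodes are hypothetical and do not come from the standard, and a schema node is modelled as a plain dictionary of attributes rather than as parsed XML. The script-to-language table lists only the cases the algorithm names.

```python
# Script -> ISO 639-3 codes, restricted to the cases named in the algorithm.
# The dictionary layout is an assumption made for this sketch.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# Attributes shown for every language: English labels, the Hindi label, the tag.
ALWAYS_SHOWN = {"name", "cat", "subcat", "hin-cat", "tag"}


def visible_attributes(script, language):
    """Return the attribute names to display for a script/language pair."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"no rule for language {language!r} in script {script!r}")
    shown = set(ALWAYS_SHOWN)
    # Hindi needs no third label; any other language adds its own "<code>-cat".
    if language != "hin":
        shown.add(f"{language}-cat")
    return shown


def select_nodes(node, script, language):
    """Keep the visible attributes of one schema node and hide the rest."""
    shown = visible_attributes(script, language)
    return {name: value for name, value in node.items() if name in shown}


if __name__ == "__main__":
    # Placeholder labels stand in for the native-script strings.
    noun = {"cat": "noun", "hin-cat": "HIN", "kas-cat": "KAS", "guj-cat": "GUJ", "tag": "N"}
    print(select_nodes(noun, "Perso-Arabic", "kas"))
```

Keeping the English and Hindi labels in one fixed set mirrors the "Display (English, Hindi and ... Nodes)" step, so supporting a further language only requires a new entry in the table.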
15 REFERENCE BASED IMPLEMENTATION
Hindi
1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC
5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP
নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ_QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
1 ستپوريوںN_NNP کیPSP زيارتN_NN سےPSP ملتیV_VM ہےV_VAUX نجاتN_NN
2 ہندوN_NN مذہبN_NN ميںPSP تيرتهN_NN کیPSP بڑیQT_QTF اہميتN_NN ہےV_VAUX
3 يوںPSP توPSP ہرQT_QTF تيرتهN_NN بڑیN_NN اورCC_CCD اہمN_NN ہيںV_VAUX RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD مقبوليتN_NN ہےV_VAUX
4 يہPR_PRP ساتوںQT_QTC مذہبیJJ مقاماتN_NN ساتQT_QTC شہروںN_NN ياCC_CCD ساتQT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ ہيںV_VAUX RD_PUNC
5 ايساDM_DMD کہاV_VM گياV_VM ہےV_VAUX کہCC_CCD برساتموسمN_NN ميںPSP انDM_DMD ساتوںQT_QTC شہروںN_NN کیPSP زيارتN_NN نجاتN_NN فراہمJJ کرنےV_VM_VF والیV_VAUX ہوتیV_VM ہےV_VAUX
Oriya
1 ସପତପରୀଗଡ଼କN_NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N_NNV |
2 ହନଦଧରମN_NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V_VAUX |
3 ଏପରକ RP_RPD ସବPR_PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC_CCD ମଖୟ JJ ଅଟନତ V_VAUX ପରନତ CC_CCS ସାତ QT_QTC ସଥାନଗଡ଼କର N_NN ଶରେଷଠ JJ ମହନୀୟତାN_NN ଓ CC_CCD
ମାନୟତାN_NN ଅଟେ V_VAUX |
4 ଏହPR ସାତ QT_QTC ଧରମସଥଳ NN ସାତ QT_QTC ନଗରଗଡ଼କ N_NN ର PSP କଂବା CC_CCD
ସପତପରୀଗଡ଼କN_NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN_NN ରେ PSP ବରଣତ N_NNV ହେଇଅଛ V_VAUX |
5 ଏଭଳ PR କହାଯାଇ V_VM ଅଛ V_VAUX କ CC_CCS ଚରତମାସ N_NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N_NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V_VAUX କରବାବାଲା NN ହେଇଥାଏ V_VAUX |
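The tagged samples above attach a hierarchical tag such as N_NN or V_VM_VNF to each word. A minimal Python sketch of how such tokens might be taken apart follows. It rests on assumptions that are not part of the standard: the word and its tag are taken to be fused with no separator, as they appear in this text, and a tag is taken to be a trailing run of ASCII capitals joined by underscores. The names TOKEN, tag_levels and split_token are hypothetical.

```python
import re

# Assumed token shape: a word immediately followed by its tag, where the tag
# is a run of ASCII capitals joined by one or more underscores (e.g. "N_NN").
TOKEN = re.compile(r"(.*?)([A-Z]+(?:_+[A-Z]+)*)")


def tag_levels(tag):
    """Split a hierarchical tag such as "V_VM_VNF" into its levels."""
    return [part for part in tag.split("_") if part]


def split_token(token):
    """Return (word, levels); the word is "" for a bare tag such as "RD_PUNC"."""
    match = TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"no tag found in {token!r}")
    word, tag = match.groups()
    return word, tag_levels(tag)


if __name__ == "__main__":
    for token in ["दर्शनN_NN", "RD_PUNC"]:
        print(split_token(token))
```

The pattern cannot separate a word that itself ends in ASCII capitals from its tag, and it does not handle tokens in which the tag precedes the word, so a real corpus would need an explicit delimiter between word and tag.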
16 REFERENCE
1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
S.No.  Language Name  Language Tag according to ISO 639-3
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
CONTRIBUTORS
1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G, University of Hyderabad
7. Dr. Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
64
CopyrightTDIL
14 ALGORITHM FOR SELECTION OF NODES
If script is Devanagari then
If language is Hindi then
Display (Metadata)
Call (POS Schema)
Display (English and Hindi Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Bodo then
Call (POS Schema)
Display (English Hindi and Bodo Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt
65
CopyrightTDIL
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर
दिनथथाrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम
दिनथथाrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब
दिनथथाrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव
दिनथथाrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
If language is Konkani then
Call (POS Schema)
Display (English Hindi and Konkani Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-
cat=rdquoजावाचत नामrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-
cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo
tag=rdquoPRrdquogt
66
CopyrightTDIL
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malyalam (Orthographic variation) then
If language is Malyalam then
Call (POS Schema)
Display (English Hindi and Malyalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
67
CopyrightTDIL
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
68
CopyrightTDIL
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
69
CopyrightTDIL
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
70
CopyrightTDIL
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
71
CopyrightTDIL
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
72
CopyrightTDIL
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
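The tagged sentences above attach a hierarchical tag to each word, for example N_NN (noun, common) or V_VM_VF (verb, main, finite). A minimal Python sketch of how such tokens might be read is given here. It assumes that a token is the word form followed directly by its tag and that tags consist of ASCII capitals joined by underscores; the function names split_token and tag_levels are hypothetical and are not part of the standard.

```python
import re

# Assumption: the tag is the trailing run of ASCII capitals and underscores,
# e.g. "N_NN", "V_VM_VF" or "RD_PUNC"; everything before it is the word form.
TOKEN_RE = re.compile(r"(?P<word>.*?)(?P<tag>[A-Z]+(?:_+[A-Z]+)*)")


def split_token(token):
    """Return (word, tag) for one tagged token; the word may be empty."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"no POS tag found in {token!r}")
    return match.group("word"), match.group("tag")


def tag_levels(tag):
    """Split a hierarchical tag into its levels, ignoring doubled underscores."""
    return [level for level in tag.split("_") if level]


if __name__ == "__main__":
    word, tag = split_token("ഈDM_DMD")
    print(word, tag, tag_levels(tag))
    print(tag_levels("V_VM_VF"))
```

Tokens that carry no tag, such as the sentence numbers, raise ValueError in this sketch, so a caller would need to skip them explicitly.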
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag (ISO 639-3)
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
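The language tags can be held in a simple lookup table. The following Python sketch is illustrative only: the dictionary lists the ISO 639-3 codes for the 22 languages of this annexure, and the helper names language_tag and label_attribute are hypothetical. The label_attribute helper assumes the per-language attribute pattern used in the POS schema examples (hin-cat, mal-cat, kok-cat).

```python
# ISO 639-3 codes for the 22 languages listed in Annexure-1.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 code for a language name (case-insensitive)."""
    return LANGUAGE_TAGS[name.strip().title()]


def label_attribute(name):
    """Hypothetical helper: build the schema label attribute, e.g. 'mal-cat'."""
    return f"{language_tag(name)}-cat"


if __name__ == "__main__":
    print(language_tag("Malayalam"))
    print(label_attribute("Konkani"))
```

An unknown language name raises KeyError here; a real implementation might prefer to report the unsupported language explicitly.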
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarat University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
72
CopyrightTDIL
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
73
CopyrightTDIL
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
74
CopyrightTDIL
16 REFERENCE 1 ISO 126201999 Terminology and other language and content resources mdash
Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements httpwwww3orgTR1999NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization httpwwww3orgTRxml-i18n-bp 4 Internationalization Tag Set (ITS) Version 10 httpwwww3orgTR2007REC-its-
20070403
5 ISO 639-3 Language Codes httpwwwsilorgiso639-3codesasp
75
CopyrightTDIL
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi asm 2 Assamese ben 3 Bangla brx 4 Bodo doi 5 Dogri guj 6 Gujarati hin 7 Kannada kan 8 Kashmiri kas 9 Konkani kok 10 Maithili mai 11 Malayalam mal 12 Manupuri mni 13 Marathi mar 14 Nepali nep 15 Oriya ori 16 Punjabi pan 17 Sanskrit san 18 Santhali sat 19 Sindhi snd 20 Tamil tam 21 Telugu tel 22 Urdu urd
76
CopyrightTDIL
CONTRIBUTERS
1 Ms Swaran Lata Department of Information Technology New Delhi 2 Prof Girish Nath Jha JNU New Delhi 3 Dr Somnath Chandra Department of Information Technology New Delhi 4 Dipti Misra Sharma LTRC IIIT-H 5 Somi Ram CDAC NOIDA 6 Prof Uma Maheswara Rao G University of Hyderabad 7 Dr Sobha L AU-KBC Chennai 8 Menak S 9 Kalika Bali Microsoft Bangalore 10 Prof Pushpak Bhattacharyya IIT-Bombay 11 Prof Malhar Kulkarni IIT-Bombay 12 Lata Popale IIT-Bombay 13 Kirtida Shah Gujarati University Ahemadabad 14 Mona Parakh LDCIL Mysore 15 Jyoti Pawar Goa University 16 Madhavi Sardesai Goa University 17 Ramnath 18 Aadil Kak University of Kashmir 19 Nazima University of Kashmir 20 Dr Richa LDCIL Mysore 21 Mazhar Mehdi Hussain JNU New Delhi 22 Mr Prashant Verma W3C India New Delhi 23 Swati Arora W3C India New Delhi
66
CopyrightTDIL
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश
सवरनामrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-
cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Malyalam (Orthographic variation) then
If language is Malyalam then
Call (POS Schema)
Display (English Hindi and Malyalam Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-
cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-
cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-
cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
67
CopyrightTDIL
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
68
CopyrightTDIL
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
69
CopyrightTDIL
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
70
CopyrightTDIL
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
71
CopyrightTDIL
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
72
CopyrightTDIL
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-
NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
73
CopyrightTDIL
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
74
CopyrightTDIL
16 REFERENCE 1 ISO 126201999 Terminology and other language and content resources mdash
Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements httpwwww3orgTR1999NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization httpwwww3orgTRxml-i18n-bp 4 Internationalization Tag Set (ITS) Version 10 httpwwww3orgTR2007REC-its-
20070403
5 ISO 639-3 Language Codes httpwwwsilorgiso639-3codesasp
75
CopyrightTDIL
ANNEXURE-1
LANGUAGE TAGS
SNo Language Name Language Tags according to ISO 639-3
1 Hindi asm 2 Assamese ben 3 Bangla brx 4 Bodo doi 5 Dogri guj 6 Gujarati hin 7 Kannada kan 8 Kashmiri kas 9 Konkani kok 10 Maithili mai 11 Malayalam mal 12 Manupuri mni 13 Marathi mar 14 Nepali nep 15 Oriya ori 16 Punjabi pan 17 Sanskrit san 18 Santhali sat 19 Sindhi snd 20 Tamil tam 21 Telugu tel 22 Urdu urd
76
CopyrightTDIL
CONTRIBUTERS
1 Ms Swaran Lata Department of Information Technology New Delhi 2 Prof Girish Nath Jha JNU New Delhi 3 Dr Somnath Chandra Department of Information Technology New Delhi 4 Dipti Misra Sharma LTRC IIIT-H 5 Somi Ram CDAC NOIDA 6 Prof Uma Maheswara Rao G University of Hyderabad 7 Dr Sobha L AU-KBC Chennai 8 Menak S 9 Kalika Bali Microsoft Bangalore 10 Prof Pushpak Bhattacharyya IIT-Bombay 11 Prof Malhar Kulkarni IIT-Bombay 12 Lata Popale IIT-Bombay 13 Kirtida Shah Gujarati University Ahemadabad 14 Mona Parakh LDCIL Mysore 15 Jyoti Pawar Goa University 16 Madhavi Sardesai Goa University 17 Ramnath 18 Aadil Kak University of Kashmir 19 Nazima University of Kashmir 20 Dr Richa LDCIL Mysore 21 Mazhar Mehdi Hussain JNU New Delhi 22 Mr Prashant Verma W3C India New Delhi 23 Swati Arora W3C India New Delhi
67
CopyrightTDIL
End if
Else If script is Perso-Arabic then
If language is Kashmiri then
Call (POS Schema)
Display (English Hindi and Kashmiri Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo
tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo
tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo
lttag=rdquoPRP rdquoشخصيٲتی
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo
lttag=rdquoPRF rdquoماکوسی
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
68
CopyrightTDIL
Else If script is Bangla then
If language is Assamese then
Call (POS Schema)
Display (English Hindi and Assamese Nodes)
Hide (remaining nodes)
Eg
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-
cat=rdquoজািতবাচকrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-
cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-
cat=rdquoআতবাচকrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
Else If script is Gujarati then
If language is Gujarati then
Call (POS Schema)
Display (English Hindi and Gujarati Nodes)
Hide (remaining nodes)
Eg
69
CopyrightTDIL
ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt
ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-
cat=rdquoિતવચકrdquo tag=rdquoNNgt
ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoવયતવચકrdquo tag=rdquoNNPgt
ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo
tag=rdquoPRrdquogt
ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-
cat=rdquoષવચકrdquo tag=rdquoPRPgt
ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-
cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
End if
70
CopyrightTDIL
15 REFERENCE BASED IMPLEMENTATION
Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC
2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC
3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM
RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN
औरCC_CCD मानयाN_NN हV_VM RD_PUNC
4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD
सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX
RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP
इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM
वालाPSP होाV_VM हV_VAUX RD_PUNC
Punjabi
1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN
2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN
ਹV_VAUX |RD_PUNC
3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX
CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN
ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC
4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN
ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP
ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC
5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO
ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN
ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC
71
CopyrightTDIL
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD
সপিাN_NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD
સપતરાN_NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
1 ستپوريوںN_NNP کیPSP زيارتN_NN سےPSP ملتیV_VM ہےV_VAUX نجاتN_NN
2 ہندوN_NN مذہبN_NN ميںPSP تيرتهN_NN کیPSP بڑیQT_QTF اہميتN_NN ہےV_VAUX
3 يوںPSP توPSP ہرQT_QTF تيرتهN_NN بڑیN_NN اورCC_CCD اہمN_NN ہيںV_VAUX
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD مقبوليتN_NN ہےV_VAUX
4 يہPR_PRP ساتوںQT_QTC مذہبیJJ مقاماتN_NN ساتQT_QTC شہروںN_NN ياCC_CCD ساتQT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ ہيںV_VAUX RD_PUNC
5 ايساDM_DMD کہاV_VM گياV_VM ہےV_VAUX کہCC_CCD برساتموسمN_NN ميںPSP انDM_DMD ساتوںQT_QTC شہروںN_NN کیPSP زيارتN_NN نجاتN_NN فراہمJJ کرنےV_VM_VF والیV_VAUX ہوتیV_VM ہےV_VAUX
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
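Each token in the tagged sentences of this section is a word followed directly by its POS tag. As a rough sketch, and assuming that the tag is always the trailing run of ASCII capital letters joined by underscores or hyphens (an assumption drawn from these examples, not a rule stated by the standard), a tagged sentence can be split into (word, tag) pairs in Python. The function names are illustrative.

```python
import re

# Assumed tag shape: capitals joined by '_' or '-', e.g. N_NN, V_VM_VNF, RD_PUNC.
TAG_RE = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Split one 'wordTAG' token into (word, tag); tag is None if absent."""
    match = TAG_RE.match(token)
    if match is None:
        return token, None
    return match.group("word"), match.group("tag")


def split_sentence(line):
    """Split a whitespace-separated tagged sentence into (word, tag) pairs."""
    return [split_token(token) for token in line.split()]


# A bare tag such as RD_PUNC yields an empty word; a sentence number yields no tag.
pairs = split_sentence("1 दशरनN_NN सPSP RD_PUNC")
```

Tokens without a recognisable tag, such as the sentence number, are returned unchanged with `None` so that nothing is silently dropped.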
16 REFERENCE
1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
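The language tags of this annexure also form the prefix of the language-specific label attributes used in the schema examples (for instance `guj-cat` for Gujarati). A small Python lookup table makes that link explicit. This is only a sketch: the dictionary pairs each language with its ISO 639-3 code, and the helper names `language_tag` and `label_attribute` are hypothetical.

```python
# Language name -> ISO 639-3 code for the 22 languages of this annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 code for a language name; KeyError if unknown."""
    return LANGUAGE_TAGS[name]


def label_attribute(name):
    """Build the assumed label attribute name, e.g. 'guj-cat' for Gujarati."""
    return language_tag(name) + "-cat"
```

Raising `KeyError` for an unknown name is a deliberate choice in this sketch, so that a misspelt language name fails loudly instead of producing a malformed attribute.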
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
71
CopyrightTDIL
Tamil
1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN
ிகடகிிறV_VM_VF RD_PUNC
2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF
சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC
3 ஒவவதாQT_QTC தணணதமN_NN றN_NN
மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX
ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN
மிபதமN_NN வதயநதமV_VM_VF RD_PUNC
4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC
நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT
ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX
RD_PUNC
5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN
கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT
சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC
Malayalam
1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF
െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC
2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ
മഹതംN_NN ഉണV_VAUX RD_PUNC
3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ
ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC
സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN
ഉണV_VAUX RD_PUNC
4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC
നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN
എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF
RD_PUNC
5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN
സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF
RD_PUNC
72
CopyrightTDIL
Bangla
1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC
3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC
জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC
4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-
NNP নাোN_NN পিািচতN_NN ৷RD_PUNC
5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD
সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC
Marathi
1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC
2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC
3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM
पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ
आहVM PUNC
4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP
रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC
5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD
सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC
Gujarati
1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN
2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM
3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ
છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD
ાદતN_NN છV_VM
4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD
સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX
5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-NNP
દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX
Konkani
1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC
2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC
3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ
आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ
आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC
4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC
नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC
5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF
दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC
Urdu
N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3
RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت
CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ
RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5
JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے
Oriya
1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |
2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |
3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନୟତାN__NN ଅଟେ V__VAUX |
4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD
ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |
5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR
ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
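The sample sentences above carry the POS label directly after each word, with no separator. The sketch below is only an illustration, not part of the standard, of how such a token could be split into its word and its label. It assumes the label is the trailing run of upper-case Latin letters joined by underscores or hyphens; the function name `split_token` is hypothetical. It does not handle the Urdu samples, where the label precedes the word.

```python
import re

# Assumption: the label is the trailing run of upper-case Latin letters,
# optionally joined by "_" or "-" (e.g. N_NN, V_VM_VF, N-NNP, N__NN).
TAG_RE = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Return (word, tag) for a fused token, or None if no label is found."""
    match = TAG_RE.match(token)
    if match is None:
        return None
    return match.group("word"), match.group("tag")


if __name__ == "__main__":
    # Tokens copied from the Bangla sample sentence 1 above.
    for token in ["দশরনN_NN", "হয়V_VAUX", "৷RD_PUNC"]:
        print(split_token(token))
```

A label that stands alone, such as `RD_PUNC`, is returned with an empty word, so the caller can decide how to treat punctuation.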
16 REFERENCE
1 ISO 12620:1999 Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources
2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp
4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403
5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp
ANNEXURE-1
LANGUAGE TAGS
SNo  Language Name  Language Tag according to ISO 639-3
1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
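A tagging tool would typically hold these language tags as a lookup table. The following is a minimal sketch of such a table in Python, given only as an illustration. The names `ISO_639_3` and `language_tag` are hypothetical and are not defined by this standard; the codes are the ISO 639-3 codes of the 22 languages listed in this annexure.

```python
# Language name -> ISO 639-3 code for the 22 languages of Annexure-1.
ISO_639_3 = {
    "hindi": "hin", "assamese": "asm", "bangla": "ben", "bodo": "brx",
    "dogri": "doi", "gujarati": "guj", "kannada": "kan", "kashmiri": "kas",
    "konkani": "kok", "maithili": "mai", "malayalam": "mal",
    "manipuri": "mni", "marathi": "mar", "nepali": "nep", "oriya": "ori",
    "punjabi": "pan", "sanskrit": "san", "santhali": "sat", "sindhi": "snd",
    "tamil": "tam", "telugu": "tel", "urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 code for a language name (case-insensitive).

    Raises KeyError if the language is not listed in the annexure.
    """
    return ISO_639_3[name.strip().lower()]


if __name__ == "__main__":
    print(language_tag("Malayalam"))
```

Keeping the names in lower case inside the table lets the lookup accept "Hindi", "hindi" or " HINDI " without a separate normalisation table.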
CONTRIBUTORS
1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi