Upload
phamthuy
View
221
Download
2
Embed Size (px)
Citation preview
Title<Note>Input/Output Methods for Thai : Development of aDatabase and a Computer Concordance for the Three SealsLaw of Thailand
Author(s) Shibayama, Mamoru
Citation 東南アジア研究 (1987), 25(2): 279-296
Issue Date 1987-09
URL http://hdl.handle.net/2433/56282
Right
Type Journal Article
Textversion publisher
Kyoto University
Southeast As£an Stud£es, Vol. 25, No.2, September 1987
Input/Output Methods for Thai
--Development of a Database and a Computer Concordancefor the Three Seals Law of Thailand--
Mamoru SHIBAYAMA*
Abstract
An intelligent Thai computer terminal and a Thai text editor with the function ofautomatic and consecutive conversion from Roman spelling to Thai letters have beendeveloped which are operable on a micro computer. These employ the TransliterationMethod (TM) or the Simplified Transliteration Method (STM), which are based on anewly devised transliteration table from Roman spelling. We are now developing adatabase and a computer concordance of the Three Seals Law (Kotmaz· Tra SamDuang), and making a machine-readable Thai dictionary using this terminal and editor.
The Transliteration and Simplified Transliteration Methods were both estimated torequire a greater number of key strokes in the making of the machine-readable Thaidictionary than the method used for the ordinary IBM electronic Thai typewriter, herecalled the Direct Mapping Method (DMM). However, an evaluation of learning effectsfrom the number of key strokes and the measurement of learning curves in the inputof the Thai dictionary indicated that although the Transliteration Method required a42.9% greater number of key strokes than the Direct Mapping Method, a 9.8% higherinput rate in terms of characters per minute.
For the output of Thai letters, the design and implementation of a printing systemfor a Japanese laser beam printer run from a main-frame computer and a CRT displayfor a micro computer are described.
I Introduction
In the Center for Southeast Asian Studies,
Kyoto University, we are now developing
a database for the Three Seals Law (Kotmai
Tra Sam Duang, compiled in 1805, about
1700 pages, about 32,500 lines in the
five-volume version published in 1962) on
the main-frame computer in the Data
Processing Center, Kyoto University with
a view to making a computer concordance,
which will provide an important resource
* ~rl1 ;y, The Center for Southeast AsianStudies, Kyoto University
for studying the history, law, sociology,and
linguistics of Thailand [Ishii 1969]. For
processing this Thai text, the input/output
methods for Thai text and Thai letters are
of major importance and present sophisti
cated problems.
Two input methods are considered. One
is the Direct Mapping Method (DMM) by
which each Thai letter corresponds uniquely
to one of the keys on the keyboard [Sugita
1980]. This scheme requires an intelligent
terminal to identify which Thai letter is
input; in other words, the ordinary teletype
terminal cannot be used for the input/output
279
of Thai letters because it is impossible to
display/print Thai letters on the terminal.
The other method is a transliteration
approach using a table by which Thai
letters are generated from Roman letters
according to their pronunciation, like the
Roman-Kanji conversion in Japanese, and
is a method, here called Hartmann's
Transliteration Method (HTM), proposed
by J. F. Hartmann and G. M. Henry
[Hartmann and Henry 1983J. This ap
proach requires a greater number of key
strokes for input than the Direct Mapping
Method, but it is readily operable by non
native speakers of Thai. Also, it does not
require an intelligent terminal if the
transliteration approach is used for display
ing Thai letters and if the main-frame
computer performs the function of trans
literation.
In 1984, we implemented a Thai text
editor in which we adopted the DMM as
one of the input methods [Shibayarna et al.
1984]. In 1985, we modified the table
proposed by J. F. Hartmann et al. to
produce a new transliteration table, here
called the Transliteration Method (TM),
which was incorporated into the text editor
[Shibayama et al. 1985J. We have also
developed a method Roman-Thai conversion
called the Simplified Transliteration method
(STM) [Shibayama and Hoshino 1987].
For output, we have developed a display
system of Thai letters on the micro computer
and a printing system of Thai letters on
a laser beam printer run from the main
frame computer at the Data Processing
Center, Kyoto University.
This paper describes the characteristics
280
of the DMM and the TM implemented on
the Thai text editor~ compares the DMM,
TM, and HTM by estimating the munber
of key strokes required to input text of the
Three Seals Law, and shows the results of
typing speed and learning curves measured
for the work of inputting the main entries
of a Thai-Thai dictionary by the DMM and
TM methods. The basic idea of the STM,
which should be more readily operable for
non-native speakers of Thai, IS also
presented.
For output, the characteristics and
structure of display/printing controls in the
system are described.
Lastly, an appendix presents an outline
for developing a database and a computer
concordance of the Three Seals Law and
making a machine-readable Thai dictionary.
IT Characteristics of Thai
The Thai writing system differs from that
of western languages in several points:
(a) Thai letters are phonetic. Thai has
5 tones. Each Thai syllable is com~
posed either of consonant+vowel or
consonant+vowel+consonant.
(b) The consonants, vowels, and tonal
marks must be positioned appropri
ately.
(c) Words in a sentence are not separated
from each other.
(d) Vowels may be placed before, after,
above, or below a consonant.
(e) Punctuation is scarcely used.
Fig. 1 shows an example of Thai script
printed by our Thai printing system using
the laser beam printer. Given the charac-
530807
530808
530809
530810
530811
530812
530813
M. SHIBAYAMA: Input/Output Methods for Thai
I
l11~li1~11G}j~oli It'HU111 U'Yl/li1~Bltlf)11 I~ ~/llfnnlw~tI
ttn l~l'fll1B~ (li11~1 •tfl~BUI .B1U/) bt~/~fb'HUI nJul \l/Glhtl/"Ultl/~1tlI° ~ ~ Y <) I Y I ICIlY c:!.
\lolU1/UUI f)/fl1'a~1tl/ tl1Y /~HJbt~1 / WfI11~11/ bVi t'atlflc::o CIl SI I SI~ ~ ~ III SInnl b~U'YlU/blHbfl/~UU/ fl'aUI\l~/tlU/fl~ b1 l~llJ/Ul1 /'W1~
~ ~ I SI
ft' SIc:!. e:v <! e:v ~ ° ° ~Bltlf111/ Hltl~lJl11litl/~~/11J(~BI.COUltll .\llUl/) UUI 'Hl/b'U~~
lH'llU/bJI ~1tl/l\ll(;lf)/bU~Ul/iJ~~/liUu/li\l11fUl/n/\l~/
(iBHll .fUuoll.tll nl'IBf)1 fl~1 t1JU/l1B~ (bfl1lBU/ .B1U/) nitrl \)/ll/\) ~IFi~. 1 Example of Thai Letters: A Part of Text of the Three Seals Law
teristics of Thai noted above and shown in
Fig. 1, several points must be considered
in processing Thai text using the computer:
(a) How to input Thai letters.
(b) How to control the display and printing
positions for each consonant, vowel,
and tonal mark.
(c) How to divide a text into sentences and
a sentence into words, namely, seg
mentation.
Several studies into the complex question
of input/output of Thai have been made.
For input, typical methods include the
keyboard of the ordinary IBM electronic
Thai typewriter and the keyboard connected
to the computer used in Thailand, which
employs the DMM. For inputting Thai
text of the Three Seals Law, Sugita adopted
the DMM in using a graphic terminal
connected to an IBM host computer
[Sugita 1980]. Hartmann and Henry,
who have been using the computer to study
Thai language in the field of library
information, have proposed the trans
literation approach mentioned in Section 1
[Hartmann and Henry 1983].
For output the typing head of the IBM
electronic Thai typewriter and the use of
ROM (Read Only Memory) for display and
printing Thai letters on the computer are
generally used. Sakamoto has developed
integrated computer programs by which
several Southeast Asian and African can
be printed out with a laser beam printer
[Sakamoto 1979]. Also, a printing pro
gram which can output Thai letters has
been developed in the Center for Infor
mation Processing of Tsukuba University.
III Thai Text Editor
Fig. 2 shows file control and editing
screens of the Thai text editor implemented
on a micro computer. This editor employs
the functions shown in Table 1. The
column "Screen" in Table 1 shows whether
the screen is in (a) the file control or (b) the
editing mode in Fig. 2. The screen in
Fig. 2(b) is in the TM mode described in
section IV.2 and IV.4.2, and corresponds
to a record, namely, a line in the text which
has a maximum of 160 characters in Thai,
and which is divided into 4 lines in order to
display Thai characters. Below each dis
play line is the area of movement of cursor.
The cursor can be moved to any directions
281
INITIAL SET SCREEN
ITHAI TEXT EDITOR'
[V2.2]
86/03/22
1) Source Text Drive#(l/2) : ? 2
2) Source Text File Name
3) Work File Initial Set
[FUNCTION]
SOURCE
WARM
4) Line Number Start Col.: 1
5) Line Number Length : 6
6) Text Length/ Line 122
7) Start
Line Number :1'---~ ----J
MODE COPY TSS TSS PRINT END EDIT
FILEC2:S0URCE ]MODE[Roman]
(a) File Control Screen
THAI TEXT EDITORCURSORC ·4 LINE. 27 COL.J DPC 1]ASCII C10SJ GETCI1:
85/01/09TMAXC OJ
4J PUTC.: J
d 3 d d k 3
SHIBAYAMA MAMO~I
/-d~1J/
Q/
-'jA
FORWARD BACKWAR
282
:-~ ')J Q/
tJ~11~cflllDUOtJ 10
Q.oo'
-J~
-d~
A- A1 A2
/ SAVE MODE
(b) Editing Screen
Fi~.2 Screens of the Thai Text Editor
HCOPY SC SET
M. SHIBAYAMA: Input/Output Methods for Thai
Table 1 Functions of the Thai Text Editorusing the arrow keys, and Thai characters
are displayed using graphic instructions
on the micro computer.
We found that a character pattern
composed of 16(W)*32(H) mesh stored in
RAM (Random Access Memory) can be
displayed without appreciable delay by
using the PUT@ statement in the graphic
instructions. We synthesized a new
pattern on the GVRAlVl (Graphic Video
RAM) by using the OR operation for
all bits of the patterns shown in Fig. 3.
This figure also shows that patterns other
than tones should be shi fted by 5 dots
downward in the Y direction from the
standard position in order to save the main
memory area for storing all patterns.
Screen
(a)(b)
(a)
(a)
(a)
(a)
(b)
(b)
(b)
(b)
(b)
(b)
(a)
Key
f.lf.6
f.2
f. 3, f.4
f.8
f.9
f.l
f.2
f.3
f.5
f.8
f.10
f.10
Indication
MODEMODE
COpy
TSS
END
FORWARD
BACKWARD
"I"SAVE
HCOPY
SC SET
EDIT
Function
Input Method isSpecified for Thai
Back-up of CurrentFile is Executed
TSS Emulator isInvoked
File Printing
Quit the Editor
Move to NextRecord
Move to PreviousRecord
"I" is Inserted
Editing File isSaved
A Record is Printed
Screen (a) isInvoked
Editing is Restarted
16 dots
Thai pronunciation, namely, the Trans
literation Method (TM), has the benefit
of improving the operability of typing for
non-native speakers of Thai and for people
IV Input Methods
1. D£rect Mapp£ng Method (DMM)
In the Thai text editor implemented on
the micro computer, we adopted the
keyboard assignment, here called the Direct
Mapping Method (DMM), as shown in
Fig. 4. Since this keyboard assignment
is almost equivalent to that of the IBM
electronic Thai typewriter, and the editor
employs the dead key control, whereby
a character pattern overlaps the preceding
patterns without the carriage moving, so
that the consonants, vowels, and tones can
be displayed in their appropriate positions
on the CRT, the editor can be used like the
IBM electronic Thai typewriter.
2. Translz"terat£on Method (TM)
The input method of Thai letters by
means of the Roman letters representing the
y
y
r-..,-v'q 'I'" + > 8
CjM >16
} 8
Consonant Vowel Tone
RegIstered patterns
"......"
r:::+ }5 dots "'v +1'\\.V
~
'1AJDisplayed patterns
o :The origin of pattern
Fi~. 3 The Display on the CRT
xdots
dots
dots
x
283
SHIFT
""'-- ------------ll XFER )
(a): Little finger (b): Third finger(c): Middle finger (d): Forefinger
Fig.4 Keyboard Assignment of DMM
accustomed to the normal keyboard as
signment of Roman letters. At the same
time, the transliteration table should be
simple for typists and must be designed to
decrease the number of key strokes and the
sphere of movement of the fingers. To
this end, we have proposed the revised
transliteration table from Roman to Thai
letters shown in Table 2 and implemented
a function capable of automatic and
consecutive conversion according to this
table.
The characteristics of the transliteration
table are as follows:
(a) The Roman spellings of the Thai
consonants and vowels are classified
into 21 groups according to their
pronunciations, each group comprising
character strings headed by same
Roman character. The numerals,
tones, special symbols, and control
codes are classified into 28 groups. A
group number, GN, is assigned to each
of these 49 groups, and each character
string in a group is discriminated by
284
a local classification number, LeN.
(b) The transliteration approach by
Hartmann and Henry distinguishes
different Thai letters with the same
pronunciation by use of apostrophes,
for example, TH, rH', TH", TH"',
TH"", and TH"'''.
In our system, the distinction is repre
sented by adding a number to the Roman
spelling. The Thai letters are arranged,
moreover, in order of decreasing frequency
of occurrence in the text of the Three Seals
Law. In this way, the number of key
strokes required by the operator is de
creased. For example, TH) THi, TH2,
TH3, TH4) and TH5 are used for 'YI, 0,
ti, 3' 611, and WJ, instead of TH, TH',
TH", TU"', TH"", and TU""'. An
advantage of this scheme is that the
ordinary teletype terminal can be used for
input/output of Thai letters if the function
of interconversion of Roman and Thai
letters is implemented on the host com
puter.
M. SHIBAYAMA: Input/Output Methods for Thai
Table 2 Transliteration Table
(a) Consonants
GNL C N
I 2 3 4 5 6 7 8K KH KHI KH2 KH3 KH4
In PI tJ JJ tJ PI
C CH CHI CH22
ti~ AI ~
0 013
@l !IT TH THI TH2 TH3 TH4 TH5 TI
4fJI 't1 tI fj ! " l1li !IN NI NG
5\I, m '.:l
P PH PHI PH26
'I.J yt eJ fl
F Fl7
~ ~
L LI L2 LEU8
VI 11 •~ tn·GN : Group NumberLCN: Local Classification Number
(b) Vowels
GNLCN
1 2 3 4
R Rl REU9
'I • '11·'1
Y YI10
tJ 'lI5 51 52 53
IIi1 ti f1 tl
H HI12
VI fI
B13
'U
M14
N
W15
~
16 ?i)
.:Vowel
L C NGN :; 5 7 8 9 10I 2 4 6 II
A A- A: AI All AE AE- AE: AM AW A.17 - 1.- 1- ..- - - - -'1 LL- LL - :: LL- -"'1 1.-'1 -
I I: IA tA-18 - - l.:'tJ I.:'tJ::- -
U U: UA UA- UAI19
:.''';1 :.'~::~ :. -~-
E E- E: EU EU: EUI: EUA EUA-20
I.':' - - :'i) I.:'i) I.:'i)::I. - :: I. - - -0 0: au OU- au: OE OE: aE- O.
21't - : t- ~i) 1. 71.-'1:: -i) I. - i) I.-il: -
(e) Special Symbols and Control Codes
~ 22 23 24 25 26 27 28 29 30 31
0 I 2 3 4 5 6 7 ~ 9I
0 • til • « « b .. f6 «
~ 32 33 34 35 36 37 38, < :> + Q Z V
I .. - . ~- - - - "l ., -~ 39 40 41 42 43 44 45 46 47 48
/ • ( ) , - : sp <!!1
/ '" ( ) , - : sp ~
GN :49, LCN:I Carriage Return
285
3. Comparison of DMM, TM, and HTM
While the TM requires at most 49 keys to
be used on the keyboard, the DMM, in
which each Thai letter corresponds uniquely
to one key, requires the use of 92 keys.
Thus the TM allows the number of keys to
be reduced by 46.7%. However, the
number ofkey strokes required to input all
Thai letters by the TM is 176, which is
91.3% higher than the number required by
the DMM.
Compared with the DMM, the number
of key strokes required to input a text with
the same frequency of occurrence of Thai
letters as the text of the Three Seals Law,
it was estimated that the HTM, which is
the transliteration method proposed by
J. F. Hartmann and G. M. Henry, requires
32.0% more strokes and the TM 21.90/0
more strokes [Shibayama and Hoshino
1986a].
4. Measurement and Evaluation of
Learning Effect
Input of the main entries of the Thai-Thai
dictionary published by the Thai Royal
Institute [Photchana nukrom Thai 1982J,
a total of 31,202 words, was completed in
about 6 weeks by 3 persons (about 9
man-weeks). The frequency of occurrence
of each Thai letter in the dictionary, the
learning effect measured for the elapsed
time of the input work, and its evaluation
are as follows.
4.1 Frequency of Occurrence of That."
Letters
The main entries in the dictionary
contained a total of 217,926 letters, which
286
included all 72 character patterns. The
percentages of consonants, vowels, tones,
and others were 63.5%, 29.90/0' 5.4%,
and 1.2% respectively.
Table 3 shows the frequency of occurrence
of Thai letters in the main entries of the
dictionary. For inputting this text, the
ratio of the' number of key strokes, T.D.,
required by the TM and DMM can be
represented as follows:
TD = "Ef;r;. .. "Efi
where f; is the frequency of occurrence of
the i-th Thai letter indicated in the NO.
column in Table 3, 1"; is the number of
characters in its Roman spelling, and the
suffix i ranges from 1 to 70. The "Ef;ri
represents the total number of key strokes
for the text. It was found that the number
of key strokes required by the TM was
42.9% higher than by the DMM.
4.2 Environment of Measurement
The model of behavior in the input work
by the typist in the making of the database
for the Thai dictionary is shown in Fig. 5.
We have measured the learning effect for
two persons in the actual input work by the
DMM and TM in conjunction with the text
editor implemented by both methods. On
the editing screen for this input work, of
which an example is shown in Fig. 2(b),
the slash (/) indicates the division between
the words, and the hyphen (-) means that
the previous character string with no
hyphen is duplicated in this position.
The Thai character string in the second
row from the bottom in Fig. 2(b) is a prompt
for the next input in Roman spelling,
M. SHIBAYAMA: Input/Output Methods for Thai
Table 3 Frequency of Occurrence of Thai Letters in the Thai Dictionary
(a) Frequency of Occurrence of Consonants
NO. Letter Freq. NO. Letter Freq. NO. Letter Freq. NO. Letter Freq.
1 n 11,015 12 Q 11 23 en 3,956 34 tJ 7,024
2 tI 2,477 13 q 821 24 G 1,263 35 ":i 13,058
3 fJ 1 14 il 125 25 '" 11,350 36 ~ 6,389
4 Fl 3,437 15 {) 232 26 'U 4,164 37 d 6,158
5 FI 2 16 ! 311 27 'lJ 3,898 38 ~ 1,358
6 ~ 190 17 fI 243 28 ~ 900 39 ti 918
7 ~ 7,487 18 WI 63 29 eJ 278 40 ~ 5,427
8 ~ 2,841 19 m 1,295 30 'Y4 3,455 41 'IIi 4,748
9 ~ 544 20 61 4,841 31 ~ 463 42 cVJ 99
10 'l1 2,519 21 91 5,640 32 11 1,168 43 el 8,599
11 'lJ 627 22 tl 993 33 3.J 7,946 44 ~ 157
(b) Frequency of Occurrence of Vowels and Tones
NO. Letter Freq. NO. Letter Freq. NO. Letter Freq...,
45 ... 5,315 54 b 7,854 63 6,211...
46 7,893 55 U 2,415 64 5,308
47 '1 14,874 56 't 2,118 65 205
48 6,596 57 q 301 66 221
- ..49 4,421 58 11 4 67 2,302
- 627 59.,
1,625 68 '1 29250
51 1,798 60 1 652 69 , 2
52 - 4,092 61 1 1,495 70 1 20..a
53 - 1,988 62 806...
287
DMM
The Statementf--+
Recognitionf-.
The Displayof Thai of Character
Typing r--+ of Thai
1 ITM
The StatementJ---
Recognition The Displayof Thai of Character f--+ Conversion f--+ Typing f--- of Thai
t IFig.5 Model of Behavior for Typing of Thai
4.3 Measunment and Its Evaluation
It was assumed that the operators used
their fingers and hands in accordance with
the assignment shown in Fig. 4. The
frequency of the use of fingers, hands, and
each row of the keyboard by the typist is
illustrated in Fig. 6. It is noticeable that
the little fingers, which are considered least
effective, are used frequently in both
methods, and that the right hand works
more than left hand by 24.80/0 and 17.00/0
respectively in the DMM and the TM.
Of the rows of the keyboard, the home row
C is used most frequently, which is con
sidered to be effective, and the average
(d) (c) (b) (a)
58.5
[h1J25.0 18.9
11.7
2.9
TM
14.1 13.2
.DlJ](a) (b) (c) (d)
41.5
ClliJ(d) (c) (b) (a)
62.4
gB8_8._4__----J1L::.27:..::.9~___,1
E /18.3 45.4
Unit:%
Utilization of the Fingers, Hands, and Rowsof Keyboard
Fig. 6
5~ 11--11_7._1
_,1;12:.::.:;9.51-,__---r__--J134.9L....-_----Ih8.5
DMM
8.2 ,.ano=ELJ(a) (b) (c) (d)
37.6
Enumeration of Main Entries in theDictionary
Table 4
Input Operator (A) Operator (B)
Method Number I Number Number I Numberof Words of Char. of Words, of Char.
DMMI 12,455 [ 78,340 [ 4. 478 1
29,181
TMI 8,490 I 54, 527 1 4.661 I 29,200
TotalI 20, 945 1132, 867 I 9, 139 1 58,381
Table 4 shows the amount of text input
by two operators, the total number of
words and characters input being 30,084,
and 191,248 respectively. These two ope
rators had no prior knowledge of Thai
letters or Thai language, but were able to
input about 200 letters per minute of
Roman script.
displaying the group of Thai letters above
and their corresponding Roman spellings
below. Fig. 2(b) shows the prompt when
"U" was typed as the next input. The
Thai character string in the center of the
third row from the bottom represents
the result of transliteration of the input
of the Roman spelling inside the box in
the second row from the bottom.
288
M. SHIBAYAMA: Input/Output Methods for Thai
distance of movement of fingers is follows: S(t)=M(l-e-Gt)
8-.-- --,- 8~ ~
the transliteration table to
identify the Roman spel
ling, namely, the LeN in
Table 2. It is difficult,
however, to memorize the
transliteration table in a
short time, especially for
non-native speakers of
Thai, and the need to
consult it reduces the
speed of typing.
ooc:i
As shown in section IV.2 and from the
experiment just described,
to input any Thai letter
by the TM, the typist has
to memorize or consult
5. SZ'mplified TransNteration Method
(STM)
where M is the superior limit of the typing
speed, and G is the coefficient of training
efficiency. Fitting of the learning curves
to the measured values by use of this
relation gives M=37.25, G=0.0527 for the
DMM, and M=40.9, G=0.0797 for the TM.
Despite the time required to consult the
transliteration table in the TM, and the
42.9% greater number of key strokes than
the DMM, the typing speed is 9.80/0 higher
by the TM than the DMM. Consequently,
we found that the TM is more readily
operable by non-native speakers of Thai
accustomed to inputting Roman script.
It is also expected that the typing speed
would increase if the elapsed time could be
extended.
o~_~..-------r-~
::7
\!) • ~ : OHMX • + :TH~ • + :HISS ARTHI
x
GO.CO 120.00 180.00 2~O.OC 300.00T1ME (M) ~101 .
Fi~. 7 Typing Speed and the Learning Curve for Operator (A)
~g......o
ocX:c:i
::7'.....a:a:o:J:oWc:iu..'"o
d DMM=0.185*1+0.349*0+0.295*1+0.171*2
=0.882
dTM =0.63
where dDMM and dTM are the average
distance of movement in the DMM and the
TM estimated from the utilization in Fig. 6.
In comparison, the figures for the standard
English keyboard dE for Roman script
input, RICOH d2 (two-strokes method)
for Japanese input, and JIS dj (lIS
keyboard) for Japanese input are 0.66, 0.60,
and 0.91 respectively.
Fig. 7 shows the result of the measure
ment of typing speed by both methods and
the learning curves for operator (A). The
X axis in Fig. 7 indicates the elapsed time
in the actual input work, and the Yaxis
indicates the number of characters input
per minute. To represent the learning
curves, the speed of typing S(t) is fitted
to a function of the elapsed time t as
follows:
289
To eliminate the overhead time for
memorizing and consulting the trans
literation table in the TM, we have devised
a simplified transliteration table composed
of only GN's groups, without the distinction
of LCN, namely, the Simplified Trans
literation Method (STM, See Fig. 8). For
example, the Roman spelling "K" cor
responds to "n", "f!", "6JJ", "fJ", "6lJ",and "R". After pressing "K", the typist
then selects the appropriate Thai letter from
the group by pressing the "XFER" key
(See Fig. 4) on the keyboard, which
causes the Thai letters to appear one by
one cyclically, and by pressing any key
except the "XFER" key for the next
input when the appropriate Thai letter
appears.
To estimate the number of key strokes of
the STM using the frequency of occurrence
of Thai letters as shown in Table 3, the ratio
of the number of key strokes, S.D., required
by the STM and the DMM can be repre
sented as follows:
Simplified Transliteration Method
'\l
letter indicated in Table 3,
namely, the value of LCN,
in the j-th group belong
ing its i-th Thai letter and
the suffix j corresponds
to the value of GN shown
in Table 2. For example, the number of
key strokes required for "f!" is 2 ("K" and
"XFER" keys) which corresponds the value
of LCN in GN=1. And "'Ef;nj represents
the total number of key strokes for the
text.
It is estimated that the number of key
strokes required by the STM for inputting
a text with same frequency of occurrence of
Thai letters as the main entries of the
dictionary is 61.60/0 (43.00/0 for the conso
nants and 93.9% for the vowels and tonal
marks) higher than by the DMM. Com
pared with the TM, the STM requires
18.7% more strokes. However, the STM
is more readily operable by non-native
speakers of Thai than the TM, and the
number of key strokes can be reduced if the
sequence of appearance of Thai letters,
especially the vowels, in a group is changed
according to the text, like the learning
function for the Roman-Kanji conversion in
Japanese. This scheme can also be imple
mented on an intelligent terminal capable of
displaying Thai letters, such as a micro
computer.
fln
--------------------~)-
S D - "'Efm j..- "'Eli
Fig.S
NextLetter.--{
where f; is the frequency of occurrence of
the i-th Thai letter indicated in the NO.
column in Table 3 and the suffix i ranges
from 1 to 70. The nJ is the number of
key strokes for extracting the i-th Thai
V Thai Printing
1. Characteristics of Thai Letters
Thai letters have the following charac
teristics:
(a) They differ from each other in size as
290
M. SHIBAYAMA: Input/Output Methods for Thai
well as shape, for example, 1, W' 1,
ry, and l-(b) Several letters are overlapped in print
ing, for example, d is composed of
J and <v.
(c) Several letters are located above and
between adjacent letters, like lrn.(d) The printing positiors of the same
letter are sometimes different, like:!f L-
61J and 61J •
Sakamoto has proposed that such prob
lems can be solved for the printing of
almost all Asian and African letters by
dividing letter patterns into sub patterns,
comprising the basic character with special
information on the basic line and the
added character with information on its
basic point [Sakamoto 1979]. We have
adopted this idea in the design of a Thai
printing system and developed more con
trollable programs that allow any line
spacing with an integral number of dots by
adding functions for overlap control of the
character patterns, line position control for
each line, and vertical position control of
a character depending on the position of the
previous character. The schemes of these
controls are as follows:
(1) Discrimination of Basic Character
and A dded Character
Thai letters in combinations of consonant
+vowel or consonant+vowel+consonant
are composed of 6 regions centered on the
first consonant of a word, as shown in
Fig. 9. The characters located at (1), (2),
and (3) in Fig. 9 are categorized as bast·c
characters, and those at (4), (5), and (6)
as added characters. It is assumed that the
Tone(4)
Vowel(5)
Vowel Consonant Vowel(I) (2) (3)
Consonantor
Vowel (6)
Fig. 9 Positions of Consonants, Vowels,and Tones
sequence of appearance of characters must
be basic character before added character.
(2) Control of Basic Character
Basic characters have the attribute basic
l/ne, which shows the width of the character,
and which determines the horizontal
printing position of the letter relative to the
preceding letter. By employing this
scheme, the width of letters can be con
trolled closely, and printing with pro
portional spacing is possible.
(3) Control of Added Character
An added character has a basic point
rather than a basic Nne, which together
with information on its setting post'tion
serves to locate the character relative to the
basic Nne of the preceding basic charader.
The printing position for a Thai letter with
an added character is thus determined by
the attribute setting posi#on. This control
is the same as the dead key control of
a typewriter.
291
lapping the preceding vowel.
# FAIRS USER(KYOT02) ASIS
2. Example
Fig. 10 shows an example of the output
for retrieving a Thai bibliography on the
intelligent terminal connected to the host
computer. FAIRS 2 and FAIRS in
Figure are commands - for invoking the
information retrieval system on the host
computer. We have developed a program
such that the printing program runs mainly
by operating the dead key and vertical
position controls for each character, as
FAIRS ENDED
(4) Parental and Ch-ild Patterns and
the Line Position Control
Satisfactory printing quality of conso
nants can generally be achieved if a charac
ter is represented by 40*40 dots. However,
consonants like !J and J require 80*40
dots. We have therefore split the string
of Thai letters into three levels, and divided
the consonants and vowels represented by
80*40 dots into two patterns, here called
parental and child patterns. The printing
position for each pattern in aFAIRS2
Thai letter is decided by the at-tribute line position, which shows +FCA002A ENTER USERID-KYOT02
the level of the parental and child FAIRS> END
patterns.
TI~Jf1t-ir~fi~~]~f4i
jlIT1Ji1~_L __ ..'l1. ~ _. FAIR5-I (V10/L20)
RS> OUT
RS> SELECT THAI
FAIRS> RS LINE
WANNAO YUDEN
1979
262
T86005200006~ w ~ ~
1~~~RLUe4"U I ;ULU11 1 ~n~ ~~~m TU~"
D
p
Fig.l0 Example of Retrieval of Thai BibliographyUsing the Intelligent Terminal
T3
A2
BANGO
#1
4 FOUND
(5) Overlap Control
Every character pattern IS
synthesized by an OR operation
for all dots. This scheme is
necessary for such characters as
,j, iJ, and '1l~(6) Vertical Control of Added
Characters
To improve printing quality,
four tonal marks in the vertical
position need to be repositioned
if the previous letter has a vowel
like .,., ""', or .... In this case,
the settz"ng posz"tion of tonal mark
is shifted vertically upward by an
appropriate number of dots, and
the tonal mark is printed over-
292
M. SHIBAYAMA: Input/Output Methods for Thai
described previously.
VI Conclusion
The Thai text editor and the intelligent
Thai terminal designed have the functions
for inputting the Thai text by the Direct
Mapping, Transliteration, and Simplified
Transliteration Methods. Using these, we
are now developing a database and a com
puter concordance of the Three Seals Law.
The structure and characteristics of the
input methods have been compared by
measuring the speed of typing and learning
curves in the actual input work for making
a machine-readable Thai dictionary. The
Transliteration Method has the advantage
of requiring fewer keys on the keyboard
than the Direct Mapping Method, and if
transliteration from Roman spelling to
Thai letters is implemented on a host
computer connected to the terminal, an
ordinary teletype terminal can be used to
input Thai letters. Although the number
of key strokes required for text input will
normally be higher by the Transliteration
Method, this method was found to be more
readily operable by those accustomed to
inputting Roman spelling.
This scheme is applicable to design of
terminals and editors for other Southeast
Asian languages, like Laotian and Burmese.
We have also developed output methods
for Thai letters of high quality, and have
described the structure and characteristics
of methods for controlling the printing and
display positions of Thai characters on
a laser beam printer and a micro computer.
We are now working to develop a data-
base and a computer concordance of the
Three Seals Law at the Center for Southeast
Asian Studies and the Data Processing
Center, Kyoto University using this editor
and terminal.
Acknowledgtnent
Thanks are due to Prof. Yoneo Ishii of theCenter for Southeast Asian Studies and Prof.
Satoshi Hoshino of the Data Processing Center,Kyoto University who led us to this study, and toDr. Shigeharu Sugita of the National Museum of
Ethnology, who made many valuable commentsand provided the original text of the Three SealsLaw in machine readable form. Thanks are alsodue to Prof. Yasuyuki Sakamoto of Tokyo ForeignLanguages University, Prof. Tetsuya Katsumuraof the Research Institute for Humanistic Studies,Kyoto University, Mr. Yukio Hayashi of RyukokuUniversity, Mrs. Aroonrut Wichienkeeo of ChiangMai Educational University, Dr. SupamardPanichsakpatana of Kasetsart University, Dr. JittiPinthong of Chiang Mai University, and Dr.
Sukanya Nitungkorn of Thammasat University,who made many valuable comments on the transliteration table in the editor and terminal design.
This study has carried out as part of a project todevelop database for the Three Seals Law ofThailand in the Data Processing Center, KyotoUniversity, and has been supported by grants for
scientific research from the Japanese Ministry ofEducation.
Appendix
--Outline of Development of a Database anda Computer Concordance for the Three Seals Lawof Thailand--
The process for implementing on-line informationretrieval and a computer concordance of the ThreeSeals Law is shown in Fig. A-I. The process inFig. A-I advances into two flows: on the left,a database of Thai dictionary on the computer has
been made in order to verify all the words in thetext of the Three Seals Law. This can be used forthe studies of natural language processing, in other
words, morpheme analysis, syntactic analysis, andsemantic analysis as basic research into natural
293
Thai-ThaiDictionary
Original Textof
the Three Seals Law
(a) Input MainEntries
(a)
ExtractionofWord Phrase
'Word Units
(a)
Dictionary
Part of SpeechMeaning
(a) Resegmentation StatementUnits
NG Text
OK
Dictionary DatabaseBuilding
(b)
RetrievalConcordance
Application forNatural LanguageProcessing
(b)
Fig. A-I Outline for Developing a Database and a Computer Concordanceof the Three Seals Law
294
M. SHIBAYAMA: Input/Output Methods for Thai
Floppy
Disk
Disk
Unit
Other Countries
o
JapaneseLaser BeamPrinter
TSS Thai Emulator!
Thai Text Editor
MicroComputer
Host
Computer
System Configuration for Retrieving theDatabase of the Three Seals Law
Seals Law and the dictionary in the off-line status.The edited files are transferred into the hostcomputer using the file transfer function on themicro computer. Also, (b) shows that the ordinaryterminal and the intelligent terminal connected tothe host computer are used. The overall configuration of the system for building and retrievingthe database is shown in Fig. A-2. The databaseof the Three Seals Law and the dictionary of Thaiare stored on the disk unit and managed by theIRS in the host computer.
For building or retrieval, the database isaccessed by invoking the IRS from the terminal,and the results can be output to a Japanese laser
beam printer or to the terminal.The terminal shown in Fig. A-2 is made up
2 components: a micro computer and a TSS1)
1) TSS: Time-sharing system, namely, the modeof communication with the host computer onan interactive basis.
Tele-communicationLine 300/1200 bps
NTT Communication Network
Printer
Fig. A-2
FACOM
M-780
language understanding, for a question-answering system, and formachine translation involving Thai.
The right-hand flow shows theprocess of building a database fromthe original text of the Three SealsLaw, which had made such segmentation that a statement is dividedinto the words, provided by theNational Museum of Ethnology intoa database on the host computercapable of retrieving it throughthe on-line terminal, and it alsoshows that a computer concordanceis constructed simultaneously.
In making the database of theThai dictionary, the main entries(about 32,000 words) of the dic-tionary published by the Thai Royal
Institute [Photchana nukrom Thai NEC PC-98XA/1982] were input using the Thai text PC-9801editor. To complete the machine-readable dictionary, the main entriesneed to be supplemented with de·tails of part of speech, meanings,and other rules necessary for themachine processing of Thai. Finally, when the database of the Thaidictionary is complete, it can beused for syntactic analysis of Thaistatements.
In the building of the database and the computerconcordance of the Three Seals Law, resegmentation has first to be accomplished, which includesthe reading of text to confirm the existing segmentation and the input work using the Thai texteditor. Then each different word is extracted fromthe text file and the verified against the dictionaryon the host computer. If any mistakes are found ineither the dictionary or the text, feedback into theappropriate positions must be attempted. Afterthe corrective work, the text is segmented into thesentences to make the database retrieval more
efficient.By building the database using the information
retrieval system, IRS, as an application softwareon the computer, on-line information retrieval of
the Three Seals Law is possible, and computer
concordance also can be accomplished by usinga function in the IRS.
As shown in Fig. A-1, (a) shows that the Thaitext editor is used for editing the text of the Three
295
terminal emulator, which is operated on the microcomputer, and is connected with the host computer
through a telecommunication line with the speedof 300/1200 bps.2) The micro computer can, ofcourse, be used for editing the Thai text in theoff-line status.
References
Hartmann, J. F.; and Henry, G. M. 1983a. ThaiScript Computer-converted from a Precise,Pronounceable Transliteration for Bibliographic Management. Bulle#n oj Commz'tteeon Researck Materials on Soutkeast Asz'a(CORMOSEA) 11(2).
----. 1983b. The Processing of Thai Language Text Using a Personal Computer.Bulletz'n of Commz"ttee on Researck Materz"alson Soutkeast Asz'a (CORMOSEA) 11(2).
Ishii, Y. 1969. Sanin Hoten ni Tsuite [Introductory Remarks on the Law of Three Seals].Tonan Ajz"a Kenkyu [Southeast Asian Studies]6(4): 155-178.
Murayama, N. 1982. 2 Sutorooku-ho [2 StrokesMethod]. Jyoko Syori [Information Processing] 23(6).
Photchana nukrom Thai. 1982 (Thai 2525).Ckabap Ratckabandz't-satkan. Krungthep:Samnakphim Aksonchaoenthat.
Sakamoto, Y. 1979. Ajia Afurica Gengo noKonpyuuta Syori [Computer Processing forAsian and African Languages]. Jyoko K anri[Information Management] 22(7).
Shibayama, M.; Sugita, S.; and Ishii, Y. 1984.
2) bps: Bits per second, namely, the unit ofcommunication rate.
296
Pasokon ni yoru Taigo Tekisuto no Syon[Processing of Thai Text Using a PersonalComputer]. Daz" 28 Kaz' Jyoko Syorz" Gakkaz"Zenkoku Taz'kaz" Ronbun-syuu [Proceedingson Japan Information Processing Society28th National Conference].
----. 1985. Romaji Hyoki ni yoru Taimojino Nyuryoku Hoshiki [Input Methods forThai Using Roman Spelling]. Daz" 30 kaz"Jyoko Syorz" Gakkaz' Zenkoku Taz'kaz" Ronbunsyuu [Proceedings of JIPS30].
Shibayama, M. ; and Hoshino, S. 1986a. Implementation of an Intelligent Thai ComputerTerminal. Journal of Injorma#on Processz'ng8(4): 300-306.
---. 1986b. Taiji Jisyo no Nyuryoku-ho toNyuryoku-tokusei [Methods and Characteristics of Thai Dictionary Inputs]. Daz' 33 kat'Jyoko Syorz" Gakkaz" Zenkoku Taz'kaz" Ronbunsyuu [Proceedings of JIPS33].
----. 1987. A comparative Study of theCharacteristics of Input Methods for Thai.In Proceedz'ngs of tke Regz"onal Symposz"umon Computer Scz'ence and Its Applz'catz'on.Thailand: NRCT. pp. 19,1-19,18.
Sugita, S. 1980. Text processing of Thailanguage: The Three Seals Law. In KagakuKenkyuu-hz' Skz'ken Kenkyuu Sez"ka Hokokusyo: Jz"nbun Kagaku Kenkyuu Shz'en notameno Konpyuuta Apurz'keesyon no Kaz'katsu[Development of Computer Application forAssisting the Studies of Cultural Science],edited by Tadao U mesao, pp. 122-129.National Museum of Ethnology.