European Conference on Speech Technology, Edinburgh, Scotland, UK, September 1987
ISCA Archive, http://www.isca-speech.org/archive


A GRAPH-ORIENTED APPROACH TO THE GRAPHEME-TO-PHONEME TRANSCRIPTION OF ITALIAN WRITTEN TEXTS.

P. COSI*

ABSTRACT

Taking into account the implicit phonotactic constraints of Italian, this paper describes a graph-oriented computational approach to the grapheme-to-phoneme transcription of written texts. The system is based on the mathematical theory of finite-state automata, generalized and augmented to obtain a simple syntax-directed translation schema. The rules are based entirely on the orthographic form of the words; no grammatical knowledge of word classes is used. A particular set of exceptions is often associated with the whole set of rules and, owing to the modular structure of the automaton, an optimized rule-exceptions check mechanism has been built, albeit in a preliminary and incomplete form.

INTRODUCTION

Computers cannot synthesize reasonable-quality sentences from phonemic information alone, because the acoustic features of human utterances are heavily influenced by syntactic, semantic and pragmatic factors (ref 1). However, if we focus on speech synthesis from written texts, the first problem encountered is that of grapheme-to-phoneme conversion. This paper describes a graph-oriented transcriber, based on the theory of finite-state automata (ref 2), which should be considered the basic component of a more complex synthesis-by-rule program.

OUTLINE OF THE GRAPHEME-TO-PHONEME TRANSCRIPTION RULE SYSTEM

The whole set of transcription rules for Italian (ref 3) is based entirely on orthographic forms. The automaton expects stressed words as input, so lexical stress must first be assigned automatically in a preliminary analysis (ref 4). Graphemic input words are written in capital letters to distinguish them from the phonemic output words, which follow the International Phonetic Alphabet (IPA). The transcription procedure is divided into two levels. The first (T1) corresponds to a "broad" or "phonemic" transcription, while the second (T2) is closer to an "allophonic" or "phonetic" conversion. T2 adds three allophones of /n/, two semivowels (/i/ and /u/, to be distinguished from the semiconsonants /j/ and /w/) and long stressed vowels (the vowel's symbol is followed by a semicolon). The stress mark (') precedes a stressed syllable in T1 only, while in T2 it is directly connected to the corresponding stressed vowel (•); moreover, geminate consonants are indicated by doubling their symbols in T1, while in T2 they are represented by the same lengthening mark used for long stressed vowels.

*Centro di Studio per le Ricerche di Fonetica (C.N.R.), Via G. Oberdan, 10 - 35122 Padova (Italy)

The most commonly used formalism for describing the rules is the so-called context-dependent one (ref 5). A rule is written as follows:

W --> X / Y _ Z        (1)

which means that W becomes X if it appears between Y and Z. W, Y and Z are elements of the input alphabet (i.e. graphemes), and X is an element of the output alphabet (i.e. a phoneme). In Table 1, the whole set of "first-level" rules and the "second-level" rule for grapheme N are formalized in this context-dependent notation. To produce an unambiguous grammar, new rules must be introduced for the graphemes S and Z to cover all the permissible Italian phonotactic possibilities. For the time being, two maverick phonemes (s) and (z) are output in the cases not covered by the rules given so far.
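As an illustration of the notation in (1), the following minimal Python sketch stores a rule as the four strings W, X, Y and Z and applies an ordered rule list from left to right. The `Rule` class, the helper names, the ASCII output symbols and the two toy rules for the grapheme C are assumptions made for this example; they are not taken from Table 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    target: str  # W: grapheme string to rewrite
    output: str  # X: phoneme string produced
    left: str    # Y: required left context ("" means any)
    right: str   # Z: required right context ("" means any)


def matches(rule: Rule, word: str, i: int) -> bool:
    """True if W occurs at position i with Y just before it and Z just after it."""
    end = i + len(rule.target)
    return (word.startswith(rule.target, i)
            and word[:i].endswith(rule.left)
            and word.startswith(rule.right, end))


def transcribe(word: str, rules: List[Rule]) -> str:
    """Apply the first matching rule at each position; copy unmatched symbols."""
    out: List[str] = []
    i = 0
    while i < len(word):
        for rule in rules:
            if matches(rule, word, i):
                out.append(rule.output)
                i += len(rule.target)
                break
        else:
            out.append(word[i].lower())  # toy fallback, not part of the paper
            i += 1
    return "".join(out)


# Toy rules (hypothetical): C --> tS / _ E, C --> tS / _ I, C --> k elsewhere.
TOY_RULES = [
    Rule("C", "tS", "", "E"),
    Rule("C", "tS", "", "I"),
    Rule("C", "k", "", ""),
]

print(transcribe("CENA", TOY_RULES))
print(transcribe("CASA", TOY_RULES))
```

Ordering the rules from most to least specific stands in here for the disambiguation that the full rule system obtains through its explicit contexts.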

[Table 1 is only partly legible in this copy. The recoverable content is listed below; "[...]" marks illegible material.]

First-level rules:

1) A   --> /a/
2) [...] --> [...]   (parallel choice)
3) [...]
4) [...]
5) GL  --> [...]
6) GN  --> [...] / @ _ @
7) [...] --> [...] / * _ [...]   (parallel choice)
8) Z   --> /ts/ / % _ [first symbol of the second syllable is a voiceless consonant]
           /dz/ / % _ *

Second-level rule:

N   --> [...] / * _ [...]
        /n/   / * _ {T, D, Z, N, S, L, R}
        [...] / * _ [palatal ...]
        [...] / * _ [occlusive ...]

X1 = ^FICE, ^GGIO, ^CCIO, ^VOLE, ^SCO, ^SSA, ^TTO, ^TTA, ^ZZA, ^SE, ^RE, ^[...]
X2 = ^IO, ^IA, ^CE, ^NE, ^NA, ^RE, ^SO, ^NDO, ^GNO, ^GNA

SYMBOLS:
{ }    logical or
( )    optional
[ ]    parallel choice
^      word stress
*      any symbol
@      any vowel
$      any consonant
[...]  any voiceless consonant
&      any voiced consonant
%      word initial boundary
[...]  word final boundary

Table 1. Context-dependent formalization of the transcription rule system.

GRAPH-ORIENTED IMPLEMENTATION

The transcriber is seen as a set of several automata grouped together to form a complex deterministic finite-state automaton, which is conveniently represented by its "state transition diagram" (a directed graph with nodes for the states, connected by arcs labelled with the input symbols causing the transitions and possibly with the output symbols they produce). The graph is "augmented" by controls and actions which can be triggered by the different situations arising during the "walk" through the graph. Transcription can be viewed as walking across the graph from an initial to a final state; the route followed depends directly on the different types of nodes and situations in the graph. All the rules are organized so as to take into account only the context on the right. In fact, the graphemic input string (IS) is inspected by reading its symbols to the right of the current pointer position, which is advanced as successive symbols are "eaten".

Figure 1 shows the block diagram of the graph. Apart from the word-final-symbol rule (whose walk reaches the final node), every rule is represented by a walk ending in the activating node (AC). The word-initial-symbol rule (some rules involve word-initial conditions and must be treated differently from the others in a left-to-right approach) is easily recognised as the walk connecting the initial node (IN) with AC, while the other word-embedded-symbol rules (WESR) are represented by graphs which start and end in AC. Thus AC is the only node which has an output string (OS) attached to its input links, a feature which means that a rule has been tested and applied.

All the nodes are represented as shown in Fig. 2 and are classified as "reading" (m≠0) or "non-reading" (m=0). The former need to read m IS symbols (Xi) to proceed in the graph, while the latter only use, and possibly "consume", symbols previously read. When a rule is activated, the symbols which the corresponding module needs to verify are read by the first reading node (see the rule nodes, or Rx, in Fig. 1, referring to WESR). These symbols are then used by the following non-reading nodes of the automaton until the whole set of alternatives has been examined and the rule is completed. Completing a rule means outputting a string (Yi), repositioning the IS pointer (ni is the "backtracking" IS output pointer, i.e. the difference between the number of IS symbols used by the rule and the number of those actually transcribed) and, finally, restarting the procedure with a return to AC. The exceptions control block (ECB) in Fig. 2 is very important: it verifies whether the current input word is one of the exceptions to the rule currently being applied, in which case the word is transcribed by table look-up.
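The walk described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `RuleNode` class, the `#` end marker, the copy-through fallback, the ASCII output symbols, the toy alternatives for the grapheme G and the exception entry are all hypothetical. Each loop iteration starts at AC, a rule node reads m symbols, the first accepting alternative outputs Yi, and the pointer advances by m minus the backtracking count ni.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One alternative inside a rule node:
# (test on the m symbols read, output string Y_i, backtracking count n_i)
Alternative = Tuple[Callable[[str], bool], str, int]

END = "#"  # assumed word-final padding symbol, not the paper's own symbol


class RuleNode:
    """A reading node (reads m symbols) together with its alternatives."""

    def __init__(self, m: int, alternatives: List[Alternative],
                 exceptions: Optional[Dict[str, str]] = None) -> None:
        self.m = m
        self.alternatives = alternatives
        self.exceptions = exceptions or {}


def walk(word: str, rule_nodes: Dict[str, RuleNode]) -> str:
    """Transcribe a word using right context only, returning to AC after each rule."""
    output: List[str] = []
    pos = 0
    while pos < len(word):                   # each iteration starts at AC
        node = rule_nodes.get(word[pos])
        if node is None:                     # no rule: copy the symbol (toy fallback)
            output.append(word[pos].lower())
            pos += 1
            continue
        if word in node.exceptions:          # exceptions control block
            return node.exceptions[word]     # whole word by table look-up
        window = word[pos:pos + node.m].ljust(node.m, END)
        for accepts, y, n in node.alternatives:
            if accepts(window):
                output.append(y)             # output string Y_i
                pos += node.m - n            # give back the n_i symbols not transcribed
                break
        else:
            raise ValueError(f"no alternative accepts {window!r}")
    return "".join(output)


# Toy rule node for G (hypothetical alternatives and exception entry).
G_NODE = RuleNode(
    m=2,
    alternatives=[
        (lambda w: w == "GN", "JJ", 0),      # both symbols read are transcribed
        (lambda w: w[1] in "EI", "dZ", 1),   # second symbol inspected, then given back
        (lambda w: True, "g", 1),
    ],
    exceptions={"WAGNER": "vagner"},
)
NODES = {"G": G_NODE}

print(walk("BAGNO", NODES))
print(walk("GELO", NODES))
```

Keeping the exception table on the node means it is consulted only when that rule is activated, which is the property the exceptions control block is meant to provide.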

Fig. 1. Graph's block diagram.

Fig. 2. Nodes' symbol.

Two examples of sub-automata are given in Fig. 4. The first implements the word-initial-symbol rule (part of rules 4, 5 and 8 in Table 1) together with some exceptions, and the automaton zone enclosed by the dashed line checks whether the second syllable of a word begins (rule 8) with a voiceless consonant. The second refers to grapheme B. Finally, Table 2 illustrates some results of the transcription procedure applied to a small set of Italian words.

[The phonetic (T2) column of Table 2 is not legible in this copy; only the input words and the phonemic (T1) column are listed.]

INPUT        OUTPUT, PHONEMIC (T1)

A^CQUA       /'akkwa/
RI^VA        /'riva/
CA^RRO       /'karro/
SA^LE        /'sale/
GRO^SSO      /'grɔsso/
SBA^GLIO     /'zbaʎʎo/
VI^TA        /'vita/
MA^TTO       /'matto/
FU^MO        /'fumo/
BU^IO        /'bujo/
FIU^ME       /'fjume/
BUO^NO       /'bwɔno/
CAVA^LLO     /ka'vallo/
PA^NCIA      /'pantʃa/
CO^NSCIO     /'kɔnʃo/
BA^NCA       /'banka/
TA^NGO       /'tango/

Table 2. Some Italian words.

Fig. 4. Two automaton examples.
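The geminate convention of the two transcription levels (doubled symbols in T1, a lengthening mark in T2) can be sketched with a single regular-expression substitution. This is an assumption-laden illustration: it covers only the geminates, not the stress placement or the long stressed vowels of T2, and the function name and the choice of `;` as the lengthening mark (following the text's description) are assumptions.

```python
import re

# Any letter other than a, e, i, o, u, immediately followed by itself.
GEMINATE = re.compile(r"([^\Waeiou\d_])\1")


def geminates_to_length(t1: str, length_mark: str = ";") -> str:
    """Replace each doubled consonant symbol by one symbol plus the length mark."""
    return GEMINATE.sub(lambda m: m.group(1) + length_mark, t1)


print(geminates_to_length("'karro"))
print(geminates_to_length("ka'vallo"))
```

Passing a different `length_mark` adapts the sketch to whichever lengthening symbol the T2 notation actually uses.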

CONCLUSION

An advantage of the graph-oriented approach to phonemic transcription is its relative simplicity of implementation, owing to the "modular" nature of the complete automaton. Each subgraph corresponds to one particular subset of the overall rule system and can easily be tested independently of the rest of the automaton. Another advantage is that it implements an "optimized" rule-exceptions verification mechanism, albeit in a preliminary form. It is possible to connect an exceptions file to each "rule node" in the graph, to each "subrule node" or, indeed, to any node. This breaks down the mass storage required to hold the exceptions and reduces the number of exceptions checked while walking across the graph, thereby speeding up the transcription procedure.
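The effect of attaching exceptions to individual nodes can be illustrated with a small Python sketch that splits one flat exception list into per-node tables. The node identifiers, the function names and the three entries are hypothetical and serve only to show how the set of candidates shrinks; the paper's exception files are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (rule node id, input word, stored transcription); illustrative entries only
ExceptionEntry = Tuple[str, str, str]


def partition_exceptions(entries: List[ExceptionEntry]) -> Dict[str, Dict[str, str]]:
    """Group a flat exception list into one look-up table per rule node."""
    by_node: Dict[str, Dict[str, str]] = defaultdict(dict)
    for node_id, word, transcription in entries:
        by_node[node_id][word] = transcription
    return dict(by_node)


def lookup(by_node: Dict[str, Dict[str, str]], node_id: str, word: str) -> Optional[str]:
    """Consult only the table attached to the node currently being walked."""
    return by_node.get(node_id, {}).get(word)


ENTRIES = [
    ("GL", "GLICINE", "glitSine"),
    ("GL", "NEGLIGENTE", "neglidZEnte"),
    ("Z", "ZIO", "tsio"),
]
TABLES = partition_exceptions(ENTRIES)

print(lookup(TABLES, "GL", "GLICINE"))
print(len(TABLES["GL"]), "of", len(ENTRIES), "entries are candidates at node GL")
```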

REFERENCES

1) D.H. Klatt, IEEE ASSP, Vol. 24, N. 5, pp. 391-398 (1976).
2) A.V. Aho, J.D. Ullman, The Theory of Parsing, Translation and Compiling (P.H.I., NJ, 1972).
3) Z. Muljacic, Fonologia della Lingua Italiana (Mulino, Bologna, 1972).
4) R. Delmonte, in Linguistica Computazionale (CLESP Padova, 1983) p. 101.
5) N. Chomsky, M. Halle, The Sound Pattern of English (H. R ..., NY, 1968).