
Linguistic Society of America

Unchanged Cognates as a Criterion in Linguistic Subgrouping
Author(s): Bh. Krishnamurti, Lincoln Moses and Douglas G. Danforth
Source: Language, Vol. 59, No. 3 (Sep., 1983), pp. 541-568
Published by: Linguistic Society of America
Stable URL: http://www.jstor.org/stable/413903



UNCHANGED COGNATES AS A CRITERION IN LINGUISTIC SUBGROUPING

BH. KRISHNAMURTI, Osmania University
LINCOLN MOSES, Stanford University
DOUGLAS G. DANFORTH, Wang Laboratories

If a sound change has lexically diffused without completing its course, one finds that among the lexical items qualified for the change, some have already changed (c), others have remained unchanged (u), and still others show variant forms (u/c). When such a change has affected a group of genetically related languages, the consequent comparative pattern u-u/c-c can be used to set up subrelations among languages. In this paper, we draw on data from six languages belonging to the South-Central subfamily of Dravidian, with reference to an atypical sound change called 'apical displacement'. There are 63 etymologies which qualify for the study. A total of 945 possible binary-labeled trees fall into six types for the six languages under study. In terms of our postulates, that tree is the best which scores the lowest m, i.e. the minimum number of independent instances of change needed to account for the u-c-o (o = no cognate) pattern of a given entry. Each of the 63 entries has been applied to the possible 945 trees, and the trees have been scored for the value m by computer. The one tree which scored the lowest (71 points) is identical with the traditionally established tree for these languages. This paper shows that: (a) one shared innovation is sufficient to give genetic subrelations among languages, within the framework of the theory of lexical diffusion; (b) unchanged cognates are as important as changed cognates in giving differential scores for possible trees; and (c) the notion of shared innovation can be further refined within the theory of lexical diffusion.*

1. INTRODUCTION. In the standard theory of historical linguistics, subrelations among languages belonging to a family are established on the basis of 'shared innovations' (Dyen 1953:580-82, Hoenigswald 1966:7). Under ideal conditions, a family tree diagram can be constructed to reflect subrelationships among a group of genetically related languages: the more inclusive innovations account for branchings at higher nodes, while the more exclusive innovations correspond to branchings at lower nodes. A tree diagram also implies a relative chronology, with higher branchings representing older changes.1 The shortcomings of both the comparative method and the tree diagram are all too well-known to historical linguists. (For a lucid discussion, cf. Bloomfield 1933,

* When the central idea of this paper was conceived and developed, Krishnamurti and Moses were Resident Fellows at the Center for Advanced Study in the Behavioral Sciences, Stanford (1975-76). Danforth was a statistics advisor at the Center. We all gratefully acknowledge the facilities provided for our collaborative work at the Center. We are indebted to the following for their useful and encouraging comments on an earlier draft of this paper: M. B. Emeneau, G. B. Kelley, Hans H. Hock, George Cardona, Paul Kiparsky, Chin-Chuan Cheng, and R. M. W. Dixon. William S-Y. Wang gets a major share of gratitude for his numerous insightful comments, which helped us in the revision of this paper.

1 This is the point essentially implied by Hoenigswald (1960, §13.2.4) when he says: 'Each different reconstruction represents the proto-language of a subfamily. The component languages of those language pairs which yield identical reconstructions belong to one subfamily. If a language is thus found to belong to two subfamilies, that subfamily which is reconstructed from the smaller number of languages is, in turn, a subfamily within the subfamily reconstructed from the larger number of languages.'


§§18.9-13.) However, while their utility has not been rejected, no alternative procedure has yet been developed which captures subrelationships among the members of a language family in a more rigorous way.2

In linguistic subgrouping, it is the presence or absence of an innovation (say, a sound change resulting in a phonemic merger or split) that is taken into account, rather than the extent to which it has affected cognate morphs in the related languages. To quote Hoenigswald (1966:8):

Brugmannian innovations are not for counting - the question, in principle, is whether they are at all present or entirely absent in a given set of descendant languages. Of course, this amounts to saying that if languages A and B share an authentic "innovation" as against language C, then there can be none linking C and B against A. Where this nevertheless happens, as it frequently does, it indicates the inadequacy of the family tree as a device to depict a language relationship.

Recent studies have conclusively shown that at least some sound changes are lexically gradual; i.e., lexical items which fulfill the structural conditions of a sound change are not all affected by it at once (Chen & Wang 1975). A commonly initiated sound change can spread across a set of related languages, engulfing more and more lexical items which qualify for the change; and the process may continue for several centuries. The number of innovative cognates (i.e. those affected by a given sound change) which any two languages share can then be taken as a measure of their relative distance. For instance, Languages ABC may all show evidence of a shared innovation (a certain sound change), but AB may share more cognates-with-change; this would give us a subgrouping ((AB) C) as preferable to the possible subgroupings ((AC) B) and ((BC) A).3

1.1. In a recent study of the areal and lexical diffusion of three sound changes in the South-Central group of Dravidian languages, viz. Gondi (G), Konda (K), Kui (Ku), Kuvi (Kv), Pengo (P), and Manda (M), Krishnamurti 1978

2 The method of lexicostatistics (also called glottochronology), proposed by Morris Swadesh, was much discussed in the 1950's as an alternative to traditional subgrouping. It is based on the hypothesis that, in any language, basic vocabulary is lost (or replaced) at a constant rate. The time-depth which separates any two languages of a family is calibrated by comparing the degree of loss in basic cognate vocabulary. In this theory, only the presence or absence of a 'true cognate' in a language matters, rather than how close a word is in form to cognates in the sister languages. Several aspects of this theory have been questioned, and it seems not to be as popular in historical studies now as it was two decades ago. For further details, see Gudschinsky 1956 (in Hymes 1964, with the latter's reference note).

3 Here the assumption is that, after the ancestor of AB had separated from C, the two branches would have different lexical schedules in the implementation of the inherited sound change. Consequently, the shared innovative cognates between AB would be more numerous than between either one of them and C. Hsieh (1973:71) introduces the notion of 'diffusion overlapping' to represent 'the sharing or overlapping of the phonological forms in cognate items across dialects', and he proposes to utilize 'the degrees of diffusion overlapping with respect to a commonly initiated sound change' to measure 'the relative genetic closeness' of three or more dialects. (A discussion of this concept is also given by Krishnamurti 1978:12.)


shows that a tree structure reflecting the hierarchical relationships of these languages can be constructed, taking into account the number of cognates-with-change that each language has shared with the other five. The resulting tree diagram, shown here as Figure 1, perfectly matches the traditional diagram based on a number of phonological and morphological isoglosses (see Appendix I).

[Figure 1: tree diagram. Root: Proto-Gondi ... Manda. Terminal nodes, left to right: Gondi, Konda, Kui, Kuvi, Pengo, Manda.]

FIGURE 1. A tree diagram of the South-Central Dravidian languages.

An exact replica of Fig. 1 has also been produced with the aid of a computer program developed by Roy D'Andrade of the University of California, San Diego,4 using as input the numbers of cognates-with-change that each language has shared with the other five; it is given as Table 1 (overleaf; data taken from Krishnamurti 1978, Table 8).
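As a rough illustration of how a tree can be read off pairwise counts, the sketch below clusters the six languages from the Table 1 figures. It uses plain average-linkage agglomeration, which is an assumption made for illustration; it is not D'Andrade's U-statistic program, whose details are not given here. The names `SHARED`, `average_similarity`, and `build_tree` are hypothetical.

```python
from itertools import combinations

LANGS = ["G", "K", "Ku", "Kv", "P", "M"]

# Shared cognates-with-change, copied from Table 1.
SHARED = {
    ("G", "K"): 16, ("G", "Ku"): 18, ("G", "Kv"): 22, ("G", "P"): 11, ("G", "M"): 10,
    ("K", "Ku"): 18, ("K", "Kv"): 20, ("K", "P"): 19, ("K", "M"): 9,
    ("Ku", "Kv"): 88, ("Ku", "P"): 48, ("Ku", "M"): 40,
    ("Kv", "P"): 49, ("Kv", "M"): 42,
    ("P", "M"): 57,
}


def shared(a, b):
    """Look up the symmetric count for a pair of languages."""
    return SHARED.get((a, b), SHARED.get((b, a)))


def average_similarity(members1, members2):
    """Mean shared count over all cross-cluster language pairs."""
    pairs = [(a, b) for a in members1 for b in members2]
    return sum(shared(a, b) for a, b in pairs) / len(pairs)


def build_tree(langs):
    """Repeatedly merge the two most similar clusters (average linkage).

    Each cluster is a pair: (tuple of member languages, nested-tuple tree).
    """
    clusters = [((lang,), lang) for lang in langs]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_similarity(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        (m1, t1), (m2, t2) = clusters[i], clusters[j]
        merged = (m1 + m2, (t1, t2))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]


print(build_tree(LANGS))
```

Under this linkage rule the last two decisions are close (Konda averages 16.5 against the four-language cluster, versus 16 against Gondi), so a different linkage criterion could plausibly order Gondi and Konda differently.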

2. A NEW APPROACH. The method of linguistic subgrouping proposed in the above studies mainly concerns itself with changed cognates, rather than with the cognates which remain unchanged in different languages. Within the framework of the theory of lexical gradualness of sound change, we find that consideration of unchanged cognates also has an important role to play in linguistic subgrouping. We introduce the postulate below, and we find that adherence to

4 It was through Matthew Y. Chen that Krishnamurti came to know in January 1977 about the 'U-statistic hierarchical clustering', a computer program developed by D'Andrade (cf. D'Andrade 1978). Chen used his good offices with D'Andrade to run the program, using as input the numbers given in Table 8 of Krishnamurti 1978; the resulting tree diagram is identical with that given by Krishnamurti. In a personal communication (2 March, 1977), Chen wrote to Krishnamurti:

'The above tree (I'll refer to it as Diagram A) makes 40 predictions, 37 of which proved correct, and 3 wrong. The incorrect predictions have to do with the peripheral dialects, namely Gondi and Konda. Diagram A predicts that Kuvi should be closer to Konda than it is to Gondi. This turns out to be incorrect, since Kuvi shared 20 innovative items with Konda but 22 with Gondi. This is mistake 1. Diagram A also predicts that Konda should be closer to Manda than it is to Gondi. Again this is wrong, since Konda shares only 9 items with Manda but as many as 16 with Gondi. Finally, on the basis of Diagram A, Manda should be closer to Konda than it is to Gondi, an inference contradicted by the fact that whereas Manda shares 9 items with Konda (as noted before), it shares 10 with Gondi. Nevertheless, 37 correct predictions out of 40, that is 0.9250 correct, is an unusually high score of correct prediction, as Dr. D'Andrade commented.'





          GONDI   KONDA   KUI   KUVI   PENGO
Konda       16
Kui         18      18
Kuvi        22      20     88
Pengo       11      19     48     49
Manda       10       9     40     42     57

TABLE 1. Number of shared cognates-with-change.

it implies that certain trees 'fit' more naturally with observed distributions of changed and unchanged forms of cognates. The key idea will be made clear from the examples.

POSTULATE I. A lexical item may change its form (phonemic content and/or arrangement) after it has undergone a sound change, but a changed item will not revert to its unchanged form through a subsequent sound change.

This is based on the famous principle of the irreversibility of the effects of sound change, particularly phonological merger (Hoenigswald 1960, §11.4). In the case of literary languages, it is possible that, after a sound change had completed its course, some of the changed lexical items might be replaced by corresponding unchanged ones borrowed from an earlier literary variety, preferred in certain formal styles, and used by certain individuals in speech and/or writing.5 Such items must be identified and rejected by this postulate. Even in non-literary languages, comparativists can use several linguistic clues to detect borrowed unchanged items; these tend to be less systematically distributed than the changed forms, and also fewer in number. The co-occurrence of both changed and unchanged forms in synchronic usage is handled by Postulate II (see below, §3.1).

The effect of Postulate I is that, when a lexical item is definitely known as 'changed' for a given language, it cannot be treated as 'unchanged'. We represent this postulate schematically as follows (> = becomes; ≯ = does not become): u > c, c ≯ u. In the light of this postulate, we take an example and

5 Wang (p.c.) has drawn our attention to the fact that educated Swedish of Stockholm has recently revived, in formal styles, a word-final -d which had been lost earlier; this is a possible exception to Postulate I. Janson 1977, who has discussed this change, says that, between the 14th and 18th centuries, there is evidence for loss of word-final /d/ ([d] after l, n, but [ð] elsewhere) 'in the dialects of Southern Sweden'. The loss of the dental fricative [ð] (spelled -dh) was very extensive, and spread to 'Svealand, the area where Stockholm is situated' (255-6). 'From the 18th century on, however, evidence in the written language for deletion of -d tapers off. Actually, the dentals must have been re-introduced in the endings -ad and -at in the language of the educated class' (256). Here, the revival or retention (the point is not clear from the paper) of word-final -d is apparently motivated by educated speakers of Stockholm on sociological and sociolinguistic grounds. A somewhat parallel case occurs in Modern Telugu. Late Old Telugu (ca. 12th c. A.D.) had a sound change Cr- > C- (C = voiced or voiceless stop), e.g. prata > pata 'old', krotta > kotta 'new', etc. Modern Telugu has only the changed forms in both speech and writing. However, a few conservative writers prefer to use the Cr- forms in writing, and sometimes even in formal speeches, invoking ridicule. These are not illustrations of changed forms reverting to the unchanged stage by a later sound change.


compare the possible trees that derive from the configuration of unchanged and changed lexical items in the six languages. (The reference numbers indicate entries in Burrow & Emeneau's DED(S) (1961, 1968); for detailed explanation, see Appendix I, below.)

2.1. The cognate group listed under entry 4524 has the following pattern of unchanged (u) and changed (c) cognates (o = no cognate): Gondi, u; Konda, o; Kui, c; Kuvi, c; Pengo, c; Manda, c. Consider Figure 2, which shows

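[Figure 2: four tree diagrams; the branching structure is not recoverable here. Terminal nodes, left to right, with their status for entry 4524:]

Tree 1: G (u), K (o), Ku (c), Kv (c), P (c), M (c)
Tree 2: M (c), P (c), Kv (c), G (u), Ku (c), K (o)
Tree 3: Ku (c), G (u), Kv (c), M (c), P (c), K (o)
Tree 4: G (u), Kv (c), Ku (c), P (c), M (c), K (o)

u = unchanged (cognate), c = changed (cognate)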

FIGURE 2. Four possible trees for DED(S) 4524.

four possible trees for word 4524. In Tree 1, the entire pattern could be explained by a single instance of change at the place marked x, followed by inheritance of that change in the four branches representing the four languages Ku, Kv, P, and M. Since one instance of change will suffice to account for the u-o-c pattern, we score this tree with m = 1, where m = the minimum necessary number of instances of change for any proposed tree. But in Tree 2, this group implies a score of m = 4. This is because, from the presence of u in G and o in K, it is not possible to posit a shared change for the four languages which show the change; thus each c must be an independent instance of change in each of the languages M, P, Kv, and Ku, which could be posited at the points marked x in the diagram. On grounds of parsimony of explanatory elements, we find Tree 1 preferable to Tree 2 to account for cognate group 4524. In Tree 3, m = 2, since a change at Ku, plus another below the second node representing common inheritance of change in Kv, M, and P, will account for the pattern observed, with posited loss in K. For Tree 4, m = 2 again, since a change at Kv, plus another between the second and the third nodes, implying common inheritance, will account for all the observed changes.

From the example, it is clear that any etymological group can be used to provide a score m. Large values of m denote 'strained explanations', which require supposing many independent parallel changes. The idea of this paper is to allow all the cognate words appearing in two or more languages to be applied to every one of the logically possible trees, computing m in each sub-application. Then trees with low values of m are preferable to those with high values.

2.2. The further elaboration of this paper involves several tasks:

(1) We must establish that only those cognates which appear as c in at least two languages, and as u in at least one, are capable of discriminating among trees.

(2) We must establish that the problem can be narrowed, without loss, to those trees which always have just two branches at any node; and we must then exhibit the six types of such trees, and the 945 distinct possible trees that they generate.

(3) We must demonstrate the method of examining and scoring the 63 x 945 tree-word pairs.

(4) Data must be presented on the trees which emerged as the strongest candidates.

(5) Results must be discussed and interpreted.

3. IMPLEMENTATION. The tasks listed above correspond roughly to the sections which follow.

3.1. IDENTIFYING THE DISCRIMINATING ITEMS. The status of a cognate in a given language can be any of the following.6

(1) unchanged u
(2) changed c
(3) variant u/c
(4) non-occurrence o

For the moment, we ignore the third possibility, taking it up at the end of this section. The fourth possibility gives us no difficulty: where a cognate is absent, there is no information to consider; either a cognate has not been recorded,

6 In this paper, wherever we use the word 'changed' with respect to the members of a cognate group, we mean that they have undergone the same kind of change through the application of a specific phonological rule. Three phonological rules have produced different kinds of change in the 63 cognate groups considered here, of which Rules 1' and 1'' are mutually exclusive (complementary); i.e., a set of cognates undergoes either 1' or 1'', but not both. Rule 2 applies only to the output of Rule 1''. The consequences of application of Rules 1' and 2 are not the same; e.g., if the cognates of languages ABC have undergone Rule 1'', and within these the items of BC have undergone Rule 2, then the items of BC have undergone two changes each, as compared to the cognates in A. For further details, see Appendix I.


or it has simply been lost. In either case, we cannot assign it the value of u or c; i.e., its value is undetermined.

We now turn to the u's and c's. A group which appears only in the c form (with or without o) fits all trees equally well: a single change followed by inheritance in some languages, and extinction in others, fits any tree. Similarly, a cognate group with only u members will also fit any tree. Such cognate groups have no power of discrimination. If cognates occur in only two languages, of which one is c and the other u, all possible trees give the same score (see §3.2, below). Thus the discriminating cognates are just those which appear in at least one language as u, and in at least two others as c. The data studied by Krishnamurti 1978 contain 63 entries which meet this requirement. Table 2 (overleaf) gives the u-o-c distribution of these items, with their entry numbers from DED(S).
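The selection rule just stated can be written as a one-line test. The following is a minimal sketch, assuming each cognate group is given as six status symbols in the order Gondi, Konda, Kui, Kuvi, Pengo, Manda; the function name `is_discriminating` is hypothetical, and variant (u/c) forms are left aside here, as in the text at this point.

```python
def is_discriminating(pattern):
    """Return True if a cognate group can tell trees apart.

    pattern: sequence of 'u' (unchanged), 'c' (changed), 'o' (no cognate).
    The rule from the text: at least one u and at least two c's.
    """
    statuses = list(pattern)
    return statuses.count("u") >= 1 and statuses.count("c") >= 2


# Entry 4524 from the text: Gondi u, Konda o, the other four c.
entry_4524 = ["u", "o", "c", "c", "c", "c"]

print(is_discriminating(entry_4524))                       # one u, four c's
print(is_discriminating(["c", "o", "c", "c", "c", "c"]))   # only c and o
print(is_discriminating(["u", "c", "o", "o", "o", "o"]))   # one u, one c
```

Groups failing the test are not wrong data; they simply receive the same score on every tree and so cannot affect the ranking.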

But what shall be done with those items which appear in variant forms in some language(s)? That is, how shall we treat the u/c cases? Here we can refer to Figure 3, in which the three trees differ by having the upper branch marked

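[Figure 3: three tree diagrams, (a), (b), and (c), whose terminal nodes carry u, c, and u/c; the branching structure is not recoverable here.]

FIGURE 3. Status of u/c vs. u and c.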

with c, u/c, or u. We shall analyse them using a second postulate, as follows:

POSTULATE II. u/c should be counted as c.





The implication of this is that u/c is an intermediate stage in the process u > c. Thus u > u/c > c, or u > u/c > o; therefore, u > u/c ≯ u.7

Postulate II is needed to avoid triviality. If u/c could either revert to u by loss of c, or advance to c by loss of u, then every tree could be scored m = 1. This trivial result could be obtained as follows: label every non-terminal node as u/c, and then allow either u or c to be inherited, as convenient, at the terminal nodes. Postulate II prevents this, by barring the loss of c from u/c.8

7 Labov (1972:274-83) cites eight studies of speech communities which show evidence of sound change in progress. In all these, he shows that 'variability' of unchanged and changed segments (conditioned by sociolinguistic factors) is noticed before the categorical implementation of all sound change. Thus, in Swiss French, older-generation speakers (60-90 years) use t, middle-generation speakers (30-60 years) use t ~ y, and those under 30 use y. Also note the perceptive statements of Weinreich et al. (1968:149-50):

'We argue that, while linguistic change is in progress, an archaic and an innovating form co-exist within the grammar: this grammar differs from an earlier grammar by the addition of a rule, or perhaps by the conversion of an invariant rule to a variable rule ... We would expect social significance to be eventually attributed to the opposition of the two forms. At some point the social and linguistic issues are resolved together; when the opposition is no longer maintained, the receding variant disappears. This view of change fits the general observation that change is more regular in the outcome than in the process.'

ENTRY   GONDI   KONDA   KUI   KUVI   PENGO   MANDA
197       c       u      o     o       c       c
240       u       u      c     c       u       c
508       u       o      c     c       o       c
592       u       c      c     c       c       o
593       u       c      c     c       c       o
694       u       u      c     c       o       o
775       o       c      u     u       c       o
834a      u       u      c     c       c       c
S265      u       u      c     o       c       o
S290      o       u      c     c       c       c
S299      o       u      o     c       c       o
S407      u       c      o     o       c       o
S539      o       u      c     o       c       c
S642      u       c      o     c       c       o
S772      u       o      c     c       o       o
S787      u       u      u     u       c       c
S802      u       o      c     o       c       o
S877      u       o      c     c       o       o
S878      u       o      o     o       c       c
929       u       u      c     c       o       c
943       u       c      c     c       o       o
1090      u       o      c     c       c       c
1136      u       o      c     o       c       o
1142      o       u      c     c       c       c
1160      u       u      c     c       o       u
1311      u       o      c     c       o       o
1382      u       o      c     c       o       o
1485      u       o      c     c       o       o
1511      u       u      c     c       c       c
1538      u       u      c     c       o       o
1702      u       o      c     c       u       o
1782      u       u      c     c       o       o
1784      u       u      c     c       c       c
1787      u       u      c     u       c       c
1949      u       c      c     c       c       c
1986      u       c      o     c       c       c
2102      u       u      c     c       c       c
2529      u       u      c     c       o       o
2546      u       u      c     c       c       c
2655      u       u      c     c       o       c
3046      u       u      c     c       c       c
3255      u       u      c     c       c       c
3262      u       u      u     c       c       o
3286      u       u      c     o       c       c
3296      u       u      c     c       c       c
3446      u       o      c     c       c       c
3537      u       u      c     c       c       o
3613      u       u      c     c       o       o
3619      u       u      c     c       c       c
3856      u       c      c     c       o       o
3865      u       u      c     c       o       o
3897      c       u      c     c       c       c
3899      u       o      c     c       o       o
3901      u       u      c     c       u       c
3988      u       u      c     c       o       o
4071      u       u      c     c       c       o
4096      u       o      c     c       o       o
4169      u       u      c     c       o       o
4327      o       u      o     u       c       c
4347      u       o      c     u       c       c
4438      u       u      c     c       u       u
4459      u       o      c     c       c       o
4524      u       o      c     c       c       c

TABLE 2. Entries from DED(S) with u-o-c pattern for six languages.

Using both postulates now, we score the trees in Fig. 3 as follows:

TREE (a): The lower node must be u because of the u beneath it. Then the node above it must also be u. Therefore, both c and the c in u/c require the changes to operate independently, and m = 2.

TREE (b): The analysis is as above, and again m = 2.

TREE (c): The upper node must be u. The lower node should then be u/c, with u/c > c on the left. Thus one change at the lower node (u > u/c) will suffice, and m = 1.

8 Postulates I and II are intrinsically related; they can be collapsed into a single postulate which may be represented schematically as u > (u/c) > c, (u)/c ≯ u. However, for the sake of clarity in discussion and application, we prefer to give separate postulates as above.




We observe that these three trees are scored exactly as if u/c were replaced by c, as a consequence of Postulate II.
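Taken together, Postulates I and II amount to a one-way ordering of states: a form may move from u to u/c to c, but never back. The sketch below encodes that ordering as a check on a proposed history of one lexical item. The names `STAGE` and `is_permitted` are hypothetical, and loss of the item (o) is deliberately left out of the model.

```python
# Position of each state along the one-way path u > u/c > c.
STAGE = {"u": 0, "u/c": 1, "c": 2}


def is_permitted(history):
    """Return True if a sequence of states never moves back toward u.

    history: list of states ('u', 'u/c', 'c') from oldest to newest.
    Staying in the same state counts as permitted.
    """
    stages = [STAGE[state] for state in history]
    return all(earlier <= later for earlier, later in zip(stages, stages[1:]))


print(is_permitted(["u", "u/c", "c"]))  # gradual change through a variant stage
print(is_permitted(["u", "c"]))         # change with no attested variant stage
print(is_permitted(["c", "u"]))         # barred by Postulate I
print(is_permitted(["u/c", "u"]))       # barred by Postulate II
```

Because u/c can only advance, treating it as c when scoring a tree loses nothing, which is the observation made in the sentence above.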

3.2. COUNTING AND LISTING THE TREES. We have restricted our consideration to those trees which, at every node, have only two branchings (binary trees); and this has simplified our task. If, in fact, some three-fold (or higher-order) branching occurs in a natural language family, then the fact is detectable, and such a tree is recoverable, from the binary tree analysis, as we show by the example in Figure 4. From this it will follow that nothing is lost by restricting our study to binary trees.

The tree depicted in (i) has a three-way split at node 2. The next three are all the possible binary trees that can result from 'breaking the tie'. Either all three binary trees lead to a common value of m, in which case any of them serves as well as Tree (i); or else some of Trees (ii)-(iv) lead to different values of m, in which case it is misleading to treat the three possibilities with the single form (i). It follows that the three binary trees represent a finer classification of Tree (i), and so we only gain by restricting ourselves to binary trees. The value of m can be seen as equal in Trees (ii)-(iv) for A B C all u or all c; but if A is u, and B and C are both c, then m = 1 for Tree (ii), and m = 2 for Trees (iii)-(iv).

3.21. Counting the binary trees with six branches is not tedious. Such a tree must have five nodes. The five nodes can be on five levels (yielding trees of Type I in Figure 5, p. 552), or on four levels (yielding trees of Types II, III, and IV), or on only three levels (yielding trees of Types V and VI).

Thus we have six types of trees, viewed as geometrical patterns. On each we have six terminal nodes to be labeled, in some order from left to right, with the letters A B C D E F. If we regard the order from left to right as representing increasing recency, we would have 6! = 720 time-sequences with each of our 6 tree types. But our method of counting 'necessary changes' leaves some of the 720 patterns indistinguishable in any tree type. Consider, for example, Type I in Fig. 5. If E and F are interchanged, the m-score for the tree will be unaffected, because, whatever the labels u, c, o are at those two terminal nodes, the resulting contribution to m (and the labeling of the node just above them) will be the same whether E is on the left or the right. For Type I trees, then, we have 360 pairs of distinguishable trees, rather than 720. Within each pair, the relative recency of the two right-hand elements is indeterminate. The number of distinguishable (sets of) trees is not the same for all the tree types. Different types have different numbers of left-right reflections, representing different indeterminacies with regard to relative recency:9

Type I: E and F may be interchanged without affecting the linguistic tree. Thus, divide 720 by 2 = [360]

9 We are indebted to William S-Y. Wang for drawing our attention to an NSF report by Meyers & Wang (1963:94-5), which gives an algebraic formula to count binary-labeled trees for n terminal nodes as $C_n = (2n - 3)\,C_{n-1}$, with $C_1 = 1$.


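[Figure 4: Tree (i), with a three-way split at node 2 into A, B, and C, and its binary resolutions, Trees (ii)-(iv); the diagrams are not recoverable here.]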

FIGURE 4. Ternary vs. binary trees.





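[Figure 5: six tree diagrams, each with terminal nodes A B C D E F; shown here by their algebraic expressions and the number of distinguishable trees of each type.]

TYPE    EXPRESSION            NUMBER OF TREES
I       (A[B{C(D[EF])}])      360
II      {(AB)(C[D{EF}])}      180
III     [A{(BC)(D[EF])}]      180
IV      [A{B[(CD)(EF)]}]       90
V       {(AB)[(CD)(EF)]}       45
VI      {[A(BC)][D(EF)]}       90
                              945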

FIGURE 5. Six types of trees and their derivatives for six languages.

Type II: A and B may be interchanged, and so may E and F. Divide 720 by 2 x 2 = [180]

Type III: B,C and E,F are both interchangeable. Divide 720 by 2 x 2 = [180]

Type IV: C,D and E,F are interchangeable, and then these two pairs may also be interchanged. Divide 720 by 2 x 2 x 2 = [90]

Type V: Languages in each of three pairs may be interchanged, and then the pair C,D may be interchanged with the pair E,F. Divide 720 by 2 x 2 x 2 x 2 = [45]

Type VI: B,C and E,F may be interchanged, as well as the main branches forking from the top node. Divide 720 by 2 x 2 x 2 = [90]

Total: 945
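The total of 945 can be checked in three independent ways: by the recurrence cited in note 9, by dividing 6! by the number of equivalent arrangements of each type, and by enumerating the trees outright. The following is a minimal sketch; the function names and the nested-tuple representation of a tree are assumptions made for illustration, not the authors' program.

```python
from itertools import combinations
from math import factorial


def count_by_recurrence(n):
    """C_n = (2n - 3) * C_(n-1), with C_1 = 1 (the formula cited in note 9)."""
    count = 1
    for k in range(2, n + 1):
        count *= 2 * k - 3
    return count


def all_binary_trees(leaves):
    """Yield every distinguishable binary tree over the given labels.

    A tree is a label or a pair (left, right). To avoid counting mirror
    images twice, the first label is always kept in the left branch.
    """
    leaves = tuple(leaves)
    if len(leaves) == 1:
        yield leaves[0]
        return
    first, rest = leaves[0], leaves[1:]
    for size in range(len(rest)):          # how many labels join `first`
        for companions in combinations(rest, size):
            left = (first,) + companions
            right = tuple(x for x in rest if x not in companions)
            for left_tree in all_binary_trees(left):
                for right_tree in all_binary_trees(right):
                    yield (left_tree, right_tree)


# Number of equivalent left-right arrangements per type, as listed in the text.
EQUIVALENT_FORMS = {"I": 2, "II": 4, "III": 4, "IV": 8, "V": 16, "VI": 8}
per_type = {t: factorial(6) // k for t, k in EQUIVALENT_FORMS.items()}

print(count_by_recurrence(6))
print(sum(1 for _ in all_binary_trees("ABCDEF")))
print(per_type, sum(per_type.values()))
```

The enumeration is also a convenient source of candidate trees for scoring, since each tree it yields is already in a parenthesized (nested) form.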

3.3. EXAMINING AND SCORING THE 945 x 63 TREE-WORD PAIRS. It is possible to write down the geometric pattern of a tree, with the letters attached to its branch terminals, as an algebraic expression involving those letters and suitably


placed parentheses. The correspondence is shown in Fig. 5. The algebraic representation assists in the handling of the problem by computer.

The scoring procedure is as follows: Begin at a lowest node (an innermost parenthesis); here we find two elements. If they are both u's, record the reconstruction at the node as a u, and treat it as a single branch for the next step; but if they are both c's, reconstruct a c at the node, and treat it as a single branch for the next step; and if one terminal has a u and one has a c, record the reconstruction at the node as a u, treat it as a single branch at the next step, AND record a contribution of 1 to the value of m.10 Now repeat this process for any other node(s) which is/are as low as the one just handled. Then go to the next level. Whenever two branches dominated by a node carry a u and a c, increase m by one, and treat that node at the next level as a u branch; whenever two branches dominated by a node are alike (carrying two c's or two u's), record a reconstruction at the node marked u or c, as the case may be. The process takes five steps for a 6-terminal binary tree. If there are at least two c's, and at least one carries u, then the value of m will be 1 for some trees, and more than 1 for some others.

The reader can confirm that this computing method scores any tree the same as the method of counting proposed in §2.1. Appendix II gives a detailed exposition of the computer algorithm we have employed.
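The bottom-up procedure can be sketched as a short recursive function. This is an illustration under stated assumptions, not the algorithm of Appendix II: trees are written as nested pairs, u/c is counted as c (Postulate II), and a branch with no cognate (o) is assumed to contribute nothing, so that the node simply takes over the symbol of its other branch. The two tree shapes used below are likewise assumptions: the first is the Type IV arrangement with Gondi and Konda splitting off first, and the second is a Type I arrangement of the terminal order given for Tree 2 in Fig. 2.

```python
def score(tree, status):
    """Return (symbol reconstructed at the top node, m) for one cognate group.

    tree:   a language label, or a pair (left_subtree, right_subtree)
    status: dict mapping language label to 'u', 'c', 'u/c', or 'o'
    """
    if isinstance(tree, str):
        symbol = status[tree]
        return ("c" if symbol == "u/c" else symbol), 0   # Postulate II
    left, right = tree
    left_symbol, left_m = score(left, status)
    right_symbol, right_m = score(right, status)
    m = left_m + right_m
    if left_symbol == "o":        # assumption: an absent cognate is transparent
        return right_symbol, m
    if right_symbol == "o":
        return left_symbol, m
    if left_symbol == right_symbol:
        return left_symbol, m     # two u's give u; two c's give c
    return "u", m + 1             # u meets c: reconstruct u, count one change


# Entry 4524: Gondi u, Konda o, the rest c.
ENTRY_4524 = {"G": "u", "K": "o", "Ku": "c", "Kv": "c", "P": "c", "M": "c"}

TYPE_IV = ("G", ("K", (("Ku", "Kv"), ("P", "M"))))     # assumed shape
TYPE_I = ("M", ("P", ("Kv", ("G", ("Ku", "K")))))      # assumed shape

print(score(TYPE_IV, ENTRY_4524))
print(score(TYPE_I, ENTRY_4524))
```

With these assumptions the two trees score m = 1 and m = 4 for entry 4524, the values given in §2.1 for Trees 1 and 2. Summing `score` over all 63 entries of Table 2 would give a tree's total.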

3.4. THE DATA. Table 2 exhibits the information concerning the 63 words, showing the occurrence of changed and unchanged forms in the six languages. When one of our 63 cognate groups is applied to a tree, the minimum possible score is 1. Thus any tree's total score from 63 etymologies must be at least 63; that score for a tree would mean that, in every one of the 63 cognate groups, a single instance of change sufficed to explain (in accordance with our postulates) the occurrences of u's and c's which are observed in the data.

In fact, the trees' final scores ranged from a minimum of 71 to a maximum of 182. It is better to think in terms of the excess above 63; the range is then from 8 to 119. Table 3 (overleaf) shows the distribution of scores, both as m and as excess.

Table 3 has interesting features. Of the 945 trees, exactly 45 gave excesses of 24 or less, and the remaining 900 gave excesses ranging from 27 to 119,

10 Since all trees are binary, only two branches are considered at a time. The following chart summarizes how the symbols are reconstructed at a node, from the symbols occurring at the terminals of branches, on the basis of the postulates proposed in this paper:

       LEFT BRANCH   RIGHT BRANCH   RECONSTRUCTED
       TERMINAL      TERMINAL       SYMBOL AT NODE
(1)    u             u              u
(2)    c             c              c
(3)    [u/c] [ut] u I U/c
(4)    [u/c] [u/c] c
(5)    [] u] u u c





m          EXCESS    FREQUENCY   CUMULATIVE FREQUENCY
71           8           1            1
72           9           0            1
73          10           3            4
74          11           1            5
75          12           5           10
76          13           1           11
77          14           3           14
78          15           1           15
79          16           5           20
80          17           1           21
81          18           8           29
82          19           2           31
83          20           7           38
84          21           2           40
85          22           3           43
86          23           1           44
87          24           1           45
90-111     27-48       101          146
112-135    49-72       187          333
136-159    73-96       340          673
160-182    97-119      272          945

TABLE 3. Distribution of m and excess in 945 distinguishable trees based on 63 words.

from about three to fifteen times as great as the minimum observed excess. The gap between 24 and 27 encouraged the view that only the 45 trees with excess of 24 or less deserve serious consideration. In fact, we have restricted our consideration to the 11 lowest-scoring trees, those with excess of 13 or less; our reason is that the three trees with excess of 14 include one which disagrees with every lower-scoring tree by reversing the order of Gondi and Konda. We take this to indicate that a tree with an excess as large as 14 may well be unreasonable, and can be dropped from consideration. Thus Figures 6, 7, and 8 show the eleven leading trees.

Tree 1 has the smallest excess of all; it is the tree proposed by Krishnamurti (see §1.1 above), and here receives strong corroboration, for it is ASSESSED HERE BY USE OF INFORMATION NOT USED AT ALL IN KRISHNAMURTI'S EARLIER ANALYSIS. That analysis did not rest upon the occurrence of unchanged forms, nor upon the relation of those occurrences to the occurrences of changed forms. Thus this independent corroboration lends much weight to the earlier conclusion.

It is interesting to study these eleven trees with smallest excess scores. They comprise the most plausible of the 945 possibilities, and all rather resemble one another. The leader, Tree 1, and two others, Trees 6 and 11, are of Type IV. These are the only Type IV trees which begin with Gondi followed by Konda; of the three, the first has by far the smallest excess, viz. 8 rather than 12 or 13. The Type I trees (Trees 2-5 and 7-10) can be seen upon examination to resemble Tree 1 rather closely. Consider Tree 3; it agrees with Tree 1 in that it first breaks off Gondi, then Konda, and maintains Kui and Kuvi as a closely related pair. It 'specializes' Tree 1 by first setting up the Pengo-Manda pair as an offshoot less recent than the Kui-Kuvi pair, and by then choosing Pengo as less recent than Manda. Each Type I tree shown is a 'specialization' of Tree 1, Tree 6, or Tree 11.

TREE   m    EXCESS   TYPE   NO. OF ALTERNATE EQUIVALENT FORMS
1      71   8        IV     8
2      73   10       I      2
3      73   10       I      2
4      73   10       I      2
5      74   11       I      2

FIGURE 6. Trees 1-5, with lowest values of m. [Tree diagrams not reproduced.]

However, the tree with the smallest excess score, i.e. the one which demands the fewest independent changes among the 63 words, is Tree 1.




TREE   m    EXCESS   TYPE   NO. OF ALTERNATE EQUIVALENT FORMS
6      75   12       IV     8
7      75   12       I      2
8      75   12       I      2
9      75   12       I      2
10     75   12       I      2

FIGURE 7. Trees 6-10, with low values of m. [Tree diagrams not reproduced.]

4. A FURTHER ANALYSIS WITH TWO SOUND CHANGES. The foregoing treatment has regarded a cognate as either 'changed' or 'unchanged'. However, a further treatment considering more than one change under 'changed' is possible.

4.1. Define C1' as the change VL > LV (V = vowel; L = apical consonant), and C1" as the same change for a word beginning with a consonant, i.e. CVL > CLV. Now a word having reached the CLV stage by C1" can undergo a second change (C2), the dropping of the initial consonant. We denote this compound change by cc, to represent the two-stage change of CVL > CLV > LV. (Appendix I gives fuller discussion of the multiple changes and their relations.) As before, we can determine whether or not a word is unchanged, and whether its changed form results from C1' or C1" (mutually exclusive possibilities, since a reconstructed word begins with either a vowel or a consonant) or from cc (i.e. C1" + C2). The data for all 63 words appear in Table 4 (overleaf); these are the same items as those in Table 2, but a fuller description of the changes is now shown.

TREE   m    EXCESS   TYPE   NO. OF ALTERNATE EQUIVALENT FORMS   TERMINALS (LEFT TO RIGHT)
11     76   13       IV     8                                   G K P Ku Kv M

FIGURE 8. Tree 11, with low value of m. [Tree diagram not reproduced.]

When a word occurs in more than one form in the same language, we treat it as if only the more changed form has occurred. Thus we score u/c as c, c/cc as cc, etc. The reasons are as before: we score candidate trees in terms of the minimum number of independent changes necessary to explain the presence of various forms of the word, as specified by the tree. The example in Figure 9 (p. 560) illustrates the idea.

Let the given tree show u, c, or cc in accordance with which form occurs. Non-occurrence of the word is marked x. First we mark each non-terminal node as u if any terminal node descendant from it is marked u. Then we mark any remaining non-terminal node as c if any descendant terminal node is marked c. Finally, we mark any still remaining non-terminal node as cc. Then, throughout the tree, we replace each cc by the number 2, each c by the number 1, and each u by zero, and we enter no number for an x. Finally, we add up all the arithmetic differences between adjacent nodes in the tree. Fig. 9 has two instances of 0-1 and a single instance of 1-2. The sum of these is 1 + 1 + 1 = 3. That is the score for the tree. It represents the minimum number of independent linguistic changes demanded by this tree for the given incidence of changed and unchanged forms. Observe that 2 is the minimum possible score for any word-tree combination which contains both a u and a cc. For such a word, the value of m - 2 is thus the 'excess'. The distribution of m and excess, considering the fuller information on changes, appears in Table 5.
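The marking-and-differencing procedure just described can be sketched as follows. The encoding and the names `VALUE`, `TREE_A`, `TREE_B`, `score2`, and `entry` are hypothetical, and both topologies are assumptions read off the prose descriptions of the two leading trees rather than copied from the figures. Each non-terminal node takes the smallest value found among its descendants, which is equivalent to marking u first, then c, then cc.

```python
VALUE = {"u": 0, "c": 1, "cc": 2}     # "o" (no cognate) gets no number

LANGS = ("Gondi", "Konda", "Kui", "Kuvi", "Pengo", "Manda")

# Assumed topologies (illustrative): Kui-Kuvi as a pair beside Pengo-Manda,
# versus Kui splitting off before Kuvi joins Pengo-Manda.
TREE_A = ("Gondi", ("Konda", (("Kui", "Kuvi"), ("Pengo", "Manda"))))
TREE_B = ("Gondi", ("Konda", ("Kui", ("Kuvi", ("Pengo", "Manda")))))


def score2(tree, forms):
    """Return (node value, score); the value is None if nothing occurs below."""
    if isinstance(tree, str):                 # terminal node
        return VALUE.get(forms[tree]), 0
    total, values = 0, []
    for child in tree:
        value, sub = score2(child, forms)
        total += sub
        if value is not None:                 # skip branches with no cognate
            values.append(value)
    if not values:
        return None, 0
    node = min(values)                        # u before c before cc
    return node, total + sum(v - node for v in values)


def entry(symbols):
    """Pair six symbols (Gondi ... Manda) with the language names."""
    return dict(zip(LANGS, symbols.split()))


row_1090 = entry("u o c cc cc cc")
print(score2(TREE_A, row_1090)[1])   # 3
print(score2(TREE_B, row_1090)[1])   # 2
```

Under these assumed topologies the two results agree with the scores that Table 6 lists for entry 1090 under Trees 2' and 1' respectively.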

ENTRY   GONDI   KONDA   KUI   KUVI   PENGO   MANDA
197     c       u       o     o      c       c
240     u       u       c     c      u       c
508     u       o       c     c      o       c
592     u       c       c     c      c       o
593     u       c       c     c      c       o
694     u       u       c     c      o       o
775     o       c       u     u      c       o
834a    u       u       c     c      c       c
S265    u       u       c     o      c       o
S290    o       u       c     c      c       c
S299    o       u       o     c      c       o
S407    u       cc      o     o      cc      o
S539    o       u       c     o      c       c
S642    u       c       o     cc     cc      o
S772    u       o       c     c      o       o
S787    u       u       u     u      c       c
S802    u       o       c     o      c       o
S877    u       o       cc    cc     o       o
S878    u       o       o     o      cc      cc
929     u       u       c     c      o       c
943     u       c       c     cc     o       o
1090    u       o       c     cc     cc      cc
1136    u       o       c     o      c       o
1142    o       u       c     c      c       c
1160    u       u       c     c      o       u
1311    u       o       c     c      o       o
1382    u       o       c     c      o       o
1485    u       o       c     c      o       o
1511    u       u       c     c      c       c
1538    u       u       c     c      o       o
1702    u       o       c     c      u       o
1782    u       u       c     c      o       o

TABLE 4. Entries from DED(S) with u-o-c/cc pattern for six languages.

4.2. The minimum excess has now increased to 13 (from 8), and the maximum from 119 to 144. It is natural that these increases appear; they are the result of a more complex body of data conditions existing to be satisfied. More notable is the rather sharp break between the two best-scoring trees and the remainder.

Figures 10 and 11 (pp. 561-2) show the seven leading trees. Note once again that, after a certain marginal excess is surpassed, some trees invert Gondi and Konda; for these data, one of the three trees with excess of 21 exhibits this inversion. So we restrict our attention to the seven trees with excess of 20 or less. Here we find that the leading tree of the single-change analysis is in second place, one point behind Tree 9 of Fig. 7, which is now the leading tree in the two-change analysis. How is this inconsistency to be interpreted?

ENTRY   GONDI   KONDA   KUI   KUVI   PENGO   MANDA
1784    u       u       c     c      c       c
1787    u       u       c     u      c       c
1949    u       c       c     cc     cc      cc
1986    u       cc      o     cc     cc      cc
2102    u       u       c     cc     cc      cc
2529    u       u       c     c      o       o
2546    u       u       c     c      c       c
2655    u       u       c     cc     o       c
3046    u       u       cc    cc     cc      cc
3255    u       u       c     c      c       c
3262    u       u       u     c      c       o
3286    u       u       c     o      c       c
3296    u       u       c     c      c       c
3446    u       o       c     c      c       c
3537    u       u       c     c      c       o
3613    u       u       c     c      o       o
3619    u       u       c     c      c       c
3856    u       c       c     c      o       o
3865    u       u       c     c      o       o
3897    cc      u       c     cc     cc      cc
3899    u       o       c     c      o       o
3901    u       u       c     c      u       c
3988    u       u       c     c      o       o
4071    u       u       c     c      c       o
4096    u       o       c     c      o       o
4169    u       u       c     c      o       o
4327    o       u       o     u      cc      cc
4347    u       o       c     u      cc      cc
4438    u       u       c     c      u       u
4459    u       o       c     cc     cc      o
4524    u       o       cc    cc     cc      cc

TABLE 4. (Continued).

m          EXCESS     FREQUENCY   CUMULATIVE FREQUENCY
92         13         1           1
93         14         1           2
94         15         0           2
95         16         0           2
96         17         2           4
97         18         0           4
98         19         2           6
99         20         1           7
100        21         3           10
101-140    22-61      122         132
141-180    62-101     339         471
181-223    102-144    474         945

TABLE 5. Distribution of m and excess in 945 distinguishable trees, using 63 words, of which 16 have compound changes.

FIGURE 9. A scoring example for a tree whose terminals are marked u, c, u, x, c, cc, shown in three panels: the given tree; the tree with marked non-terminal nodes; the tree marked with numbers at nodes. [Tree diagrams not reproduced.]

Our view is that 'inconsistency' is an inappropriate term. Note that Tree 1 of Fig. 6, which was at least two points better than all other 944 trees (judged on the single-change study), is now better, again by at least two points, than all but one of the other 944 trees, when judged by the two-change standard; and that other tree is better by a single point. Tree 1 (or Tree 2') is indeed a plausible candidate for the best explanation.

But Tree 1' (or Tree 9) may be a superior candidate. Note that it is a 'specialization' of Tree 1 (Fig. 6); it preserves Gondi first, Konda second, and Pengo-Manda as a late pair; but then it chooses Kui and Kuvi to be less recent than the Pengo-Manda pair, and chooses Kui to be the least recent after Gondi and Konda. Is this 'specialized' version of Tree 1 actually the best interpretation of the data? Perhaps. It is not clear that the question can be resolved confidently.

To facilitate a closer view of the question, we examined the two trees by listing the entries for which they gave different m values. They gave identical values to 52 of the 63 words; in five of the remaining cases, Tree 2' had a smaller value of m, and in six cases Tree 1' had the smaller value. Table 6 (p. 562) lists those words. Note that, for eight of the words, one tree or the other had zero excess; but with three words, neither tree fit perfectly. Does Table 6 illuminate a choice between Trees 1' and 2'?

4.3. We can search for an answer within the framework of sound change. First, consider the u-o-c/cc pattern in relation to the differential scores of Tree 1' and Tree 2'. Table 7a (p. 563) has five items which give lower scores for Tree 2', and Table 7b has six items which give higher scores for Tree 2'.

RANK             m    EXCESS   TYPE   NO. OF ALTERNATE EQUIVALENT FORMS
1st (Tree 1')    92   13       I      2
2nd (Tree 2')    93   14       IV     8
3rd              96   17       I      2
4th              96   17       I      2

FIGURE 10. Trees with lowest excess, counting two changes; best 4. [Tree diagrams not reproduced.]

In each of the five items in Table 7a, both Kui and Kuvi show c, while either Pengo or Manda shows u. Naturally this pattern implies a shared change by Kui-Kuvi as against Pengo-Manda, which then become the branches of the final remaining node. These scores support the correct tree (Tree 2' of Fig. 10, or Tree 1 of Fig. 6). In Table 7b, note that Kuvi-Pengo-Manda share consonant simplification (C2), to the exclusion of Kui, in as many as five out of six entries. Consequently, Kuvi breaks away from its closest sister Kui, thereby providing a lower score for Tree 1' than Tree 2' for these items.

RANK   m    EXCESS   TYPE   NO. OF ALTERNATE EQUIVALENT FORMS   TERMINALS (LEFT TO RIGHT)
5th    98   19       I      2                                   G K Ku M Kv P
6th    98   19       IV     8                                   G K Ku M P Kv
7th    99   20       I      2                                   G K P Ku M Kv

FIGURE 11. Trees with lowest excess, counting two changes; 5th, 6th, and 7th best. [Tree diagrams not reproduced.]

           SCORE        MINIMUM SCORE     EXCESS
ENTRY    1'     2'      FOR ENTRY       1'     2'
240      3      2       1               2      1
1090     2      3       2               0      1
1160     2      1       1               1      0
1702     2      1       1               1      0
1949     2      3       2               0      1
2102     2      3       2               0      1
3262     1      2       1               0      1
3897     4      5       2               2      3
3901     3      2       1               2      1
4438     2      1       1               1      0
4459     2      3       2               0      1

TABLE 6. Entries scored differently by Trees 1' and 2', using two-change analysis.

                                               SCORES
ENTRY   G    K    Ku   Kv   P    M     TREE 1'   TREE 2'
240     u    u    c    c    u    c     3         2
1160    u    u    c    c    o    u     2         1
1702    u    u    c    c    u    o     2         1
3901    u    u    c    c    u    c     3         2
4438    u    u    c    c    u    u     2         1

TABLE 7a. u-o-c pattern with lower scores for Tree 2'.

                                               SCORES
ENTRY   G    K    Ku   Kv   P    M     TREE 1'   TREE 2'
1090    u    o    c    cc   cc   cc    2         3
1949    u    c    c    cc   cc   cc    2         3
2102    u    u    c    cc   cc   cc    2         3
3262    u    u    u    c    c    o     1         2
3897    cc   u    c    cc   cc   cc    4         5
4459    u    o    c    cc   cc   o     2         3

TABLE 7b. u-o-c/cc pattern with higher scores for Tree 2'.

A plausible explanation for this discrepancy lies in the very nature of C2 (CL > L) as opposed to C1" ((C)VL > CLV) (see Krishnamurti 1978, fn. 9). The simplification rule (C2), which was restricted in its structural conditions in the beginning, became generalized in Kuvi, Pengo, and Manda, thereby covering a large number of lexical items. It is possible that this generalization independently affected Kuvi, on the one hand (after its separation from Kui), and Pengo-Manda, on the other. C2 is not the kind of rule which necessarily requires a common stage of development for its implementation in a group of languages. However, the apical displacement rules (VL > LV, CVL > CLV; see Appendix I) which account for the u-o-c pattern in Table 2 are atypical; we cannot expect them to be shared by two or more contiguous languages, accidentally and independently, in their innovative cognates. Computer scoring of m would naturally set up a single c, whenever Kuvi-Pengo-Manda show it by derivation from successive nodes labeled ((Kuvi) ((Pengo) (Manda))), leaving out Kui as an off-shoot of the higher node. Consequently, Tree 1' scores less than Tree 2' for the items in Table 7b.

Another limitation concerns the gaps in the lexical data of Pengo and Manda. Pengo has no cognate for 21 entries, and Manda for 31. They share this gap in as many as 16 entries. We do not know whether this represents shared and unshared loss of cognates in these two languages, or incomplete data collection; the latter is more likely. At least for Manda, the data are inadequate, based on a few days' field study by Burrow and Bhattacharya. If fuller data were available for these two languages, Trees 1' and 2' would not appear as such close contenders. However, we hope that the method of subgrouping proposed in this paper is not invalidated by the inadequacy of data.

5. CONCLUSION. The postulates which we propose, as well as our assumption underlying the procedure of scoring m for all possible trees, are based on the standard theory of comparative method in historical linguistics. The contribution of this paper is in three areas:

(1) We have shown that a single atypical sound change shared by three or more languages is sufficient to give us subrelations among them, provided that unequivocal evidence exists that the sound change has lexically diffused, and has still left part of the eligible lexicon uncovered. Etymologies in which the cognates show all c's or all u's, i.e. in which every cognate has undergone the change or none has, are not useful data for this study. Cognate groups in which at least two languages show change (c), and at least one language lacks the change (u), are essential for such a study. Another constraint is that, if the change is not atypical, or if it is one that can commonly occur in contiguous languages independently, the results of the method may not be satisfactory. The method we have proposed here provides independent corroboration of the traditional method of establishing subgroups on the basis of shared innovations.

(2) Within the framework of the theory of lexical diffusion, a 'shared innovation' may be defined as the sharing of innovative cognates by genetically related languages, resulting from an identical sound change. The number of such shared cognates-with-change can then be used as an index of the degree of closeness between any two or more languages. Hsieh 1973 and Krishnamurti 1978 have successfully used this measure in linguistic subgrouping.¹¹

(3) Within the model proposed in this paper, 'unchanged cognates' play as important a role as changed (innovative) cognates in scoring the binary trees in terms of the value m, i.e., the minimum number of independent instances of change posited by a given word group for a given tree. We believe that this proposal, making unchanged cognates a crucial criterion in linguistic subgrouping, is made for the first time here.
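The index of closeness mentioned under (2), the number of innovative cognates that two languages share, can be illustrated with a short sketch. It is not the procedure of Hsieh 1973 or Krishnamurti 1978 in any detail; the names `ROWS` and `shared_innovations` are hypothetical, and only four rows of Table 4 are used, so the counts mean nothing beyond the illustration.

```python
from collections import Counter
from itertools import combinations

LANGS = ("Gondi", "Konda", "Kui", "Kuvi", "Pengo", "Manda")

# Four rows as they appear in Table 4 (entries 240, 592, 834a, S290).
ROWS = {
    "240":  "u u c c u c",
    "592":  "u c c c c o",
    "834a": "u u c c c c",
    "S290": "o u c c c c",
}


def shared_innovations(rows):
    """Count, for each pair of languages, the entries in which both show change."""
    counts = Counter()
    for symbols in rows.values():
        changed = [lang for lang, s in zip(LANGS, symbols.split())
                   if s in ("c", "cc")]
        counts.update(combinations(changed, 2))   # pairs keep the LANGS order
    return counts


shared = shared_innovations(ROWS)
print(shared[("Kui", "Kuvi")])      # 4
print(shared[("Pengo", "Manda")])   # 2
print(shared[("Gondi", "Kui")])     # 0
```

A count of this kind ignores the unchanged forms altogether, which is the information that the scoring of m under (3) brings into play.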

Our method can be applied to any family or subfamily of languages which fulfill the requirements of data, both to test the validity of the method and, if it works, to get a more systematic configuration of the family tree for such languages.

¹¹ Hsieh proposes dialect subgrouping 'based on a corollary of the concept of lexical gradualness of sound change: that the more phonological forms two or more dialects share with respect to a commonly initiated but independently executed sound change, the longer they must have developed together, and hence the more closely related they are' (64, 88). Twenty dialects of the Jiang-su Province share an innovation, viz. the split of Middle Chinese Tone 1 into Tones 1a and 1b in complementary environments. Of the 533 items which qualify for the sound change, items which occur identically in all the dialects have been eliminated, leaving 43 items 'which show different degrees of diffusion overlapping' (see fn. 3 above) in the 20 dialects studied. Each of the dialects is then compared with the remaining 19, to see how many innovative cognates it shares with each one.

Hsieh operates with the notions of 'primary' and 'secondary' groups of dialects, setting up 'a certain number of shared items as a criterion of closeness' (81). The primary group is enlarged when the criterion of shared items is reduced, and has fewer members when the criterion of shared items is increased. The 'conclusive primary groups', set up with varying membership when the criterion ranged from 29 to 43 shared innovative cognates, broadly coincide with the traditional Mandarin and Wu groups of the twenty Jiang-su dialects. Hsieh's method is certainly pioneering and insightful, but it does not provide criteria to construct an unambiguous family tree diagram for all twenty dialects.


APPENDIX I

There are 23 Dravidian languages spoken in South Asia; the South-Central subgroup consists of seven languages: Telugu, Gondi, Konda, Kui, Kuvi, Pengo, and Manda. These languages are geographically contiguous, in the states of Andhra Pradesh, Madhya Pradesh, and Orissa.

Consonant clusters, as well as apical consonants (here defined as including alveolars and retro- flexes, but not dentals), did not occur word-initially in Proto-Dravidian (PDr.) This situation is reflected in the native element of all Dr. languages EXCEPT those of the South-Central subgroup. The languages of this subgroup have many cognates with word-initial apicals, as well as consonant clusters having an apical as the second member. The phonological processes which underlie these changes can be represented by the generalized rules given below. We use the following symbols and abbreviations:

PSCDr. = Proto-South-Central Dravidian.
C1 = [-syllabic]: all consonants admissible in word-initial position in PSCDr., viz. /p t c k b d j g m n w/.
V1 = [V, -long]: any PSCDr. short vowel which occurs in the root syllable, viz. /i e a o u/.
V̄1 = V1 which is [+long].
L = [C, +apical, -nasal]: alveolar and retroflex non-nasal consonants, which occur only non-initially in PDr., viz. / t r t z /. In PSCDr., /t/ = [r]/[d], i.e. the alveolar stop is realized either as a trill or as a voiced stop; so also the retroflex stop /t/ is [r]/[d] (= voiced flap or stop).
V2 = [V, -long, {+high, +low}]: non-mid short vowels /i u a/, which are the only ones that occur in this position.
- = etymological boundary that separates the root from the formative suffix.
X = any consonant, obstruent or sonorant; or a nasal plus stop combination.

The phonological rules are as follows:

Rule 1': (a) V1L-V2X > LV̄1-X
             Conditions: V2 = V1; or V2 is [+low] when V1 is [-high, -low].
         (b) V1L-V2X > LV1-X(X)
             Conditions: V2 is [+high], and V1 ≠ V2; X is optionally geminated when it stands for a single consonant.

Note that (a) and (b) are complementary developments, and are therefore treated as a single rule. Rule 1" is exactly the same as Rule 1', except that a C1 precedes V1. Rule 1' produces initial apicals, while Rule 1" produces word-initial consonant clusters having apicals as second members. Rules 1' and 1" can be collapsed as follows:

Rule 1: (a) (C1)V1L-V2X > (C1)LV̄1-X
        (b) (C1)V1L-V2X > (C1)LV1-X(X)

Rules 1a-b occur in all seven languages of this subgroup, but not with the same degree of generality. Rule 1a has been called metathesis with vowel contraction (of V1 and V2 into V̄1), while Rule 1b has been called simple metathesis following loss of V2 (Krishnamurti 1961, §§1.121-1.159). Evidence also exists to show that Rule 1' operated first, and was subsequently generalized by involving syllables with initial consonants. Supporting evidence for this observation comes from the following facts: All languages have Rule 1' as formulated above. But Rule 1" does not occur in that form in all the languages. Thus, for Telugu and Konda in Rule 1", L = /t r z/ (only phonetic resonants and non-laterals), and C1 excludes the anterior nasal /n/. In Telugu, /t/ [r] and /z/ merge with the reflex of /r/ in CL clusters. In Gondi, Rule 1' occurs, but there are no instances of initial consonant clusters derived by Rule 1". However, in certain Gondi dialects, a few lexical items bear testimony to the operation of both Rule 1" and Rule 2 (see below). It appears that Rule 1' is the oldest, and Rule 1" a later extension, leading to the collapsing as Rule 1, which operates in its most generalized form in Kui-Kuvi-Pengo-Manda.

Rule 1" is diachronically followed by Rule 2, with which it has a feeding relationship:

Rule 2: CLV-X > (a) CV-X (Telugu)
                (b) LV-X (all other languages)

The consonant clusters formed by Rule 1" are hereby simplified. In Telugu, the apical is lost; elsewhere, the simplification is by loss of the first member.

Telugu has not been included in this study, because it is unequally placed in relation to the other six members of the subgroup, as a result of its long literary history as well as its wealth of vocabulary. Rule 2 is a shared innovation in the other six languages, pointing to a common stage of development of these languages after Telugu had split off. Rules 1' and 1" produce apical displacement, while Rule 2 simplifies the consonant clusters formed by Rule 1".

Independent evidence is available to establish the hierarchical relations of these six languages as shown in Fig. 1, above. For instance, PSCDr. /t tt nd/ develop differently in these languages, pointing to a stage of common development for Kui-Kuvi-Pengo-Manda (as shown in Table 8).

PSCDR.     GONDI   KONDA   KUI   KUVI   PENGO   MANDA
*t         r       r       j     j      j       j
*tt        tt      R       s     c      c       c
*nd/*nr    nd      nr      nj    nj     nj      nj
*l         11/ II r/l r r
*z         d/r     r       r     r      r       r

TABLE 8. Shared phonological innovations among the South-Central Dravidian languages.

Table 8 shows that all but Gondi share the innovation /z/ > /r/; all but Gondi and Konda share the innovations /t/ > /j/, /tt/ > /c/. In Pengo and Manda, PSCDr. /z l/ merge into /r/, whereas they are generally maintained in the other languages, at least dialectally. (Dialect names are abbreviated as in DED(S).) Kui and Kuvi shift mid-long vowels to low, which is a shared innovation (see Krishnamurti 1980). The conclusion we have reached thus corroborates the traditional subclassification of these languages, based on shared innovations in phonology and morphology.

The rules formulated above are illustrated by a few typical cognate groups below (u = unchanged; c1' = changed through Rule 1'; c1" = changed through Rule 1"; c2 = changed through Rule 2):

592. PSCDr. *az/*uz-V- 'to plough' (*VL > LV in Konda, Kui, Kuvi, Pengo).
  [u] Gondi (A,W) ur-, (SR) ur-, (Pat.) ud-, (M) urd-.
  [c1'] Konda rc (ru-t-).
  [u/c1'] Kui ir- 'to dig with the snout, root up'; ru (rut-) 'to plough; n. ploughing'.
  [c1'] Kuvi (F) ruiy-, (Su.) ru- id.; (S) lu- 'to nuzzle (of a pig)'.
  [c1'] Pengo ru (rut-) 'to till soil'.
  [o] Manda.

834a. PSCDr. *or/*or-V- 'one', *orand 'one man' (*VL > LV in Kui, Kuvi, Pengo, Manda).
  [u] Gondi (Y) oror 'one man', orone 'alone', (Mand) ore 'one man', (M) orpan 'at one place'.
  [u] Konda oren(r)- 'one man'; or- 'one'.
  [c1'] Kui ro 'one', roanju 'one man', ronde, (K) ronde 'one woman or thing'.
  [c1'] Kuvi (F) ro 'one', ro'osi, (Su.) ro'esi 'one man', rondi 'one woman or thing'.
  [c1'] Pengo ro 'one', ronje 'one man', ronjel 'one woman'.
  [c1'] Manda ru, rundi 'one', rukan 'one man'.

S290. PSCDr. *ker-a- 'scoop up with hand or ladle', 'to gather and scrape up' (*CVL > CLV in Kui, Kuvi, Pengo, Manda).
  [o] Gondi.
  [u] Konda ker-.
  [c1"] Kui grap- (grat-), (P) grep- (grit-).
  [c1"] Kuvi (F) grec- (gret-).
  [c1"] Pengo gre- (gret-).
  [c1"] Manda grepa.

S642. PSCDr. paras 'gourd' (*CVL > CLV in Konda, > LV in Kuvi, Pengo).
  [u] Gondi paras, (Ma.) paras, (Ph.) parras, porrds, (Tr., W) parais.
  [u/c1"] Konda (BB) paras, prasu.
  [o] Kui.
  [c1", c2] Kuvi (F, Su.) jdcu (pl. jaska).
  [c1", c2] Pengo jacka 'gourd spoon'.
  [o] Manda.

Note that Kuvi and Pengo j- is from *r (< *pr-).

APPENDIX II

The tree search algorithm was written in the language SAIL running on a DEC PDP-10 computer owned by Stanford University's Institute for Mathematical Studies in the Social Sciences. The algorithm used SAIL records to represent the alternative linguistic subgroupings. Each node of one of the six possible tree shapes was implemented as a record composed of three words of computer storage. Two of the words represented the left and right sons of the node. If the node was a leaf of the tree, then both words were set to the null record. The third word was a pointer to the language currently associated with the node. If the node was not a leaf of the tree, then the value of the pointer was set to zero. Otherwise, it was set to an integer between 1 and 6 which was used to reference one of two arrays: of string names for each language, or of 'change types' for the current entry. Actually, the values 1-6 pointed to an index array which contained a permutation of the digits 1-6. The change types took the values u, c, or o: unchanged from the proto-form, changed from the proto-form, or 'no cognate in this language available for this entry'.

The six possible tree shapes were simply enumerated as case statements, rather than being generated algorithmically. For each shape, all possible permutations of the digits 1-6 were assigned to the leaf nodes successively (through permutations of the index array), and each resultant labeled tree was evaluated for each of the 63 words in the data. The evaluation process was easily accomplished with a recursive algorithm that assigned a score and a change type to each subtree in the following way. If the left and right sons of a node had been assigned a change type u, then the node was assigned the change type u, and was given a score equal to the sum of the scores of the two offspring. If one son was u and the other c, then the parent node was assigned the change type u, and was given a score one greater than the sum of the two sons' scores. The parent node must have been unchanged from the proto-form in order to transmit to one of the sons an unchanged form (we are disallowing the possibility that a changed word will revert back to its old proto-form). Since the parent node has given birth to a changed child, this change must be counted in the total number of changes for the tree; hence the increase by one over the offspring's scores. In Table 9 we present the other possible parent-offspring relations.

           CHANGE TYPE
LEFT SON   RIGHT SON   NODE     NODE SCORE
u          o           u        a
u          u           u        a+b
u          c           u        a+b+1
c          o           c        a
c          c           c        a+b
c          u           u        a+b+1
o          o           o        0
o          u           u        b
o          c           c        b

TABLE 9. Here a and b represent the scores for the left and right sons respectively.

The recursion starts at the top of the tree (the root), and defers scoring until the leaf nodes have been evaluated. The score is accumulated, and finally assigned as the value of the tree when the algorithm exits from the root node.

The output of the program states the frequency of the number of trees for a given score, identifies the trees with scores between 60 and 75, and gives the entry-by-entry scoring for the best tree.

No effort was made to apply only those permutations to the labels of the tree which would give unique labelings for a given tree shape. The resultant over-counting was corrected by decreasing the frequency of the number of trees with a given score by a symmetry factor for each tree. The symmetry factors for tree Types I-VI are 2, 4, 4, 8, 16, and 8, respectively.
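The figure of 945 distinguishable trees can be checked independently. The sketch below is not the SAIL program: it generates the distinct labeled trees directly instead of permuting labels over fixed shapes, and the name `all_trees` is hypothetical. The last line cross-checks the symmetry factors quoted above, on the assumption that each of the six shapes receives all 6! = 720 labelings.

```python
from itertools import combinations


def all_trees(leaves):
    """Yield every distinct rooted binary tree over the given leaves."""
    leaves = tuple(leaves)
    if len(leaves) == 1:
        yield leaves[0]
        return
    first, rest = leaves[0], leaves[1:]
    # Keep the first leaf in the left subtree so mirror images are not repeated.
    for k in range(len(rest)):                 # the right subtree is never empty
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(x for x in rest if x not in extra)
            for lt in all_trees(left):
                for rt in all_trees(right):
                    yield (lt, rt)


LANGS = ("Gondi", "Konda", "Kui", "Kuvi", "Pengo", "Manda")
print(sum(1 for _ in all_trees(LANGS)))                  # 945

# 720 labelings per shape, each distinct tree counted once per symmetry.
print(sum(720 // f for f in (2, 4, 4, 8, 16, 8)))        # 945
```

Both counts agree with the 945 trees of Tables 3 and 5, and the factors 2 and 8 match the numbers of alternate equivalent forms given for Type I and Type IV trees in the figures.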

REFERENCES

BLOOMFIELD, LEONARD. 1933. Language. New York: Holt.
BURROW, THOMAS, and M. B. EMENEAU. 1961. A Dravidian etymological dictionary. Oxford: Clarendon Press.
--. 1968. A Dravidian etymological dictionary: Supplement. Oxford: Clarendon Press.
CHEN, MATHEW, and WILLIAM S-Y. WANG. 1975. Sound change: Actuation and implementation. Lg. 51.255-81.
D'ANDRADE, ROY G. 1978. U-statistic hierarchical clustering. Psychometrika 43.59-67.
DYEN, ISIDORE. 1953. Review of Malgache et maanjan: Une comparaison linguistique, by Otto Chr. Dahl. Lg. 29.577-90.
GUDSCHINSKY, SARAH C. 1956. The ABC's of lexicostatistics (glottochronology). Word 12.175-210. [Reprinted in Hymes 1964:612-23.]
HOENIGSWALD, HENRY M. 1960. Language change and linguistic reconstruction. Chicago: University of Chicago Press.
--. 1966. Criteria for the subgrouping of languages. Ancient Indo-European dialects, ed. by Henrik Birnbaum & Jaan Puhvel, 1-12. Berkeley & Los Angeles: University of California Press.
HSIEH, HSIN-I. 1973. A new method of dialect subgrouping. Journal of Chinese Linguistics 1.64-92. [Reprinted in Wang 1977:158-96.]
HYMES, DELL (ed.) 1964. Language in culture and society. New York: Harper & Row.
JANSON, TORE. 1977. Reversed lexical diffusion and lexical split: Loss of -d in Stockholm. In Wang, 252-65.
KRISHNAMURTI, BH. 1961. Telugu verbal bases: A comparative and descriptive study. (UCPL 24.) Berkeley & Los Angeles: University of California Press.
--. 1978. Areal and lexical diffusion of sound change: Evidence from Dravidian. Lg. 54.1-20.
--. 1980. A vowel-lowering rule in Kui-Kuvi. Proceedings of the 6th Annual Meeting, Berkeley Linguistic Society, 495-506.
LABOV, WILLIAM. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
MEYERS, L. F., and WILLIAM S-Y. WANG. 1963. Tree representations in linguistics. (POLA reports, Series I, 3.55-139.) Columbus: Ohio State University.
WANG, WILLIAM S-Y. (ed.) 1977. The lexicon in phonological change. The Hague: Mouton.
WEINREICH, URIEL; WILLIAM LABOV; and MARVIN HERZOG. 1968. Empirical foundations for a theory of language change. Directions for historical linguistics, ed. by Winfred P. Lehmann & Yakov Malkiel, 97-195. Austin: University of Texas Press.

[Received 24 August 1981; revision received 2 July 1982; accepted 30 July 1982.]