
Computers in Human Behavior, Vol. 4, pp. 133-145, 1988 Printed in the U.S.A. All rights reserved.

0747-5632/88 $3.00 + .00 Copyright © 1988 Pergamon Press plc

Computer Models of Language Acquisition

Hiromi Morikawa

University of Kansas

Abstract: Since the method of computer simulation was introduced to the study of human behavior, a number of computer models of language acquisition have been proposed and implemented on a computer system. Because of the varied purposes of simulation and the diverse theoretical backgrounds of the model builders, there are notable dissimilarities among the models. This paper reviews the existing computer models of language acquisition. The comparison is done in terms of the basic components of the models: (a) input to the computer system, (b) how linguistic knowledge and the process of learning are represented, and (c) what is initially built into the system and what is learned. Based on the review, the potential usefulness of computer simulation methods in the area of child language is discussed.

A computer model of behavior, like mathematical and physical models, is a representation of an underlying theory. While explanatory statements of a theory often tend to be abstract, a computer model of the theory attempts to express the mechanisms and processes involved in the theory in a more concrete manner using computer programs (Lehman, 1977). In the study of mental activities of human beings, such as learning, decision making, problem solving, language comprehension, and production, the approach using a computer model and simulation puts particular emphasis on step-by-step processes which are assumed to take place in the human mind. The computational approach enables us to make a dynamic representation of a theory. However, similar to any other kind of approach to cognition and natural language, the computational method does have strengths and weaknesses, and requires careful judgment in planning and evaluating a simulation. In this paper, I will discuss the potential usefulness of the computer model approach to the study of language acquisition. For this purpose, I will first review the computer models of language acquisition which appeared in publications as early as 20 years ago (Kelley, 1967; Schwartz, 1967).

There are about 20 computer models of language acquisition found in the literature, and it is easy to notice the overwhelming diversity among them. Some are developed for the purpose of devising an intelligent and efficient computer rather than modelling human beings (Harris, 1976; Hedrick, 1976; Siklossy, 1971, 1972). Others model human beings, but they simulate second language learning by an adult (Gasser, 1985; Klein & Kuppin, 1970). The rest attempt to model children's first language acquisition.

Requests for reprints should be addressed to Hiromi Morikawa, Child Language Program, University of Kansas, 1043 Indiana, Lawrence, KS 66044.

I would like to thank Dr. Clifton Pye, Dr. Susan Kemper, and Dr. Elizabeth Pemberton, and the anonymous reviewers for their valuable comments and suggestions on earlier versions of this paper.

For the researchers in child language study, computer simulation may be a unique research method because it is possible to directly observe and control the learning process. In observing children's linguistic behavior in real life, we can do no more than infer what mechanisms are working inside the young child's mind. It is worthwhile to look into how the computer models are formulated and what they can tell us about the way a language is learned by human beings. For this purpose this review, for the most part, will concentrate on the models of first language acquisition.

THEORY-BASED MODELS AND DATA-DRIVEN MODELS

All the models of first language acquisition are based on some sort of theoretical assumptions or hypotheses. However, we see a few different ways in which the model approaches the process of language acquisition. One approach is to begin with a characterization of adult language structure and to proceed backwards with a hypothesis of how a child might reach the end results. The other way is to begin by characterizing children's early language, and then postulate how it gradually approaches the level of adult language. Computer models based on linguistic theories are examples of the former approach. These models include Walsh's (1981) Lexical Interpretive Acquirer based on lexical functional grammar (Liebhaber, 1987; Pinker, 1985), the model by Berwick and Weinberg (1984) based on transformational grammar, and the model by Block, Moulton and Robinson (1975) based on syntax crystal theory (see Moulton & Robinson, 1981). As Pinker (1985) described the result of Walsh's simulation, it is no surprise that these strictly theory-based systems succeeded in learning the rules they were supposed to learn because the systems were carefully designed to do so. Successful simulation in this case means that language is learnable under the conditions designed in the model (Hoff-Ginsberg & Shatz, 1982; Pinker, 1979), but does not prove that the theory truly holds in first language acquisition.

On the other hand, the approach which starts with the characteristics of early language considers the empirical data from children and includes such factors as typical linguistic and nonlinguistic input for children, children's knowledge about the real world, and conceptual development, along with postulated learning rules (e.g., Hill, 1983; Selfridge, 1980, 1981, 1986). If the designed system learns language in a manner similar to children, it may be assumed that the model embodies the real mechanism of language acquisition. However, models of this type tend to be complex and hard to complete. Many such models simulate the early part of language development but do not address the subsequent stages (e.g., Hill, 1983; Selfridge, 1980).

Finally, there are some models which fall in between the above types of approach. Most of these models focus heavily on learning mechanisms rather than on theoretical characterizations of adult language or empirical data of developmental progression. Among the models in this group, some assume their learning mechanisms to be specific to language (e.g., Langley, 1982), and others assume theirs to be a more general cognitive mechanism (e.g., Anderson, 1983).

Regardless of the type of approach the models employ, all are very far from capturing the whole process of children's language acquisition. What most of the models attempt is to isolate some variables and conditions from the whole mechanism of language acquisition and simulate the hypothetical partial mechanism to see if it works in isolation. The inconsistent focus of attention to language acquisition (mainly caused by the model builders' backgrounds and interests) led to the above variation of linguistic theory-oriented and development-oriented approaches. It should be noted here that these different kinds of approaches do not necessarily compete with each other. As I will discuss later, all the above approaches have the potential of making joint contributions to our understanding of language acquisition.

In what follows, I will examine the variation among the models. Instead of looking at one model at a time, I have chosen to focus on some basic components of the models that are of particular concern in the study of child language, and compare how each component is treated in different computer models. These components include input to the computer system, how linguistic knowledge and learning are represented, and what is initially built into the system and what is learned. For reviews of individual models, see the review sections in McMaster, Sampson, and King (1976), Pinker (1979), Hill (1983), and Langley and Carbonell (1987).

COMPARISON OF THE BASIC COMPONENTS OF COMPUTER MODELS

Input and Feedback to the System

The linguistic input a child receives from the environment would most accurately be replicated by direct input of speech. However, at present, the problem of processing acoustic signals is far from solved (McMaster et al., 1976). The common and pragmatic solution for the problem is to use the conventional orthography entered at the computer keyboard.

This produces another problem of how to handle segmenting sound strings into morphemes and words. The majority of the models, especially those which are not concerned with developmental progression, avoid this problem by assuming that the concepts of words and morphemes are already known to the child. This is similar to the assumption of some theorists of language acquisition (e.g., Pinker, 1984; Wexler & Culicover, 1980). In general, segmentation does not seem to pose very serious problems for children (see Peters, 1983), which may be why segmentation does not attract much attention from the theorists. However, empirical evidence suggests that children do not completely solve the segmentation problem before reaching the multiword stage (Pye, 1986). Therefore, breaking the linguistic input into meaningful units should be a part of the language learning process. The only models that explicitly include segmentation are MacWhinney's (1986, 1987) Competitive Parser and McMaster and others' (McMaster et al., 1976) Comprehensive Language Acquisition Program (CLAP). Unfortunately, CLAP's segmentation process is not plausible as a real model of children's segmentation in that it uses blanks between words and punctuation as segmentation cues, which are not clearly marked in speech.

On the other hand, MacWhinney (1986, 1987) has implemented a segmentation process involving checking a new input string against the auditory properties of all the learned lexicon and partial determination of segments by syntactic contexts. It is still unknown to us how closely this model's performance matches a real child's, and whether it experiences the same kind of difficulty reported in children's data. Yet this system is the most flexible one at present and it may be applied to languages other than English.
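Lexicon-driven segmentation of the kind MacWhinney describes can be sketched roughly as follows. This is an illustrative toy, not the Competitive Parser's implementation: the greedy longest-match strategy, the orthographic (rather than auditory) matching, and the sample lexicon are all my own assumptions.

```python
def segment(utterance, lexicon):
    """Split an unsegmented string by matching it against known words.

    Unknown stretches are returned as single chunks for later analysis,
    so the learner does not break down on novel material.
    """
    segments, i = [], 0
    while i < len(utterance):
        # Prefer the longest known word starting at position i.
        match = max((w for w in lexicon if utterance.startswith(w, i)),
                    key=len, default=None)
        if match:
            segments.append(match)
            i += len(match)
        else:
            # Accumulate unknown material until a known word resumes.
            j = i + 1
            while j < len(utterance) and not any(
                    utterance.startswith(w, j) for w in lexicon):
                j += 1
            segments.append(utterance[i:j])
            i = j
    return segments

lexicon = {"the", "dog", "is", "running"}
print(segment("thedogisrunning", lexicon))   # ['the', 'dog', 'is', 'running']
print(segment("thedogisgromping", lexicon))  # ['the', 'dog', 'is', 'gromping']
```

Note that the unknown chunk "gromping" is stored whole rather than causing a failure, which parallels how the reviewed models set aside unknown words for later analysis.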

Some of the computer models use only a set of grammatical sentences as the input. The models which are based on a theory of autonomous syntax (e.g., Berwick & Weinberg, 1984) limit the input data to the example sentences for the computer to analyze. On the other hand, the models attempting to simulate language acquisition more realistically allow their systems to access not only such linguistic input but also the meaning of the input sentences (Anderson, 1983; Hill, 1983; Langley, 1982), and information about the situations and goals of communication (Hill, 1983; Langley, 1982; MacWhinney, 1986; McMaster et al., 1976; Selfridge, 1981). Selfridge's (1980, 1986) CHILD also relies on a "stressed word" in adult sentences, marked by capitalization, to find the topic of conversation. This is based on the fact that children do not hear language in a vacuum. In order to have successful conversational exchanges with others, children must be relying on these nonlinguistic and contextual supports. Also, the models that include these kinds of input are mostly driven by a comprehension or production process rather than by the explicit goal of grammar induction at which the above-mentioned theory-based models aimed (Berwick & Weinberg, 1984; Block et al., 1975). This does seem more appealing because many researchers believe that children learn language through their attempts to communicate rather than to infer syntactic rules. However, until it is possible to demonstrate that communication-driven models can construct abstract syntactic rules as the end product, there is no reason yet to give them any credence.

Whether a model is realistic or not also relates to how feedback to the system is treated in the model. The models of language acquisition through language comprehension and/or production also incorporated, in one way or another, a sort of feedback information. The models by McMaster et al. (1976), Schwartz (1967), and Selfridge (1980, 1986) use feedback such as approval or disapproval of sentences the system generated, and repetition of the previous input sentences. A somewhat unrealistic type of supportive information is the target sentence against which the system checks its production (Anderson, 1983; Kelley, 1967; Langley, 1982; Reeker, 1976). Such feedback is highly effective for learning as long as the learner is paying proper attention. In the models by Anderson and others, the systems are designed to focus on the feedback after each trial and carefully check how their production deviates from the target sentence. This immediate feedback, however, is unrealistic in children's language acquisition. From the observations by Brown and Hanlon (1973) and Braine (1971), it seems unlikely that the parent always corrects the child's syntactic errors immediately or that the child always properly pays attention to what is being corrected. Children do not seem to be dependent on fine-tuned parental corrections of their syntax, nor do they seem to be concerned so much about how correct their production was as long as their communicative purposes are fulfilled. Perhaps a realistic model of language acquisition should not depend on negative evidence.
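The target-sentence feedback loop described above can be caricatured in a few lines. The pairwise-precedence update below is my own illustrative device, not the mechanism of Anderson's, Langley's, or Reeker's systems; what it shares with them is the cycle of producing, comparing against the adult target, and revising only on mismatch.

```python
from itertools import combinations

class OrderLearner:
    """Toy error-recovery learner: compare own production with the
    adult target sentence and adjust only on mismatch (a sketch)."""

    def __init__(self):
        self.prec = {}  # (a, b) -> evidence that word a precedes word b

    def produce(self, words):
        # Order words by how strongly each one "wants" to come first.
        return sorted(words, key=lambda w: -sum(
            self.prec.get((w, other), 0) for other in words if other != w))

    def observe(self, intended, target):
        # Immediate feedback: on mismatch, strengthen every pairwise
        # order exhibited by the target sentence.
        if self.produce(intended) != target:
            for a, b in combinations(target, 2):
                self.prec[(a, b)] = self.prec.get((a, b), 0) + 1

learner = OrderLearner()
message, target = ["runs", "dog", "the"], ["the", "dog", "runs"]
learner.observe(message, target)  # first attempt fails; counts updated
print(learner.produce(message))   # ['the', 'dog', 'runs']
```

The `observe` step is exactly the unrealistic part the paragraph criticizes: the learner is handed a full target sentence and attends to every discrepancy after every trial.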

At the other extreme are the models based on linguistic theories and the models of machine learning. Their systems are mostly careful learners which avoid trial-and-error learning, hence require no feedback.


Representation of Meaning and Syntactic Structures

Any input fed to a learning system must have all its segments identified and must be transformed through a parsing process and stored in memory in such a way that it is effectively used for further learning and comprehension/production of sentences. In other words, a model builder must work out the details of how meaning and syntax are represented within the system in an explicit manner. Some may tailor various parts of the processes to characteristics of human beings, while others may design their systems with only efficiency in mind.

Some of the earlier models (Kelley, 1967; Klein & Kuppin, 1970) combined word classification and their main semantic knowledge structure with heuristics on word class distribution (e.g., Harris, 1964; Siklossy, 1971, 1972). In these models, the systems have a certain bias toward categorizing words into predetermined semantic/conceptual classes (thing, action, etc.). The systems examine how frequently certain words occur and which position a word class takes in a group of sample sentences.

The positive aspect of this distributional analysis is that the system does not break down when an input sentence includes some unknown words. The system analyzes only the positions of known parts and either ignores or stores the unknown words for later analysis. However, the distributional analysis does not succeed beyond the level of two- or three-word sentence structures. As the sentence length expands, the number of input sentences required for the analysis becomes extremely large. The relations between word classes and sentence positions also become more complicated because many words belong to more than one word class. This one-to-many mapping between word classes, positions, and complex structures, such as embedded clauses, cannot be treated successfully by such a distributional grammar (for a detailed discussion on this topic, see Pinker, 1979).
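The positional heuristic of these early models can be made concrete with a toy. The representation below (bare sentence positions, exact-profile grouping) is deliberately crude and is my own illustration, not code from any of the reviewed systems:

```python
from collections import defaultdict

def positional_profiles(sentences):
    """Record the sentence positions each word occupies, the raw
    material of the early distributional heuristics."""
    profile = defaultdict(set)
    for sentence in sentences:
        for position, word in enumerate(sentence.split()):
            profile[word].add(position)
    return profile

def group_by_profile(profile):
    # Words with identical positional profiles become one candidate class.
    classes = defaultdict(list)
    for word, positions in profile.items():
        classes[frozenset(positions)].append(word)
    return sorted(sorted(words) for words in classes.values())

sentences = ["big dog runs", "big cat sleeps", "small dog sleeps"]
print(group_by_profile(positional_profiles(sentences)))
# [['big', 'small'], ['cat', 'dog'], ['runs', 'sleeps']]
```

With longer sentences a word occupies many positions, the profiles blur, and the number of sample sentences needed to separate classes explodes; this is the failure mode the paragraph above describes.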

If a system is communication driven, it is necessary to know what the input sentence means (including the meaning of content words and sentence structure), rather than to focus on what comes after (or before) in the input sentence. The computer models developed after the above-mentioned models of distributional analysis, for the most part, employ a certain way of dealing with the relations between structures of sentence constituents (single ones or chunks) and their associated meaning.

The more recent models can be roughly divided into two classes based on how the elements of meaning and syntactic structures are processed and stored. The first group handles meaning and syntax in a unified manner. For example, ACT* (Anderson, 1981, 1983), CHIE (Gasser, 1985), AMBER (Acquisition Model Based on Error Recovery) (Langley, 1982), and the model by Schwartz (1967) employ a sort of graph structure to represent both syntactic structures and conceptual structures. Because both entities are in a similar network form, meaning is converted to its surface structure directly through a graph deformation process (Anderson, 1983) or activation of network nodes (Gasser, 1985; Langley, 1982). Another example of this group is Selfridge's CHILD (1980, 1981, 1986), which includes syntactic knowledge (specification of the position of the fillers for each slot) in its meaning structure represented in the form of a conceptual dependency structure (Schank & Abelson, 1977). In such a unified representation form, both a linguistic structure and its meaning are processed together. On the whole, these models assume that the language domain and conceptual domain are not strictly separated, which was explicitly stated by Anderson (1983) and Gasser (1985).

In contrast, other models show a rather clear separation between grammar and the conceptual/semantic knowledge base. Hill (1983), for example, prepared separate data spaces for the lexicon, grammar (a template grammar using slots and fillers), and conceptual knowledge, and these were integrated by a kind of central processing function called the interpreter. In this model, the domains of language and cognition are viewed as modules which are separate but which interact with each other. The rest of the models do not explicitly suggest domain specificity, yet they do involve separate data bases for conceptual/semantic knowledge and syntactic knowledge (McMaster et al., 1976; Reeker, 1976).

The choice between the unified or modular representation of knowledge, in some cases, may have been forced by the choice of syntactic rules (e.g., slot-filler grammar, phrase-structure grammar) and semantic structure (e.g., conceptual dependency, network, categorization). However, such contrasts in knowledge representation also relate, to some extent, to the arguments about mental structure (Anderson, 1983; Fodor, 1983). It is tempting to view early language in a holistic manner in which syntax learning is intertwined with "learning how to mean" (Halliday, 1975; Winograd, 1983). It is also seen that some of the theories of children's early grammar (e.g., Slobin, 1986) assume children's expectations of one-to-one form-meaning mapping. However, as language becomes more and more complex, approaching a complete adult language system, a language learner needs to deal with one-to-many and many-to-one form-meaning relations, which modular structures seem to handle better than unified structures. Which view of mental structure (unified or modular) is psychologically plausible, or whether one structure takes over from the other at some point in the course of development, still remains to be seen. The presently available computer models give us no definite conclusion on this matter.

There is another controversial problem concerning syntactic representation. Two models in the group of modular representation (McMaster et al., 1976; Reeker, 1976) and the machine-learning models (e.g., Harris, 1977) use context-free grammar as their representation of linguistic forms. Context-free grammar is one of the subclasses of phrase structure grammar (see Winograd, 1983, for a description). It consists of a set of rules for rewriting various linguistic units (S, NP, VP, etc.) into a sequence of subunits regardless of where the units occur in a sentence. The rules, therefore, deal with immediate constituents but not discontinuous constituents in a sentence. The rule system is highly restricted, and therefore simpler than context-sensitive grammar (see Pinker, 1979), which makes it a popular syntactic representation in the area of artificial intelligence. However, it has been shown that context-free grammar fails to correctly characterize natural language. Bresnan, Kaplan, Peters and Zaenen (1982) and Peters and Ritchie (1973) reported the evidence from their data in Dutch. Among the computer models which assume context-free grammar, those which have actually been implemented learned only a limited amount of language (Harris, 1977; Siklossy, 1972). Therefore, there has been a move away from context-free grammar based on the conclusion that context-free grammar is not a systematic representation suited for dealing with the complexities of natural language.
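A context-free rewrite system of the kind just described is easy to state explicitly. The miniature grammar below is hypothetical, not drawn from any of the reviewed models; it only illustrates how each rule rewrites a unit into subunits with no reference to the unit's surrounding context:

```python
# A toy context-free grammar: nonterminals map to lists of alternative
# productions; anything not listed as a key is a terminal word.
RULES = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N":   [["dog"], ["ball"]],
    "V":   [["chases"], ["sleeps"]],
}

def expand(symbol, choose=lambda options: options[0]):
    """Rewrite a symbol top-down until only terminal words remain."""
    if symbol not in RULES:                # a terminal word: emit as-is
        return [symbol]
    words = []
    for subunit in choose(RULES[symbol]):  # apply one rewrite rule
        words.extend(expand(subunit, choose))
    return words

print(" ".join(expand("S")))  # the dog chases the dog
```

Because a rule applies identically wherever its unit occurs, discontinuous constituents cannot be expressed in this format, which is the limitation the paragraph points out.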

However, it should also be noted that some linguists are recently moving back to context-free grammar. Generalized phrase structure grammar, an extension of context-free grammar, has been developed by Gazdar (1982) and his associates (Gazdar, Klein, Pullum & Sag, 1985). The main feature of the grammar is that it eliminated the use of transformations without losing the power of a context-free grammar. Transformations have been replaced by some new sets of rules, the most noteworthy of which is a set of semantic rules that operate in parallel with syntactic rules in order to determine the grammaticality of sentences. For example, incorrect treatment of constituents separated by an embedded clause may be blocked because the semantic rules will fail.

Context-free grammar, therefore, has not been dropped entirely by linguists. Rather, it is now discussed in the context of an integrated model of syntax and semantics instead of combined with the concept of autonomous syntax (Winograd, 1983). A computer model built in such a framework has not been found yet, although one theory which shares the characteristics of heavy reliance on semantics (lexical functional grammar) has been simulated on computers. But it seems predictable that computer simulation will be attempted in the future as part of testing the psychological reality of the formalism which involves context-free grammar and joint operation of syntax and semantics in the comprehension and production of sentences.

One last system of linguistic rule representation will be discussed here. The models by MacWhinney (1986, 1987) and Rumelhart and McClelland (1986, 1987) are based on the view called connectionist theory or parallel processing theory. The connectionists attempt to devise a powerful computer system by imitating the massive amount of neural connections in the human brain. They assume that our knowledge is stored in the strength of these connections, and information may be stored in the computer in the same manner (Hinton, 1985). In such a framework, grammar is not represented as an explicit rule system, but rather, buried in the patterns of connection between linguistic items. In the case of the Parallel Distributed Processing (PDP) model by Rumelhart and McClelland (1986, 1987), which learned the past tense of English verbs, it is the verb stems and regular/irregular past tense morphemes in English which are connected, and the increased strength of the connection between a particular verb stem and a correct morpheme means that the past tense of the verb is learned. As for MacWhinney's (1986, 1987) Competition Model, connections are to be made at various levels: auditory and semantic properties of lexical items, grammatical roles determined by verbs, etc. The model assumes that the lexical items compete as candidates for matching new input data or filling the slot of a certain grammatical role. The candidate item with the highest degree of activation (that is, the strongest connection) wins the competition.
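The strongest-connection-wins idea can be illustrated with a drastically simplified sketch. The integer weights and the "-ed" default below are my own toy assumptions and bear no resemblance to the distributed feature representations of the actual PDP or Competition Model architectures:

```python
from collections import defaultdict

class ToyCompetition:
    """Connection strengths between verb stems and past-tense forms;
    the most strongly connected candidate wins production."""

    def __init__(self):
        self.strength = defaultdict(float)

    def hear(self, stem, past_form):
        # Each exposure strengthens one stem-form connection.
        self.strength[(stem, past_form)] += 1.0

    def past_tense(self, stem):
        candidates = {form for (s, form) in self.strength if s == stem}
        candidates.add(stem + "ed")  # regular pattern always competes (toy orthography)
        return max(candidates, key=lambda form: self.strength[(stem, form)])

model = ToyCompetition()
for _ in range(3):
    model.hear("go", "went")
print(model.past_tense("go"))   # 'went' out-competes the regular candidate 'goed'
print(model.past_tense("wug"))  # novel verb: only the regular 'wuged' competes
```

Knowledge here is nothing but a table of connection strengths; no explicit past-tense rule is ever stated, yet rule-like behavior (regularization of novel verbs) falls out of the competition.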

This new theory contrasts sharply with other theories of cognitive and linguistic behavior in that it accounts for rule-like behavior without positing any rules. The connectionists' statement is that their computers and human beings can learn intricate mental processes by repeating certain operations which resemble forming analogies. Rumelhart and McClelland (1987) showed that their PDP model learned the past tense rule like children do. It first produced correct regular and irregular verbs, started to overgeneralize the regular past form after acquiring a certain number of verbs, and finally sorted out the correct forms of all the learned verbs. However, Pinker and Prince (1987) examined the PDP model and concluded that it does not accurately model language acquisition. They argue that children do not check the strength of all the possible connections in order to produce a past tense form. Rather, some semantic constraints are at work so that children can avoid certain types of production errors. For this problem, MacWhinney's (1986, 1987) Competition Model may be able to offer an answer, at least partially. MacWhinney proposes the idea of 'dynamic connections', which lets a learner limit the candidate items for competition by considering various cues: semantic cues, morphological cues, word order cues, and so on (see Bates and MacWhinney, 1987, for a discussion on cues). Although the Competition Model relies on the statistical properties of the input just like other connectionist models, it retains the characteristics of a rule-governed mechanism by incorporating the notion of competitions under various constraints. However, a large part of the model is still in the form of a proposal and has not been implemented on a computer. We do not know yet how the entire system can perform. Further development of this argument about connection patterns versus explicit rules is expected.

What is Innate and What is Learned

There is not much variability among the models concerning “what is learned,” because the goal of most of the models is learning syntax. Except for the model by Sembugamoorthy (1979) which relies on direct teaching of language, all the models attempt to learn a syntactic rule system chosen by their designers (and in some cases, word meanings) from sample sentences and additional information.

The picture is not very clear concerning the initially built-in functions and knowledge because some models do not include the earliest stages of language acquisition. For example, Anderson's ACT* (1981, 1983) starts with a base set of words already known by the system. Reeker's PST (1976) begins with what is called the "initial grammar," and does not explain how the initial grammar came into existence. However, few models presuppose that linguistic knowledge, other than segmentation, is innate. The only exceptions are Berwick and Weinberg (1984) and Harris (1977), who built the concept of parts of speech into their systems. Although no model is found which has its initial setting firmly based on some theoretical assumptions or hypotheses made from children's data, the most common innate mechanism among many existing models is a kind of cognitive propensity to recognize patterns and relations, to form classes of concepts and words, to reorganize stored information, and so on (e.g., Anderson, 1983; Hill, 1983; Langley, 1982; Selfridge, 1981, 1986). It seems that the majority of the computer models of first language acquisition have implicitly (and in some cases pragmatically) taken the point of view that the child brings to language acquisition innate perceptual and conceptual abilities which are powerful enough to learn language with no initial linguistic knowledge.

The theories of language acquisition also do not provide a consistent picture. Some theories incorporate innate basic linguistic concepts (e.g., Pinker, in press), or propose a set of perceptual/cognitive functions designed specially for language acquisition (e.g., Slobin, 1986), yet others may look for a kind of almighty cognitive function which works for any domain of learning. Some of the proposals may predict similar developmental phenomena in young children although their assumptions about innate capacity differ. This may be where the computer model approach can step in as a tool for testing which assumption fits children's data better and has the potential of going through smooth transitions to an adult grammar.


DISCUSSION

Present Status of the Computer Models of Language Acquisition

There are roughly two problems found in computer models of language acquisition as they stand. First, many of them do not model an entire course of language acquisition, or have not been fully implemented. Mostly the early models and theory-oriented models were implemented and tested. Some of the theory-based models (e.g., Liebhaber, 1987; Walsh, 1981) were found useful in evaluating the structure of their theories. However, the other models, especially those attempting to model realistic processes, tend to be very complex and difficult to complete. Some of them are reported in the form of a proposal (e.g., McMaster et al., 1976), and others deal with only the early developmental periods (e.g., Hill, 1983; Selfridge, 1980). Although MacWhinney's (1986, 1987) model proposes an account for the entire process of language acquisition, its implementation has not gone far beyond the lower-level process of lexicalization. It seems that we must wait for quite some time until a fully-specified, fully-implemented model of language acquisition is completed, if at all possible.

Second, the models are not consistent with other models or child language studies. Although the fundamental advantage of computer models is the explicit and well-detailed model structure, the designer may be forced into making arbitrary choices when there is no theory or empirical data to support one alternative or the other (e.g., whether the syntactic and semantic representations are unified or modular). Also, there are some recently proposed theories in child language, besides the Competition Model, which seem worth evaluating in the form of a computer model. For example, we have seen that the models based on the distributional analysis of word classes were not successful. However, some psycholinguists are now considering it as a part of early language acquisition, stating that both distributional and semantic facts must play a role in children's formation of word classes (Maratsos, 1982). The distributional analysis failed as a single means to learn language, but it may be tested again under different conditions.

To sum up, the review of the diverse computer models tells us that computer simulation is still in its infancy in the area of child language, although it has a history of over 20 years in computer science. However, the computer model approach does have some strengths that other methods of child language research do not.

Potential Usefulness of the Computer Model Approach in the Study of Language Acquisition

First of all, computers can work with both wide and narrow focuses. An ambitious model designer may attempt to create a model with all the possible variables he can control. A new proposal by Langley and Carbonell (1987), for example, includes characteristics like goal-driven learning, vision, motor functioning, hearing, the learner's attention and expectation, and active pursuit of information (such as asking questions). This is a highly realistic model of an active child. It is also an awfully complicated model and difficult to complete, yet the processes of each domain can be made transparent instead of being left in a black box.

Besides, computers, of course, can go through an entire course of language acquisition in less time than human beings. For testing a theory or hypothesis which requires evidence from longitudinal data, a computer model can provide a relatively fast prediction before many years' worth of children's data are collected. Also, the element of time is a particular strength of computer models in the sense that they can handle developmental sequences and mechanisms of quantitative and qualitative change in a concrete manner. Although in many of the existing models the developmental changes are determined arbitrarily, models properly designed for evaluating hypotheses can be more effective than cross-sectional data from children.

These advantages apply to situations where computer models are used in parallel with analyses of children's data. There are at present, however, situations in which actual computer simulation is the only possible means of evaluating a hypothesis. A connectionist model without any element of rules, such as the PDP model by Rumelhart and McClelland (1986, 1987), cannot be tested on the basis of children's observable behavior alone. Parallel distributed processing is claimed (though not proven) to be a 'biologically hardwired' function which is different from any cognitive strategic operation. If a behavior is assumed not to be rule-governed, it is virtually impossible to make predictions about it from observation alone. Without the description of how the simulations were done and what results were obtained, Pinker and Prince (1987) would not have been able to analyze the adequacy of the PDP model in such detail.

Another advantage of computer models is that they can intentionally focus on a particular part of language acquisition. Pinker (1979) noted that language learning mechanisms are notoriously underdetermined by the child's observable behavior because all the other developmental domains besides language are changing at the same time. Pinker (1984, 1985, in press) has been developing a formal model which takes a 'learnability-theoretic approach' to language acquisition. The model attempts both to characterize a grammar which is learnable from a body of input data and to predict how the grammar is learned. To test the learnability condition of a particular formal model, language acquisition must be analyzed without external confounding factors such as motivation, social skills, and so on. We cannot completely isolate the variables of interest in children, whereas it is possible on computers.

There are at least two other areas to which the flexibility of computer models can contribute. One area is language universals. Here the computer focuses on the early periods of language acquisition (around speech onset and a few subsequent years). Early language may be heavily supported by innate capacities and some universal characteristics of language. Developmental psycholinguists have turned to crosslinguistic data and are searching for a universal early grammar (e.g., Slobin, 1986). As discussed before, computer simulation is an excellent tool in this area because, as McMaster et al. (1976) suggested, the same computer program can be used to acquire two or more languages, each as a first language, using the same initially built-in abilities and receiving the same nonlinguistic input. By carefully controlling the structure of the initial functions, we can approach the questions of how universal or language-specific early grammar is (e.g., Bowerman, 1986; Slobin, 1986), whether early language is represented in a child's mind in semantic categories alone or in both semantic and grammatical categories (e.g., Levy, 1983; Marantz, 1982), and, most importantly, how the hypothetical early grammar transforms into an adult grammar.

The second area is the relationship between language and cognition. Suppose that a comprehensive model of language acquisition is created and has the domain of conceptual development incorporated as a module (like the model by Hill, 1983). It is possible to turn off or slow the function of the module (i.e., forming concepts) to create an artificial developmental delay in cognition. Under such a condition, how does the computer perform as compared to its performance in the 'normal' condition? Do input sentences having the features of motherese (see Snow, 1977) facilitate language acquisition differently under the 'normal' and 'delayed' conditions? There are many structural and functional manipulations, besides those just mentioned, that can be done without mercy to a computer program but not to human beings.
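This kind of manipulation can be made concrete with a toy sketch in present-day Python, purely for exposition. The learner class, its exposure-threshold mechanism, and the input data are all invented here and bear no direct relation to Hill's actual model; they only illustrate the logic of slowing one module while holding everything else constant.

```python
# A toy "modular" learner: a concept module must form a concept
# before the word-learning step can map a word onto it.
class ToyLearner:
    def __init__(self, concept_threshold):
        # Exposures to a referent needed before the concept module
        # forms the concept; raising it simulates a "delayed" module.
        self.concept_threshold = concept_threshold
        self.exposures = {}
        self.concepts = set()
        self.lexicon = {}

    def expose(self, word, referent):
        # Concept module: form a concept after enough exposures.
        self.exposures[referent] = self.exposures.get(referent, 0) + 1
        if self.exposures[referent] >= self.concept_threshold:
            self.concepts.add(referent)
        # Word learning succeeds only for already-formed concepts.
        if referent in self.concepts:
            self.lexicon[word] = referent

# Identical input to both conditions: each word-referent pair 5 times.
input_pairs = [("dog", "DOG"), ("ball", "BALL"), ("cup", "CUP")] * 5

normal = ToyLearner(concept_threshold=2)    # 'normal' condition
delayed = ToyLearner(concept_threshold=10)  # artificially delayed module
for word, referent in input_pairs:
    normal.expose(word, referent)
    delayed.expose(word, referent)

print(len(normal.lexicon), len(delayed.lexicon))  # → 3 0
```

Because the input is identical in both runs, any difference in the resulting lexicon is attributable solely to the manipulated module, which is exactly the isolation of variables that cannot be achieved with children.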

To conclude, we must recognize that computer models are a promising supportive tool for studying children's language acquisition. A theory-based model can verify the learnability condition of a language acquisition theory. The theory can be further tested against children's observational data and data-driven computer models for its adequacy as an account of the developmental course of language acquisition. Once this is achieved, the model can be subjected to a learning task in areas other than language. Depending on whether or not the system can handle the problem without drastic alteration, the model can provide us with an answer to the question: does language acquisition require domain-specific processes or not? We do not know if the computer model approach to the study of language acquisition can go this far, but considering the potential it has, it seems well worth working for its advancement.

REFERENCES

Anderson, J.R. (1981). A theory of language acquisition based on general learning principles. Proceedings of the Seventh International Joint Conference on Artificial Intelligence (pp. 97-103). Vancouver, BC, Canada: University of British Columbia.
Anderson, J.R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Bates, E., & MacWhinney, B. (1987). Competition, variation, and language learning. In B. MacWhinney (Ed.), Mechanisms of language acquisition (pp. 157-193). Hillsdale, NJ: Erlbaum.
Berwick, R.C., & Weinberg, A.S. (1984). The grammatical basis of linguistic performance: Language use and acquisition. Cambridge, MA: The MIT Press.
Block, H.D., Moulton, J., & Robinson, G.M. (1975). Natural language acquisition by a robot. International Journal of Man-Machine Studies, 7, 571-608.
Bowerman, M. (1986). What shapes children's grammars? In D.I. Slobin (Ed.), The crosslinguistic study of language acquisition: Vol. 2. Theoretical issues (pp. 1257-1319). Hillsdale, NJ: Erlbaum.
Braine, M.D.S. (1971). On two types of models of the internalization of grammar. In D.I. Slobin (Ed.), The ontogenesis of grammar: A theoretical symposium (pp. 153-186). New York: Academic Press.
Bresnan, J., Kaplan, R., Peters, S., & Zaenen, A. (1982). Cross-serial dependencies in Dutch. Linguistic Inquiry, 13, 613-635.
Brown, R., & Hanlon, C. (1970). Derivational complexity and order of acquisition in child speech. In J.R. Hayes (Ed.), Cognition and the development of language (pp. 11-53). New York: Wiley.
Fodor, J.A. (1983). Modularity of mind. Cambridge, MA: The MIT Press.
Gasser, M. (1985). Second language production: Coping with gaps in linguistic knowledge (Tech. Rep. UCLA-AI-85-18). Los Angeles: Computer Science Department, University of California.
Gazdar, G. (1982). Phrase structure grammar. In P. Jacobson & G.K. Pullum (Eds.), The nature of syntactic representation (pp. 131-186). Dordrecht: Reidel.
Gazdar, G., Klein, E., Pullum, G., & Sag, I. (1985). Generalized phrase structure grammar. Cambridge, MA: Harvard University Press.
Halliday, M.A.K. (1975). Learning how to mean. London: Edward Arnold.
Harris, L.R. (1977). A system for primitive natural language acquisition. International Journal of Man-Machine Studies, 9, 153-206.


Harris, Z.S. (1964). Distributional structure. In J.A. Fodor & J.J. Katz (Eds.), The structure of language: Readings in the philosophy of language (pp. 33-49). Englewood Cliffs, NJ: Prentice-Hall.
Hedrick, C.L. (1976). Learning production systems from examples. Artificial Intelligence, 7, 21-49.
Hill, J.A.C. (1983). A computational model of language acquisition in the two-year-old. Unpublished doctoral dissertation, University of Massachusetts.
Hinton, G.E. (1985). Learning in parallel networks. BYTE, 10(4), 265-273.
Hoff-Ginsberg, E., & Shatz, M. (1982). Linguistic input and the child's acquisition of language. Psychological Bulletin, 92, 3-26.
Kelley, K. (1967). Early syntactic acquisition (Report No. P-3719). Santa Monica, CA: The Rand Corporation.
Klein, S., & Kuppin, M. (1970). An interactive program for learning transformational grammars. Computer Studies in the Humanities and Verbal Behavior, 3, 144-162.
Langley, P. (1982). Language acquisition through error recovery. Cognition and Brain Theory, 5, 211-255.
Langley, P., & Carbonell, J.G. (1987). Language acquisition and machine learning. In B. MacWhinney (Ed.), Mechanisms of language acquisition (pp. 115-155). Hillsdale, NJ: Erlbaum.
Lehman, R.S. (1977). Computer simulation and modeling: An introduction. Hillsdale, NJ: Erlbaum.
Levy, Y. (1983). It's frogs all the way down. Cognition, 15, 75-93.
Liebhaber, M. (1987). Creation and modification of word-specific paradigms: A computer model. Working Papers in Language Development, 2(2), 172-187. Lawrence, KS: University of Kansas.
MacWhinney, B. (1986). Competition and language acquisition theory. Lecture handout, Teachability of Language Conference, Kansas City, MO.
MacWhinney, B. (1987). The competition model. In B. MacWhinney (Ed.), Mechanisms of language acquisition (pp. 249-308). Hillsdale, NJ: Erlbaum.
Marantz, A. (1982). On the acquisition of grammatical relations. Linguistische Berichte, 80, 32-69.
Maratsos, M. (1982). The child's construction of grammatical categories. In E. Wanner & L.R. Gleitman (Eds.), Language acquisition: The state of the art (pp. 248-266). Cambridge: Cambridge University Press.
McMaster, I., Sampson, J.R., & King, J.E. (1976). Computer acquisition of natural language: A review and prospectus. International Journal of Man-Machine Studies, 8, 367-396.
Moulton, J., & Robinson, G.M. (1981). The organization of language. Cambridge: Cambridge University Press.
Peters, A.M. (1983). The units of language acquisition. Cambridge: Cambridge University Press.
Peters, S., & Ritchie, R. (1973). On the generative power of transformational grammars. Information Sciences, 6, 49-83.
Pinker, S. (1979). Formal models of language learning. Cognition, 7(3), 217-284.
Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S. (1985). Language learnability and children's language: A multifaceted approach. In K.E. Nelson (Ed.), Children's language (Vol. 5, pp. 399-442). Hillsdale, NJ: Erlbaum.
Pinker, S. (in press). Resolving a learnability paradox in the acquisition of the verb lexicon. To appear in R. Schiefelbusch (Ed.), The teachability of language.
Pinker, S., & Prince, A. (1987). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition (Occasional Paper No. 33). Cambridge, MA: Massachusetts Institute of Technology, Center for Cognitive Science.
Pye, C. (1986). Assessing current models of syntactic acquisition: What does the child need to acquire language? Unpublished manuscript, University of Kansas, Lawrence, KS.
Reeker, L.H. (1976). The computational study of language acquisition. In M. Rubinoff & M.C. Yovits (Eds.), Advances in computers (Vol. 15, pp. 181-237). New York: Academic Press.
Rumelhart, D.E., & McClelland, J.L. (1986). On learning the past tenses of English verbs. In J.L. McClelland & D.E. Rumelhart (Eds.), Parallel distributed processing: Psychological and biological models (Vol. 2, pp. 216-271). Cambridge, MA: The MIT Press.
Rumelhart, D.E., & McClelland, J.L. (1987). Learning the past tenses of English verbs: Implicit rules or parallel distributed processing? In B. MacWhinney (Ed.), Mechanisms of language acquisition (pp. 195-248). Hillsdale, NJ: Erlbaum.
Schank, R.C., & Abelson, R.P. (1977). Scripts, plans, goals and understanding. New York: Halsted Press.
Schwartz, R.M. (1967). Steps towards a model of linguistic performance: A preliminary sketch. Mechanical Translation, 10, 39-52.


Selfridge, M. (1980). A process model of language acquisition (Research Rep. No. 172). New Haven, CT: Yale University, Computer Science Department.
Selfridge, M. (1981). A computer model of child language acquisition. Proceedings of the Seventh International Joint Conference on Artificial Intelligence (pp. 92-96). Vancouver, BC, Canada: University of British Columbia.
Selfridge, M. (1986). A computer model of child language learning. Artificial Intelligence, 29, 171-216.
Sembugamoorthy, V. (1979). PLAS, a Paradigmatic Language Acquisition System: An overview. Proceedings of the Sixth International Joint Conference on Artificial Intelligence (pp. 788-790). Tokyo.
Siklossy, L. (1971). A language-learning heuristic program. Cognitive Psychology, 2, 479-495.
Siklossy, L. (1972). Natural language learning by computer. In H.A. Simon & L. Siklossy (Eds.), Representation and meaning: Experiments with information processing systems (pp. 288-328). Englewood Cliffs, NJ: Prentice-Hall.
Slobin, D.I. (1986). Crosslinguistic evidence for the language-making capacity. In D.I. Slobin (Ed.), The crosslinguistic study of language acquisition: Vol. 2. Theoretical issues (pp. 1157-1256). Hillsdale, NJ: Erlbaum.
Snow, C.E. (1977). Mothers' speech research: From input to interaction. In C.E. Snow & C.A. Ferguson (Eds.), Talking to children: Language input and acquisition (pp. 31-49). Cambridge: Cambridge University Press.
Walsh, R.W. (1981). A computer model for the acquisition of lexical interpretive grammar. Unpublished bachelor's thesis, Harvard University, Cambridge, MA.
Wexler, K., & Culicover, P. (1980). Formal principles of language acquisition. Cambridge, MA: The MIT Press.
Winograd, T. (1983). Language as a cognitive process: Vol. 1. Syntax. Reading, MA: Addison-Wesley.