Computers and the Humanities, Vol. 10, pp. 275-280. Pergamon Press, 1976. Printed in the U.S.A.

Current Scandinavian Computer-Assisted Language and Literature Research

ARNE ZETTERSTEN

In my previous articles in CHum¹ on computing activity in language and literature research in Scandinavia, I described a number of projects, some of which had been initiated fairly recently and had not yet reached the publication stage. The results of some of these projects are presented below, together with plans for certain new projects that I was not able to deal with last time.

Scandinavian languages

A notable event was the creation, in 1972, of Sweden's first chair and first Department of Computational Linguistics, at the University of Gothenburg. The holder of the chair and head of the department is Sture Allén, who also edits Data linguistica, a series of publications in computational linguistics.² His large-scale frequency dictionary of modern Swedish, Nusvensk frekvensordbok (NFO), is based on a corpus of one million running words taken exclusively from Swedish morning newspapers. Three volumes have been published so far. The first contains graphic words and homograph components. The second contains inflectional forms and variants under lemma entries.³ The third, published in 1975, contains combinations of words, constructions, and idioms. A fourth volume, due in 1977, will contain an analysis of morphemes and meanings.

Various other current activities initiated by the department should also be mentioned. The project "Algorithmic Text Analysis" aims to elaborate and apply formalized procedures for the morphological and syntactic analysis of authentic text. A newly devised inflectional grammar of Swedish currently comprises 235 different paradigms. On the basis of this grammar, a dictionary covering the basic vocabulary of NFO 2 has been made accessible to the computer. A flexible system of programs for morphological and syntactic analysis is under way, drawing very largely on the department's lexicographic results. Another project, "Swedish Defining Vocabulary," seeks a minimal list of the lexemes that enter into the definitions of many other lexemes. It takes Michael West's An International Reader's Dictionary as its starting point. In the project "Swedish Proper Names," the Christian names and surnames of all Swedish citizens are being investigated. Finally, the Dictionary of Swedish Homographs, to be published by Sture Berg, comprises about 6,000 Swedish homographs.

The department's external service was institutionalized in 1975, when the Swedish Logothèque was established as a national service organ. Its tasks are to collect and store machine-readable texts, to supply texts and processed material of various kinds, and to build up a Swedish word bank. The Logothèque, whose staff includes a linguistic consultant, Martin Gellerstam, and a data-processing supervisor, Rolf Gavare, assists numerous scholars and projects in Sweden. The machine-readable material collected so far (Swedish and foreign literary texts, newspaper text, and various lexical treatments) will be presented together with the annual Språkdata Research Report at the end of the year.

The recent work of the department has been presented in two papers by Sture Allén: "Text-Based Lexicography and Algorithmic Text Analysis" (preprint for the 6th International Conference on Computational Linguistics, Ottawa, 1976) and "The Swedish Logothèque: A Computer-Based Text and Word Bank" (to appear in Festschrift F. de Tollenaere).

A project called "Speech Syntax," headed by Bengt Loman, Lund, and briefly mentioned in my last report, has since developed considerably. The results published so far have dealt with both spoken and written syntax. The Lund group's research material is now held by a project called "Talbanken," for use primarily in studies of modern spoken Swedish. It consists of the following parts:

(a) tape recordings of modern spoken Swedish, with particular attention to social, regional, and contextual variation;
(b) phonetic transcriptions of these recordings;
(c) magnetic tapes of detailed analyses;
(d) magnetic tapes of texts with lexical and syntactic information on their sentence structures.

Among recent publications based on these data is Margareta Westman's dissertation Bruksprosa: En funktionell stilanalys med kvantitativ metod (1975). It is a quantitative investigation of the styles of four categories of informative everyday written Swedish: informative booklets on social services, informative newspaper articles, textbooks on general school subjects, and argumentative articles from books and magazines. Another recent dissertation is Nils Jörgensen's Meningsbyggnaden i talad svenska (1976), published in the series Lundastudier i nordisk språkvetenskap. His aim is to describe the sentence structure of spoken Swedish on the basis of recordings of groups such as teachers, lawyers, dentists, factory workers, and radio commentators. He also examines how syntactic differences correlate with differences in the speakers' social backgrounds.

Arne Zettersten is a professor of English language and literature at the University of Copenhagen.

The following articles based on the Lund material were also published in 1976:

- Jan Einarsson, "Nominalfras, socialgrupp och kön," in Loman, Språk och samhälle, 3;
- Jan Einarsson, "Män, kvinnor och språk," in Festskrift till Gösta Holm;
- Nils Jörgensen, "De rättkonstruerade meningarnas byggnad i ett socialt skiktat talspråksmaterial," in Loman, Språk och samhälle, 3.

At the University of Helsinki, Finland, Mirja Saari, who is associated with the Lund group, has finished her dissertation, Talsvenska (1975). It is a sociolinguistic study of syntactic features in interview answers from Borås, Sweden; Helsinki, Finland; and Tornedalen, on the border between Sweden and Finland. Helena Solstrand, also of Helsinki, has published a word index to the poems of J. L. Runeberg (1974).

The KVAL group in Stockholm, headed by Hans Karlgren, has continued its high publication rate in a series of articles in Statistical Methods in Linguistics (SMIL) and other reports (KVAL, PM's and Interim reports).

Roger House has continued his work on KVAPT, a computerized system for translating Swedish text into Swedish Braille (KVAL, Interim Reports, Nos. 16-18, 1970). The programs are described in the following reports:

- Roger House, "Description of Basic KVAPT" (Interim Report No. 22, 1970);
- Roger House, "Description of KVAPT 1" (Interim Report No. 25, 1970);
- Roger House, "Description of KVAPT 2" (Interim Report No. 27, 1970);
- Märit Magnusson, "Description of KVAPT 3" (Interim Report No. 35, 1971).

Roger House summarized the whole Braille project in "Computerized Braille Translation and Production in Sweden" (Interim Report No. 28, 1970).

The KVAL group has also stored the 9th edition of the Word-List of the Swedish Academy on magnetic tape. Erik Kristensen describes this project in "SAOL: Svenska Akademiens Ord-Lista på magnetband" (Interim Report Nos. 24, 33-34). The 10th edition of the word list has been stored by Allén's group in Gothenburg. A number of articles in SMIL are also computer-based. One example is Göran Engström's "Automatic phonemization in practice," SMIL 8 (1972), 39-55, which discusses the KVAL group's attempts to build a computerized system for checking new trademarks against the register of the Swedish Patent Office. Another Stockholm investigation, Carita Hassler-Göransson's Fyrtio författare i statistisk belysning (1976), studies differences in word-frequency distribution between the sexes and between time periods. The corpus consists of samples from novels by forty Swedish writers. Each decade of the century 1880-1979 is represented by two male and two female writers.

At the Department of Speech Communication, Royal Institute of Technology, Stockholm, headed by Gunnar Fant, a large number of computer-based studies have been summarized and listed in yearly reports called "Speech Research," covering the years 1971-75. The research centres on speech analysis, production, perception, and synthesis, as well as speech and hearing defects and aids, and musical acoustics. The latest computer-based studies include the following:

- J. Liljencrants and G. Fant, "Computer program for VT-resonance frequency calculations" (STL-QPSR 4, 1975), which presents two programs for deriving vocal-tract resonance frequencies from area functions;
- B. Lindblom and J. Sundberg, "Datorkomponerad musik," Matematik och datorer i språk-, tal- och musikforskning (1975), pp. 61-82;
- R. Carlson and B. Granström, "A text-to-speech system based on a phonetically oriented programming language" (STL-QPSR 1, 1975).⁴

The use of computers in phonetic research in Sweden has been discussed by Claes-Christian Elert, University of Umeå, in "The small computer in the phonetics laboratory," World Papers in Phonetics (Tokyo, 1974), pp. 145-62. Besides programming, the paper covers speech analysis and synthesis, experimental work, dialectology, phonology, and the teaching of phonetics. Of the research work discussed, the phonological studies of J. Liljencrants and B. Lindblom were published as "Numerical simulation of vowel quality systems: the role of perceptual contrast," Language, 48 (1972), 839-62.

In 1971 Jan Thavenius, of Lund, published a concordance of the poetry of the Swedish poet Hjalmar Gullberg; he is now working on a concordance of Esaias Tegnér.

His dissertation, Stil och vokabulär (1972), is a quantitative study of Swedish structure words and their importance in discussions of style. He analyzed the distribution of various groups of structure words, such as pronouns, prepositions, and conjunctions, in four corpora:

- Swedish poetry from 1760 to 1825;
- twentieth-century poetry, including that of Hjalmar Gullberg;
- prose from 1890 to 1950;
- newspaper prose from 1965.

Several computer-based projects at the University of Copenhagen relate to language analysis. In the Department of Nordic Philology, Hanne Ruus has begun automatic lemmatization and syntactic analysis of modern Danish. Her syntactic pilot study, "Forsøg med datamatisk sætningsanalyse af moderne dansk," was published in SAML I (Institute for Applied and Mathematical Linguistics, 1974), pp. 33-41. With Bente Maegaard, she has also started the project "Word Frequencies in Modern Danish," which aims to produce a frequency list of modern Danish from a corpus of 1.25 million running words.

At the Institute of Danish Dialect Research, Karen Margarethe Pedersen heads a project to build an extensive data bank of transcribed recordings of Danish dialects. The project is described in two articles:

- K. M. Pedersen, "Dialekttekster i rigsmålsnotation med decifrering," Danske Folkemaal 20 (1974), 29-46;
- B. J. Nielsen and K. M. Pedersen, "Rapport om EDB-analyse af indkodede dialekttekster," Danske Folkemaal 20, 118-34.

Finally, Elisabeth Hansen, of the Danish Institute for Educational Research, Copenhagen, has worked on a formal analysis of children's spoken language. Her book, Syntaksen i børnesprog, was published as No. 4 (1975) of Fagligpaedagogiske småskrifter om dansk sprog.

In Norway, the Project for Automatic Language Processing at the Department of Nordic Studies, University of Bergen, led by Kolbjørn Heggstad, has built up a library of machine-readable Norwegian texts. It has also built a word bank covering the two variants of modern Norwegian, bokmål and nynorsk. The project has developed considerably since the late 1960's. It has also begun research on the following topics:

- technical language;
- new acquisitions in modern Norwegian;
- frequency word lists and concordances;
- Norwegian personal names and place names;
- literary criticism in Norwegian newspapers, among others.

Kolbjørn Heggstad has recently described the project in Datamaskinell språkbehandling PDS 1967-1976, published in 1976 by the Department of Nordic Studies, University of Bergen. The project edits two series of publications: Norske språkdata⁵ and a series of working reports.

Several other lexicographical projects have been started in Norway, for example at the Norwegian Institute of Lexicography, University of Oslo. Some of these have recently been described briefly by J. H. Hauge in "A survey of EDP projects in linguistics and literary studies at Norwegian universities," ALLC Bulletin 3:3 (1975), 208-10.

The English language

In recent years a number of computer-based studies of the English language have been completed or are in progress at the University of Stockholm. Sven Jacobson has completed an extensive computational-transformational study of adverbs, Factors Influencing the Placement of English Adverbs in Relation to Auxiliaries: A Study in Variation (1976). The study draws on c. 25,000 examples from a large number of written and spoken texts of American English and discusses a variety of possible adverb positions. The final part of the book presents a method for using the results of the analysis to estimate the probability of the various positions in new sentences. It also attempts to formulate a generative variable rule on the basis of such probability estimates. Stieg Hargevik also used computational methods in The Disputed Assignment of Memoirs of an English Officer to Daniel Defoe (2 vols., 1974). The Swedish Question-Answering Project (SQAP) has developed a question-answering system in English. Jacob Palme describes it in "The SQAP Data Base for Natural Language Information," American Journal of Computational Linguistics, Microfiche 24 (1975).⁶

At the University of Gothenburg, Göran Kjellmer has completed an investigation of the restructuring of vocabulary that occurred in the Middle English period. His book, Middle English Words for 'People' (1973), analyzes a corpus of c. 13,000 instances of the words FOLK, LEOD, MAN, NATION, PEOPLE, and ÞEOD in Middle English. These words represent the meaning expressed by Modern English people. The same source material was used for different purposes in his thesis Context and Meaning (1971). Kjellmer is now working on a phrase dictionary based on the Brown University Corpus. The same corpus is the basis for the project "Syntaxdata," headed by Alvar Ellegård of Gothenburg.

At Lund, Jan Svartvik directs a project called "Survey of Spoken English," which is based on the material of the Survey of English Usage, University College, London. This transcribed material of spoken English will be stored on magnetic tape and further analyzed in Lund.

The project "English Name-Studies," led by Gillis Kristensson, aims to make an inventory of Old and Middle English personal names, chiefly for Middle English investigations. It comprises:

- a survey of Christian names and by-names;
- a survey of topographical terms;
- a survey of dialects 1290-1350.

Kristensson has described the project in two papers:

- "Databehandling av personnamnsmaterial," Sydsvenska ortnamnssällskapets årsskrift (1975);
- "Computer processing of Middle English personal-name materials," The Study of the Personal Names of the British Isles, Proceedings of a Working Conference at Erlangen 21-24 September 1975, ed. H. Voitl (1976), pp. 62-74.

Christer Påhlsson also used computers for the statistical calculations in his sociolinguistic study of the so-called Northumbrian burr. His thesis, published in Lund Studies in English, 41 (1972), describes the Northumbrian burr synchronically within its social context. At the University of Umeå, Carl-Gustaf Söderberg, Department of Phonetics, has made a computerized analysis of initially consonantal trisyllables in present-day Received Pronunciation of English. The results were published in A Contribution to the Study of the Typology of Present-Day RP-English, Publication of the Department of Phonetics, Umeå University, No. 7 (1975).

A number of English-language projects in other Scandinavian countries can also be reported. A Danish-English Project on Error Analysis and Contrastive Linguistics, headed by the present writer, has been started at the English Department, University of Copenhagen. It runs in collaboration with other Danish universities, teacher training colleges, and schools. Its aim is to analyze errors of various sorts at various school and university levels and to develop methods of teaching English as a second language in Denmark. Einar Bjørvand, English Department, University of Oslo, has published A Concordance to Spenser's Fowre Hymnes (Oslo, 1973). In the same department, work has started on concordances of Aelfric's Lives of the Saints and Catholic Homilies. In Turku, Finland, Marita Gustafsson's thesis, Binomial Expressions in Present-day English: A Syntactic and Semantic Study (1975), attempts a comprehensive description of the syntactic and semantic functions of binomials in present-day English. The base material consists of novels, newspapers and magazines, popular scientific treatises, and laws and acts.

At Åbo Akademi, in the Text Linguistic Research Group headed by Nils-Erik Enkvist, Auli Hakulinen and Viljo Kohonen have used computers for frequency counts.

The Dictionary of Early Modern English Pronunciation 1500-1800 (DEMEP) is a joint Scandinavian-German-British-American project initiated by Bror Danielsson, Stockholm. This six-volume dictionary is an attempt to provide a detailed history of English pronunciation from the invention of printing to c. 1800. It will record every attested pronunciation of English words found in grammatical, orthoepical, or phonetic works. The dictionary will consist of the following parts:

Vol. I: 1500-1650. Editors: B. Danielsson, Stockholm, and A. Zettersten, Copenhagen.
Vol. II: 1650-1700. Editor: B. Sundby, Bergen.
Vol. III: 1700-1750. Editor: Klaus Dietz, Bonn.
Vol. IV: 1750-1800. Editors: Horst Weinstock, Aachen, and D. Sherman, Berkeley.
Vols. V-VI: Accounts of sources, biographical and bibliographical data, and appendices with computerized extracts from volumes I-IV, such as a reverse dictionary, place-names, variant spellings, rime words, etc.

For a discussion of the project, see B. Danielsson, "Proposal for DEMEP: a Dictionary of Early Modern English Pronunciation 1500-1800," Neuphilologische Mitteilungen, 75,3 (1974), 492-500.

The German language

The first part of Inger Rosengren's frequency dictionary of modern German newspaper language is Ein Frequenzwörterbuch der deutschen Zeitungssprache. Die Welt, Süddeutsche Zeitung. I (Lund, 1972). It covers representative samples of Die Welt (2,476,560 running words) and Süddeutsche Zeitung (500,334 running words) from 1 November 1966 to 30 October 1967. From Die Welt, 5,534 articles are drawn from such genres as editorials, politics, cultural matters, economics, and general news; 1,061 articles come from Süddeutsche Zeitung. Part I contains the material based on word forms, whereas Part II (in press) will be based on lemmas. Several studies and theses based on the same material are being planned at the German Department, Lund University. So far, the only one published is Ingemar Persson's thesis, Das System der kausativen Funktionsverbgefüge, Lunder germanistische Forschungen, 42 (1976). At the Malmö School of Education, Bengt Nilsson has started a project to study the frequencies of grammatical structures in Swedish textbooks for teaching German to Swedish students.

Karl Hyldgaard-Jensen, Institute for German Philology, University of Copenhagen, has continued his projects on Middle Low German and Modern Dutch. In 1975 one volume of the Low German project was published, as was the first volume of the Dutch-Danish Dictionary; the second volume came out in 1976. Jfirgen MNgaard of the same project has made a critical investigation of the five German primers used in Danish schools, published in Kopenhagener Beiträge zur germanistischen Linguistik, 5 (1974).

A Danish-German Contrastive Project has also been started by Karl Hyldgaard-Jensen. A corpus of modern German is being prepared for a Danish-German contrastive grammar, in collaboration with the Institut für deutsche Sprache in Bonn and Mannheim. The aim is to compare Danish with German, using a Danish corpus parallel to the one prepared in Germany.

The French language

Gunnel Engwall's thesis, Fréquence et distribution du vocabulaire dans un choix de romans français (Stockholm, 1974), is a word-frequency study based on twenty-five French novels from the period 1962-1968. The programming is described in her "A Concordance program for linguistic and literary research," Computer Center, Royal College of Forestry, Report No. 8 (Stockholm, 1972). In 1974, Inger-Britt Robach, Lund, published a thesis called Étude socio-linguistique de la segmentation syntaxique du français parlé (Études romanes de Lund, 23). This sociolinguistic study was based on a magnetic tape of interviews from the French-English "projet d'Orléans," provided by the Language Research Center, Birkbeck College, London. The author used 36 interviews for her syntactic study. In her analyses she distinguished between three social groups, three age groups, and the two sexes.

A project to analyze word order in Middle French and Renaissance French prose has been started by Suzanne Hanon at the Department of Romance Philology, University of Odense, Denmark. One concordance of the texts on magnetic tape has been published: Joachim du Bellay's La Deffence et Illustration de la langue francoyse (Odense University Press, 1974). Suzanne Hanon has also taken an interest in anglicisms in contemporary French. She developed methods of searching for certain elements or sequences of elements in order to identify English loan-words. These methods are described in an unpublished thesis, Anglicismes en français contemporain. Méthodes et problèmes (Aarhus, 1970), and in "The Study of English Loan-Words in Modern French," CHum, 7, 6 (Sept.-Nov. 1973), 389-98.

Bente Maegaard and Ebbe Spang-Hanssen, Copenhagen, run a joint project on the automatic identification of various word classes in French texts; see B. Maegaard, "The recognition of finite verbs in French texts," ALLC Bulletin, 4, 1 (1976), 49-52. Another project deals with the automatic segmentation of French sentences into main clauses and subordinate clauses. Two publications describe it:

- Ebbe Spang-Hanssen, "La segmentation automatique de textes français: Quelques expériences," Annales Universitatis Turkuensis, B 127 (1973);
- Bente Maegaard and Ebbe Spang-Hanssen, "Segmentation of French sentences," Mathematical and Computational Linguistics, II, ed. A. Zampolli (Firenze, 1974).

Other languages

At the University of Turku, Osmo Ikola has continued his syntactic studies of Finnish dialects in collaboration with Olli Järvikoski, Kirsti Siitonen, and Jussi Salmela. The main object of their project is to collect syntactic material for a syntax archive of Finnish dialects.⁷ The computer-assisted work on the acquisition of Russian vocabulary at the Department of Slavic Languages, University of Stockholm, has been discussed by Anna-Lena Sågvall, Berith Brännström, and Agneta Berghem in "MIR: A Computer Based Approach to the Acquisition of Russian Vocabulary in Context," System, 4:2 (1976), pp. 116-27. At the Department of Slavic and Baltic Studies, University of Oslo, Russian poetry of the nineteenth and twentieth centuries has been put into machine-readable form. This material supports studies of the vocabulary and style of such authors as Lermontov, Achmatova, and Tjutčev. Between 1971 and 1974, Henrik Holmboe, Department of Linguistics, University of Aarhus, published concordances of seven tragedies of Aeschylus. He is now working on frequency word lists for Hungarian and for the Eskimo language spoken in Greenland.

A research project called "Papyri from Herculaneum" was started in 1971 at the Department of Classics, University of Bergen, where a concordance of the works of Philodemus is also in preparation. The authorship studies of Philo of Alexandria carried out at the Department of Religion have been discussed by Peder Borgen and Roald Skarsten in "Bibelvitenskap, gresk og EDB: Maskinleselig tekst og indeks av Philo fra Alexandrias skrifter," Forskningsnytt, 16 (1971), 37-39, 50.

At the Scandinavian Institute of Asian Studies in Copenhagen, Eric Grinstead has started two projects. The aim of the first is to produce a dictionary based on 125,000 Tamil words in roman transcription. The second project is concerned with the Buddhist Canon, of which the Lotus Sutra has been stored on magnetic tape in the Sanskrit as well as the Chinese version.

NOTES

1. CHum, 3, 1 (September 1968), 53-60; 5, 4 (March 1971), 203-208.

2. The following volumes of Data linguistica have appeared:
Sture Allén, Frequency Dictionary of Present-Day Swedish Based on Newspaper Material. 1: Graphic Words, Homograph Components. 1970.
Sture Allén, with Staffan Hellberg, Introduction to Graphonomy, the Linguistic Study of Writing. 1971.
Jan Thavenius, A Concordance to the Poems of Hjalmar Gullberg. 1971.
Sture Allén, Frequency Dictionary of Present-Day Swedish Based on Newspaper Material. 2: Lemmas. 1971.
Rolf Gavare, Graph Description of Linguistic Structures. A Linguistic Approach to the Theory of Graphs. 1972.
Sture Allén, Top Ten Thousand Word Frequencies in Swedish Newspaper Text. 1972.
Bo Ralph, Introduction to Historical Linguistics. 1972.
Anna-Lena Sågvall, A System for Automatic Inflectional Analysis Implemented for Russian. 1973.
Magnus Ljung, A Frequency Dictionary of English Morphemes. 1974.
Sture Allén et al., Frequency Dictionary of Present-Day Swedish Based on Newspaper Material. 3: Collocations. 1975.

3. On lemmatization, see Staffan Hellberg, "Computerized Lemmatization without the Use of a Dictionary: A Case Study from Swedish Lexicology," CHum, 6, 4 (March 1972), 209-12.

4. The following studies from the same department are also computer-based: M. Blomberg and K. Elenius, "Två försök med automatisk taligenkänning," STL-TR-1974-1, Jan. 15 (technical report); G. Jismalm, "Talad kommunikation med datorer," STL-TR-1973-3, Dec. 1974 (technical report); and R. Carlson and B. Granström, "A phonetically oriented programming language for rule description of speech," Speech Communication, 2 (1975), 245-53.

5. The following issues of Norske språkdata have been published so far:

Norsk grunnvokabular, 1-4. Report 1. 1971.
Nynorsk baklengsordliste. Report 2. 1972.
Nynorsk baklengsordliste. Report 2/75. 1976.
Bokmål baklengsordliste. Report 3. 1972.
Bokmål baklengsordliste. Report 3/75. 1975.
En datamaskinell undersøkelse av suffiksvekslingen -ing/-ning i moderne norsk. By Gulbrand Alhaug. Report 4. 1973.
Setningsspissanalyse, 1-7. By G. M. Gillow and P. B. Pedersen. Report 5. 1976.

6. Abstract in CHum, 9, 6 (November 1975), 315-16.

7. Osmo Ikola, "Vorbereitungen zur maschinellen syntaktischen Analyse der finnischen Mundarten," Abhandlungen der Akademie der Wissenschaften in Göttingen, Phil.-hist. Klasse, Dritte Folge, Nr. 76 (1970); Osmo Ikola and Yrjö Karjalainen, "Syntax archives of Finnish dialects for computer work," Proceedings of the International Conference on Computer Linguistics, Pisa (August 1973); Osmo Ikola, "Kieltoa vahvistavat sanat lounaismurteissa" (with a German summary: "Verstärkungswörter der Verneinung in den südwest-finn. Dialekten"), Journal de la Société Finno-ougrienne, 72 (1973); Olli Järvikoski, "Lauseopin arkiston tietokonelingvistiikasta," Seulaset, 1 (1974); Jussi Kallio, "Suomen kansankielen lauseopillisten ainesten keruu ja arkistointi," Seulaset, 1 (1969); Y. A. Karjalainen and M. 2. Nurminen, "Report of an Information Retrieval System for Linguistic Studies," NordDATA-konferens (1972).
