PL/I: A Programming Language for Humanities Research

By Jack Heller and George W. Logemann

Computer engineers and language designers have built us a Tower of Babel: the engineers add a room each time they design a new machine; the language designers construct a whole story when they unveil a new operating system.

There are many reasons for such proliferation: a basic cause is the fact that today there are many computers, each with its own machine language. Likewise, many compiler languages have been developed. At first compiler languages (such as Fortran and Cobol) were developed to overcome the tedium and pitfalls of machine language programming.

In addition to being more convenient, compiler languages were soon found to be an ideal means of communication. What better way to describe an algorithm, i.e., a definite computing procedure with no unforeseen contingencies, than to present the computer program that carries out the algorithm? Hence Algol was born, to provide both a publication language and a practical language for actual use.

Most of the compiler languages were developed to handle programs involving numerical computation: they allow little if any manipulation of alphabetic quantities or strings of symbols. Hence, in parallel with compiler language designers, workers in specialized fields developed symbol-manipulating languages for processing literary texts, or for manipulating facts as opposed to numbers. Notably we have COMIT, developed at MIT for linguistic studies leading to machine translation of human languages; more recently SNOBOL was invented at Bell Laboratories. A variety of information-processing languages have been invented, in which facts are held in lists or more general structures such as trees (resembling sentence diagrams) or graphs (something like airline charts or road maps showing the connections between cities). Languages that handle lists, graphs, and trees are called list-processing languages; examples are IPL-V, SLIP, NUSPEAK, LISP, as well as SNOBOL and COMIT. More recently, at NYU we have developed SYMAN, a system of subroutines that simply adds to Fortran the facility to manipulate strings of symbols and lists of strings.

A severe criticism of compiler languages and list-processing languages is that they do not make maximum use of the capacity of the machine. For example, a Fortran program might take up to fifty percent longer to run to completion than would a machine-language coded version of the same program. In the old days (about ten years ago) running time was measured in hours, so that a loss in efficiency was crucial. Also, compiler language programs might use excess memory cells, of which older machines had precious few.

Jack Heller is Acting Director of the Institute for Computer Research in the Humanities of New York University; George W. Logemann is Co-ordinator for Computer Science there. They are joint authors of the forthcoming Computer Methods in the Humanities (McGraw-Hill).

The same criticism holds true of list-processing languages and programs. In addition, list-processing systems engulf a great deal of the machine's memory and sometimes are independent of the standard compiler languages: if the program contains any list processing whatsoever, the machine becomes dedicated to the list processor.

The obvious course of action was to design bigger and faster machines. Currently excess running time and excess memory space, once called "waste," are called "overhead" and accepted as a necessary evil. Overhead has become relatively negligible since machines have decreased appreciably in cost per machine-language instruction executed. Now that programs are cheaper to run, the problem is to reduce the cost of writing programs: PL/I in particular has been designed to be versatile and compact, decreasing the chance of human error and reducing programming time.

Compiler languages are general purpose: for example, Fortran can be applied to almost all problems arising in science or engineering, and Cobol to almost all problems in business. Languages of even higher level than compiler languages are being developed, aimed toward problems in a particular discipline such as civil engineering or musical analysis. The value of these problem-oriented languages is the time they save, both in training new programmers and in developing working programs. Programs written in problem-oriented languages usually are realized in lower-level languages; that is, an instruction in the higher-level language is interpreted and executed by a program written in a compiler or assembly language. Thus a hierarchy of languages is established: a program written in the highest-level terms is translated and retranslated down to machine language. PL/I seems ideally suited for writing interpreters for problem-oriented languages.

Before the introduction of PL/I, the adventuresome programmer wishing to utilize fully all the computational capabilities of a large machine was forced either to learn many languages or to attempt awkwardly to express in one language operations truly beyond the expressive capabilities of that language. PL/I is a language which incorporates, under fairly powerful operational control, both an arithmetic facility and a data-manipulative facility. Moreover, the PL/I language appears to parallel machine language in the sense that using PL/I one can request virtually all operations that current-day machines can perform. One does not have to perform tricks to circumvent the compiler.

PL/I allows the programmer to work with data taking a variety of forms of numbers or strings of symbols. The control language is compact and allows him to specify the conditions under which an operation is to be performed in about as many words as he would use were he programming in English. In addition to describing his procedure, the programmer needs a language for communicating the results to and from the computer: PL/I permits both free-form and rigid, fixed-field formats. In addition, the language provides for describing asynchronous operations, e.g., those that may go on independently of each other, or that must wait until others are done: present-day computers have such a facility. Finally, the language is geared to a system in which the programmer communicates with the computer via typewriter, a direct line into the machine: both the program structure and the data can be designed free of any fixed page or line positions. Time sharing, i.e., direct communication with computers via typewriter, is also a present-day fact.

Let us examine in more detail what these statements mean to the humanities programmer and what direction they can give to his efforts.

Humanities programmers are concerned primarily with symbolic data, as opposed to numbers. They wish to manipulate sentences and texts, or strings of their own notations representing a symphony, the structure of a painting, the movements in a drama, or a collection of artifacts they have uncovered. We observe that PL/I allows data to take the form of strings of symbols. Thus in a PL/I program one can generate any desired text or coding scheme, and move and save strings for accumulating dictionaries and preparing output text. Moreover, PL/I provides the essential minimum of operations on strings of symbols, much as addition, subtraction, multiplication, and division are the essential minimum of operations on numbers. The concatenation operation (denoted "CAT" or "||") joins two strings to create a new string, much as "+" adds two numbers to form a third number.

Thus

    'UP' CAT 'AND' CAT 'DOWN'

or

    'UP' || 'AND' || 'DOWN'

becomes

    'UPANDDOWN'.

Another basic operation, SUBSTR, specifies a SUBSTRing of another string: for example,

SUBSTR ('IMMODERATION', 3, 4) is the string 'MODE'. That is to say, one may specify a portion if he knows in advance the starting position (in this case, the 3rd character) and the length (in this case, 4 characters) of the substring he desires. Conversely, suppose one wishes to search a string for a known substring: he uses INDEX. In particular, the value of

INDEX ('IMMODERATION', 'MODE') is 3. Were the substring not found in the string, INDEX would return the value 0. Likewise, the length of a known string (such as 'MODE') is given by LENGTH ('MODE'), here with value 4.
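The four operations can be modelled in a few lines. The sketch below is written in Python rather than PL/I so that it can be run as it stands; the function names cat, substr, index, and length are stand-ins chosen here to mirror the PL/I operations, and the only behaviour assumed is what the text states: character positions count from 1, and INDEX yields 0 when the substring is absent.

```python
def cat(*parts: str) -> str:
    """Join strings end to end, in the manner of the concatenation operator."""
    return "".join(parts)


def substr(s: str, start: int, length: int) -> str:
    """Return `length` characters of `s` beginning at 1-based position `start`."""
    return s[start - 1:start - 1 + length]


def index(s: str, sub: str) -> int:
    """Return the 1-based position of `sub` in `s`, or 0 if it does not occur."""
    return s.find(sub) + 1  # str.find gives -1 on failure, so the result is 0


def length(s: str) -> int:
    """Return the number of characters in `s`."""
    return len(s)
```

With these definitions, substr("IMMODERATION", 3, 4) gives "MODE" and index("IMMODERATION", "MODE") gives 3, in agreement with the values quoted above.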

Using the essential operations, CAT, SUBSTR, INDEX, and LENGTH, even the most complex manipulations can be described. As a simple example, we will examine PL/I instructions that take the sentence 'THE SKY IS BLUE.' into the question 'IS THE SKY BLUE?' In English one would say, "Find the verb, the word 'IS', in the sentence; the predicate is the string starting with the word directly after the verb to the end of the sentence, i.e., just before the '.'; the subject is the substring up to but not including the verb; finally, the question is the verb followed by the subject followed by the predicate followed by a question mark." In PL/I one would say:

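    VERB = 'IS ';    /* the verb together with the blank that follows it */
    ISPOSITION = INDEX (SENTENCE, VERB);
    /* the third argument of SUBSTR is a length: from the word after the verb up to the '.' */
    PREDICATE = SUBSTR (SENTENCE, ISPOSITION + LENGTH (VERB),
                        INDEX (SENTENCE, '.') - ISPOSITION - LENGTH (VERB));
    SUBJECT = SUBSTR (SENTENCE, 1, ISPOSITION - 1);
    QUESTION = VERB || SUBJECT || PREDICATE || '?';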

Similarly, if the SENTENCE were 'THE BROWN FOX IS VERY QUICK.'

QUESTION would become 'IS THE BROWN FOX VERY QUICK?'.

Note that in either case the machine has no idea of what it is saying!
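The same rearrangement can be tried out directly. The following is a minimal sketch in Python, not the authors' program: the function name to_question is hypothetical, and it assumes that the verb is searched for together with the blank that follows it, so that the words of the question come out separated by single blanks.

```python
def to_question(sentence: str, verb: str = "IS ") -> str:
    """Rearrange 'SUBJECT IS PREDICATE.' into 'IS SUBJECT PREDICATE?'."""
    is_position = sentence.find(verb)   # 0-based here; PL/I INDEX counts from 1
    period = sentence.find(".")
    if is_position < 0 or period < 0:
        raise ValueError("sentence must contain the verb and a final period")
    # Everything before the verb, including the blank that precedes it.
    subject = sentence[:is_position]
    # Everything after the verb (and its blank) up to, but excluding, the period.
    predicate = sentence[is_position + len(verb):period]
    return verb + subject + predicate + "?"
```

Like the PL/I statements, the function only cuts and joins characters; it would split a sentence at the first 'IS ' it meets, whether or not that is really the verb.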

The notion of INDEX can be used to examine whether two strings are identically equal. In general, however, strings such as 'BLACK' and 'BLACK   ' (the same word followed by blanks) should be considered equal in a lexicographical sense; hence PL/I provides for the alphabetic comparison of strings. In fact, the operations that compare numbers also compare strings; in PL/I one specifies the condition that 'BLACK' is less than 'RED' in the same way that he would specify that the number 2.71828 is less than 3.14159.
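A short sketch of such a comparison, in Python, assuming the usual PL/I rule that the shorter string is extended on the right with blanks before the two are compared. The function names are hypothetical, and Python orders characters by their Unicode values rather than by the machine's collating sequence; for words in capital letters alone the two orderings are expected to agree.

```python
def padded(a: str, b: str) -> tuple:
    """Extend the shorter string with trailing blanks to the longer one's length."""
    width = max(len(a), len(b))
    return a.ljust(width), b.ljust(width)


def strings_equal(a: str, b: str) -> bool:
    """Equality in which trailing blanks are not significant."""
    x, y = padded(a, b)
    return x == y


def string_less(a: str, b: str) -> bool:
    """Alphabetic 'less than', written with the same operator used for numbers."""
    x, y = padded(a, b)
    return x < y
```

Here strings_equal("BLACK", "BLACK   ") is true, and string_less("BLACK", "RED") holds for the same reason that 2.71828 < 3.14159 does.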

Humanities programmers need the ability to split, scan, measure, compare, and join strings in practically all of their computer efforts: the simple problem of right-justifying a line, filling in blanks between words so that the total line length fills the width of the column, takes all four manipulative operations. Editing a text requires many search and split operations; alphabetization is essential in making an index or concordance. Finally, complex pattern-matching problems that arise in musical or graphic art analysis definitely require the facility to recognize substrings and to move elements of data.
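The justification problem makes a compact illustration. The sketch below, in Python, is one possible approach rather than a prescribed method: it splits the line into words, measures them, and joins them again with the spare blanks shared out from the left. The name justify and the choice to give the leftmost gaps the extra blanks are assumptions made here.

```python
def justify(line: str, width: int) -> str:
    """Pad the gaps between words so that the line is exactly `width` characters."""
    words = line.split()                      # split the line at blanks
    if len(words) < 2:
        return line.strip().ljust(width)      # nothing to spread; pad on the right
    gaps = len(words) - 1
    total_blanks = width - sum(len(w) for w in words)   # measure
    if total_blanks < gaps:
        raise ValueError("line is longer than the column width")
    base, extra = divmod(total_blanks, gaps)
    out = words[0]
    for i, word in enumerate(words[1:]):
        # The first `extra` gaps receive one blank more than the rest.
        out += " " * (base + (1 if i < extra else 0)) + word   # join
    return out
```

For a column of 20 characters, justify("THE SKY IS BLUE", 20) yields "THE   SKY   IS  BLUE".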

In addition to using single strings, one can work with lists, i.e., sequences of strings, or more complex tables of strings, or even tiered outlines of strings. Humanities programmers use lists of words in many of their computations, particularly lists of different words and lists with the entries alphabetized or ordered in some other way. Problems calling for outlines of strings arise in information retrieval. Information retrieval deals with a quantity of information, such as the contents of all music catalogs from publishers in the last two centuries, or the court records of musicians and the instruments they played, or the entire text of Milton's Paradise Lost. One wishes to analyze the contents of the information, to find whether a given piece might have been performed in a given place at a given time, or whether Milton uses certain words uniformly in certain contexts. In the former case, catalog information is organized so that the composers' names are prominent, with piece names and instrumentation at successively lower levels. In the latter case, the level structure parallels that used in diagramming a sentence, in which the subject and predicate are uppermost, the nouns and verbs on the next level, and modifiers on still lower levels.
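A tiered outline of this kind can be pictured as a nested table. The sketch below uses Python dictionaries for the three levels (composer, piece, instrumentation); the entries are hypothetical placeholders, not catalog data, and the function name pieces_using is an assumption made for illustration.

```python
# Hypothetical placeholder entries; a real catalog would be read from a file.
CATALOG = {
    "COMPOSER A": {
        "PIECE 1": ["VIOLIN", "CELLO"],
        "PIECE 2": ["FLUTE"],
    },
    "COMPOSER B": {
        "PIECE 3": ["VIOLIN", "HARPSICHORD"],
    },
}


def pieces_using(catalog: dict, instrument: str) -> list:
    """Return sorted (composer, piece) pairs whose instrumentation names `instrument`."""
    found = []
    for composer, pieces in catalog.items():          # top level: composers
        for piece, instruments in pieces.items():     # next level: pieces
            if instrument in instruments:             # lowest level: instrumentation
                found.append((composer, piece))
    return sorted(found)
```

A retrieval question then becomes a walk down the levels: pieces_using(CATALOG, "VIOLIN") lists every piece, under every composer, that calls for a violin.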

The second example, the problem of expressing a sentence structure, indicates an even more general data structure which programmers need: namely, that of the list or tree structure mentioned earlier. PL/I has the potential to do list processing, although the concept of general list structures has not been built into the language. What have been provided are simple operations (on the order of SUBSTR and INDEX) which allow a programmer to set up list structures and execute operations upon them. The basic PL/I operations allow one to control the location of information in the computer's memory, and to control the links between various items of information. Thus, just as PL/I contains elementary arithmetic operations from which one can build the most complex arithmetic calculations, so it also contains the simplest string- and information-handling operations, from which one can construct programs to do the most complex manipulations and retrievals.
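To make the idea of items joined by links concrete, here is a minimal sketch of a singly linked list in Python. It illustrates the data structure only; it does not show how PL/I itself expresses links, and the names Node, build_list, insert_after, and to_words are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One item of information together with a link to the item that follows it."""
    text: str
    next: Optional["Node"] = None


def build_list(words: List[str]) -> Optional[Node]:
    """Chain the words together, returning the first node (None if there are no words)."""
    head = None
    for word in reversed(words):
        head = Node(word, head)
    return head


def insert_after(node: Node, text: str) -> Node:
    """Splice a new item in after `node` by redirecting two links."""
    new = Node(text, node.next)
    node.next = new
    return new


def to_words(head: Optional[Node]) -> List[str]:
    """Follow the links from the head and collect the items in order."""
    words = []
    node = head
    while node is not None:
        words.append(node.text)
        node = node.next
    return words
```

Inserting an item changes only two links; none of the other items has to be moved, which is the attraction of list structures for data that grow and are rearranged.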

Not only are the elementary operations of string manipulation provided, but they can be connected easily and precisely by several types of control statements. This fact is particularly important to humanities programmers, since programs to do analysis require a large number of operations executed only if some condition is satisfied; consider a parsing program in linguistics, or a harmonic analysis program in music theory. In PL/I one can write:

    IF condition THEN operation;

where the operation is performed if and only if the condition is true. The condition may be a question of whether two strings are lexicographically ordered in some way, whether a string contains a particular character or substring, or a complex logical combination of similar statements. For example, if the operation is an analysis of an interrogative sentence, the condition must determine whether a particular sentence indeed is in interrogative form; the operation possibly might not work on a declarative or imperative sentence. To be precise, we might wish to determine whether the sentence begins with 'IS' or 'ARE' and ends with a question mark. In this case, the command is stated:

    IF (SUBSTR (SENTENCE, 1, 3) = 'ARE' | SUBSTR (SENTENCE, 1, 2) = 'IS') &
       SUBSTR (SENTENCE, LENGTH (SENTENCE), 1) = '?' THEN operation;

The vertical bar "|" stands for "or" and the ampersand "&" means "and." Note that the last character of the string SENTENCE has position number equal to LENGTH (SENTENCE), a symbolism we use since PL/I reads strings only from left to right.
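The condition can be checked on a few sentences with the following Python sketch. The function name looks_interrogative is hypothetical; the test itself is exactly the one described, with "or" and "and" spelled out.

```python
def looks_interrogative(sentence: str) -> bool:
    """True if the sentence begins with 'ARE' or 'IS' and ends with '?'."""
    starts_with_verb = sentence[:3] == "ARE" or sentence[:2] == "IS"
    ends_with_mark = sentence[-1:] == "?"   # the last character, if there is one
    return starts_with_verb and ends_with_mark
```

Note that the test is purely a matter of characters: a sentence such as 'ISLANDS ARE SMALL?' would also pass, since it begins with the letters 'IS'.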


In PL/I one can execute one of several alternatives by using the word ELSE; for example:

    IF first condition THEN first operation;
    ELSE IF second condition THEN second operation;
    ELSE third operation;

Here the third operation is executed only if neither the first nor the second condition is true. For completeness, we should remark that even the operations may contain further IF ... THEN statements: hence extremely complex operations can be built up, with many contingencies, in a compact manner very easy to write and easy to understand once the program has been written.
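As a small illustration of the chain of alternatives, the Python sketch below sorts sentences into three kinds by their final character. The function name classify and the three categories are assumptions chosen for the example; only the IF ... ELSE IF ... ELSE shape is taken from the text.

```python
def classify(sentence: str) -> str:
    """Name the kind of sentence suggested by its final punctuation mark."""
    if sentence.endswith("?"):        # first condition
        return "INTERROGATIVE"
    elif sentence.endswith("!"):      # second condition, tried only if the first fails
        return "EXCLAMATORY"
    else:                             # reached only if neither condition is true
        return "DECLARATIVE"
```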

Another form of controlled operation that programmers employ is a "loop," that is, a sequence of operations that is executed over and over again until a particular condition is satisfied. For example, to find an entry in a dictionary one scans page by page (here the loop is the scan down the page) until one finds the word he is seeking. In PL/I one can specify that any sequence of operations is to be performed together as a loop, with additional remarks stating how many times the loop is to be executed, or some condition which must be satisfied in order for the loop to be executed. In the latter case, one assumes that the operations in the loop can modify the condition so that during some execution of the loop the condition becomes false and the loop will terminate. Humanities programmers find both loop techniques useful: the first in processing a list of items in a similar manner, one by one; the second, as in the previous example, in repeating a loop of operations on a sequence of items until an item is found that satisfies a pre-ordained condition.
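The two kinds of loop can be set side by side in a short Python sketch. The function names are hypothetical: lengths uses a loop executed a stated number of times, while find_entry repeats its scan only so long as a condition holds, and that condition is changed by the loop itself.

```python
def lengths(entries: list) -> list:
    """Counted loop: treat every entry in the same manner, one by one."""
    result = []
    for i in range(len(entries)):        # executed exactly len(entries) times
        result.append(len(entries[i]))
    return result


def find_entry(entries: list, word: str) -> int:
    """Conditional loop: scan until the word is found; return its 1-based position, or 0."""
    position = 1
    # The loop body advances `position`, so the condition must eventually fail.
    while position <= len(entries) and entries[position - 1] != word:
        position += 1
    return position if position <= len(entries) else 0
```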

Finally we observe that PL/I is a language to be used for designing programs that run on current-day computers, particularly those that have typewriter input. A program is punched as a sequence of sentences ending in semicolons, as many per line as the programmer desires to type; or he may use several lines for one sentence. Further, one standard input/output technique is stream-oriented, i.e., the symbols are assumed to be held in a long string resembling ticker-tape. On input one chops off as many symbols from the stream as he needs for a particular computation; output symbols are dispatched in one continuous stream. Just as one tears tape output from a telegraph into lines to be glued on a page of paper, one form of stream output can be split into lines, using carriage return and new page symbols. PL/I treats stream files virtually identically to strings: it is possible to write a program that works on data stored wholly in core memory, then with few modifications to apply the program to data stored mainly on disks and in auxiliary memory. Such a facility greatly reduces the time taken to develop programs operating on large amounts of data. Furthermore, there is no limitation on files or file names: the same program can operate on all the files the programmer will create.
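The ticker-tape picture can be imitated with a character stream in Python. In this sketch, whose function names are hypothetical, next_word chops one blank-delimited word at a time off the front of the stream; line ends are treated like any other blank, so the layout of the input does not matter.

```python
import io


def next_word(stream) -> str:
    """Chop the next blank-delimited word off the stream; return '' at the end."""
    word = ""
    while True:
        ch = stream.read(1)          # take one symbol from the stream
        if ch == "":                 # the stream is exhausted
            return word
        if ch.isspace():
            if word:                 # a blank after some letters ends the word
                return word
        else:
            word += ch


def all_words(stream) -> list:
    """Read words until the stream runs out."""
    words = []
    while True:
        word = next_word(stream)
        if not word:
            return words
        words.append(word)
```

The same functions accept a stream held in memory, such as io.StringIO("THE SKY\nIS   BLUE."), or a text file opened for reading, which echoes the point that a program developed on data in memory can be turned onto data in files with little change.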

A second form of input/output is oriented toward definite records stored on magnetic tape or disk files. Records can be split apart into component information, and either rewritten into the old record file, or written into a new file. Both input/output forms are useful in humanities work: the stream concept, with its implied free format (i.e., no definite structure or sequence of items), is ideal for communicating texts to the computer. Also, stream input is useful for coding musical material and information about paintings, sculptures, and other artifacts that have no definite structure. On the other hand, items in a bibliography are sufficiently similar to be stored in records of more or less identical format and size. Often tables and other collections of information that have been generated during a computation are in a standard form and can be stored as fixed-length records. Fixed records can be buffered into the input-output device; that is, reading one record and writing another can overlap the execution of operations upon a third stored in computer memory.
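A fixed-format record can be sketched as a string in which each component occupies a known span of positions. The layout below (20 characters of author, 30 of title, 4 of year) and the names FIELDS, pack, and unpack are hypothetical choices for a bibliography entry, written in Python.

```python
# Hypothetical record layout: field name and width in characters.
FIELDS = [("author", 20), ("title", 30), ("year", 4)]
RECORD_LENGTH = sum(width for _, width in FIELDS)


def pack(entry: dict) -> str:
    """Lay the fields out side by side, each cut or blank-padded to its width."""
    return "".join(str(entry[name])[:width].ljust(width) for name, width in FIELDS)


def unpack(record: str) -> dict:
    """Split a record apart into its component fields by position."""
    entry = {}
    position = 0
    for name, width in FIELDS:
        entry[name] = record[position:position + width].rstrip()
        position += width
    return entry
```

Because every record has the same length, the position of any field in any record can be computed without reading what comes before it.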

PL/I incorporates instructions for handling data that is stored on random-access devices and that is retrieved in a random manner; this latest development in computer hardware is particularly useful to humanities programmers who are retrieving information from a large number of individual records that have no particular logical order. For example, one seldom looks only at consecutive cards in a card catalog. Most often, one jumps from item to item, which can be related by author or subject or place or time of occurrence. Moreover, one seldom follows through all of the see-also references, owing to the tedium and time consumed in reopening file drawers. In a random-access computer file, all such references can be scanned. PL/I provides instructions governing the logical flow of random access.
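Random access to fixed-length records amounts to jumping straight to the position where a record begins. The Python sketch below uses an in-memory byte stream in place of a disk file; the record length of 16 and the function names are assumptions made for the example.

```python
import io

RECORD_LENGTH = 16   # hypothetical size of every record, in bytes


def write_records(stream, records: list) -> None:
    """Write each record cut or blank-padded to the fixed record length."""
    for record in records:
        stream.write(record.encode("ascii")[:RECORD_LENGTH].ljust(RECORD_LENGTH))


def read_record(stream, number: int) -> str:
    """Fetch record `number` (counting from 0) without reading the ones before it."""
    stream.seek(number * RECORD_LENGTH)      # jump directly to the record
    return stream.read(RECORD_LENGTH).decode("ascii").rstrip()
```

Following a see-also reference then costs one jump, whatever the distance between the two cards in the file.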

The fact that certain sequences of operations can go on in parallel (such as a reading operation in parallel with a writing operation in parallel with computation) means that present-day computers have multiprogramming capability, i.e., can execute several programs simultaneously. In PL/I one can specify that two sequences of operations are independent tasks, and that the commencement of a third task awaits the completion of the first two. Also, one task (say, the processing task) can query whether another task (such as the read/write task) has been completed, so that, for example, the processing task can decide whether it has time to do another loop before the input/output device is finished and must be given further information to communicate. Information retrieval programs and programs that must process large amounts of data both rely on multiprogramming to achieve speed. If the computer can find one item of information while it processes a second, the running time of a program can be cut down by a factor of two or three. Moreover, since computers are approaching physical limits on their processing speed, future computers must be capable of many parallel operations in order to achieve even faster computation rates.
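The pattern of two independent tasks followed by a third that awaits them can be sketched with Python's standard concurrent.futures module. This shows the control structure only: the task functions and the name report are hypothetical, and threads in Python do not necessarily make such small computations run any faster.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def count_words(text: str) -> int:
    """First independent task: count the words."""
    return len(text.split())


def longest_word(text: str) -> str:
    """Second independent task: find the first word of greatest length."""
    return max(text.split(), key=len)


def report(text: str) -> str:
    """Third task: begins only when the first two have both been completed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        counting = pool.submit(count_words, text)
        searching = pool.submit(longest_word, text)
        # One task may ask whether another has finished without waiting for it.
        if not counting.done():
            pass  # other useful work could be done here instead of idling
        wait([counting, searching])          # await the completion of both tasks
        return "%d WORDS, LONGEST IS %s" % (counting.result(), searching.result())
```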

PL/I is an attempt at a unified synthesis of many current programming languages. Most of the features are present in one or more already existing languages, but the designers of PL/I are the first to attempt to create a language that has all the features in one. Hence, one can incorporate as much list processing, string manipulation, and arithmetic calculation as he desires into one program. In particular, no problems in data format, subroutine linkages, etc., arise in trying to fit together several subroutines from different languages. Also, one can do proportionately little manipulation or list processing without dedicating great amounts of computer memory to a large list-processing system.

We observe that the language does not include complex operations, except a SORT instruction, sorting being the most often invoked nontrivial operation in humanities computing. The user is required to build up a variety of subprocedures that he will use frequently in his own work. Thus he is creating his own problem-oriented language. In principle, this operation-building is similar to the process of building a woodworking shop. A cabinetmaker buys tools from his suppliers, such as saws and clamps. Then, using these basic tools, he constructs the jigs and mitre boxes, etc., that he will need in building a particular piece of furniture. The tool suppliers do not attempt to market a complete line of jigs, because jigs differ from job to job: they prefer to sell a line of jig-making tools. On the other hand, as the cabinetmaker matures, his shop will develop standard jigs, etc., geared to his designs.

Likewise, a number of co-workers in a computing group will begin to standardize their techniques so that they can build up a "package" of subroutines and subprocedures. The scientists have developed packages for statistical computations; now routine statistics problems can be reduced to a sequence of links between the packaged programs. Rather than spend weeks duplicating other programmers' efforts, the statistical programmer can assemble a program in a matter of hours.

Eventually humanities programmers will develop similar packages. At New York University's Institute for Computer Research in the Humanities we are working on a package for basic string manipulation, extending the power of PL/I as we did earlier with Fortran. Similarly, we are developing a system for pattern analysis of time-dependent data (such as music and drama). Because of its extended manipulative power, packages written in PL/I will replace and supplement the original efforts written in Fortran.

Joyously enough, the PL/I language is by no means as complicated to learn as it is complex and powerful. We have had the experience of teaching science students and humanities scholars alike to write useful programs after a few hours of lectures. Simple operations such as stream input/output, creating simple lists, and the string manipulations of SUBSTR and INDEX can be used without any knowledge of the more abstruse forms of arithmetic and input/output. Furthermore, any statement in the language that is grammatically correct will probably lead to a program that actually runs on the computer, albeit with errors. Because one can begin to interpret results, he can come quickly to the phase of programming called "debugging": analyzing the erroneous results in the output to locate the programming errors. Also, PL/I incorporates certain language features that would never be used in the running of a program, but only in the debugging: checking for arithmetic errors as well as taking snapshots of various quantities during the execution of the program to see whether the computations are going as planned.


The versatility of the language and the power to create a package of operations mean that the scholar can concentrate on developing his own notations. He can experiment with a variety of notions and concepts, testing each of them using data he has gathered and described with a notation meaningful to him. The freedom to experiment arises from the ease of developing a working program and of incorporating modifications: since more of the programming effort has been supplied by the designers of PL/I and in the ensuing work of creating packages, less time must be wasted in debugging and running a program.

Thus, PL/I is a powerful, versatile programming language designed to handle contemporary problems on contemporary machines. PL/I is not guaranteed to solve all problems, but it certainly will facilitate the scholars' search.

PL/I Language Specifications, IBM System/360 Operating System, File S360-29, Form C28-6571-3 (the basic manual).

An Introduction to PL/I, IBM Form C20-1632-0 (elementary; no string manipulation).

A Guide to PL/I for Fortran Users, IBM C20-1637-0 (good introduction to the PL/I system).

A PL/I Primer, IBM C28-6808-0 (more advanced than the Guide, but worthless information on string manipulation).

PL/I Bulletin, ed. R. N. Southworth, c/o Logicon, Inc., 205 Avenue I, Redondo Beach, California 90277 (notes, comments, open forum on all aspects of PL/I: write the editor for subscription information).

Implementations. We have heard reports that several manufacturers may market PL/I compilers. IBM has a working F-level compiler for a useful subset of PL/I, with both larger and smaller compilers announced for release in 1967. Also, SDS has advertised a PL/I compiler available in 1967. CDC has a PL/I implementation study group but plans no specific compiler products. (PL/I Bulletin, August 1966.)

Art, Art History, and the Computer

By Kenneth C. Lindsay

Even technologists must be a little astonished by the phenomenal growth of the computer during the past two decades. From a complex mechanical problem which concerned only a few industries and engineers, it has spread like an alarming rumor, has become a part of folk humor and direful social prognostications, and now entrances both the yearnings of lonely hearts and the minds of learned societies. Wealth-producing professions such as business, medicine, and law, which depend on rapid access to well-organized bodies of information, have been quick to adapt the computer to their needs. Disciplines like mathematics and physics played an important role in its development because of their military and scientific importance and because the metric patterns of their thought

Kenneth C. Lindsay is Chairman of the Department of Art at State University of New York, Binghamton.
