7
DY Third-Generation versus Fourthaeneration Software Development Santosh Id Mi- and Paul 1. Jalics, Cleveland State University Fourtbgenemtion IangUWes are not always better than their predecessom This cast? study reveals where they do well and where they come up short. 8 urth-generation-language tools have made impressive gains in p r e F ductivity, as several reports in recent years have shown.’,’ These productivity gains have been made possible by features like ease of use, use of nonprocedural code, direct interfaces to database-man- agement systems, and development times for applications that are a tenth that of thirdgeneration languages such as Cobol. While many of the fourthgeneration sys- tems support nonprocedural code and database-system interfaces, our study showed that claims regarding application- development time are overstated. Fourthgeneration languages also claim to help people without computer science training develop their own applications with minimum effort so they need not de- pend on their organization’s information- systems department. The differences be- tween third-generation and fourth- generation languages are briefly ex- plained in the box on page 9. 0740-7459/8810700/Oa)8~1.00 01988 IEEE To test a few of these tools’ features, we selected a case-study problem that is small enough to create several solutionsfor and yetsubstantial enough togain insightsinto each tool. We selected two fourthgenera- tion tools, dBase I11 and PC/Focus, to de- velop solutions with. We also developed a solution in Cobol to use as a benchmark for third-generation-language perfor- mance. One ground rule for selecting the fourthgeneration tools was that the tools had to run on MSDOS PCs because no such tools were available on the univer- sity’s mainframe and because the cost of buying one for the mainframe was too high. The cost of PGbased programs, on the other hand, was affordable. We chose dBase I11 because it not only satisfied the PC requirement but also be- cause it is typical of a whole group of what we classify as low-level fourthgeneration tools (others in the group include Rbase, Oracle, and Informix), which include a IEEE Software

Third-generation versus fourth-generation software development

  • Upload
    pj

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Third-generation versus fourth-generation software development

D Y

Third-Generation versus

Four thaeneration Software Development

Santosh Id Mi- and Paul 1. Jalics, Cleveland State University

Fourtbgenemtion IangUWes are not

always better than their predecessom

This cast? study reveals where they do

well and where they come up short.

8

urth-generation-language tools have made impressive gains in p r e F ductivity, as several reports in recent

years have shown.’,’ These productivity gains have been made possible by features like ease of use, use of nonprocedural code, direct interfaces to database-man- agement systems, and development times for applications that are a tenth that of thirdgeneration languages such as Cobol. While many of the fourthgeneration sys- tems support nonprocedural code and database-system interfaces, our study showed that claims regarding application- development time are overstated.

Fourthgeneration languages also claim to help people without computer science training develop their own applications with minimum effort so they need not de- pend on their organization’s information- systems department. The differences be- tween third-generation and fourth- generation languages are briefly ex- plained in the box on page 9.

0740-7459/8810700/Oa)8~1.00 01988 IEEE

To test a few of these tools’ features, we selected a case-study problem that is small enough to create several solutionsfor and yetsubstantial enough togain insightsinto each tool. We selected two fourthgenera- tion tools, dBase I11 and PC/Focus, to de- velop solutions with. We also developed a solution in Cobol to use as a benchmark for third-generation-language perfor- mance.

One ground rule for selecting the fourthgeneration tools was that the tools had to run on MSDOS PCs because no such tools were available on the univer- sity’s mainframe and because the cost of buying one for the mainframe was too high. The cost of PGbased programs, on the other hand, was affordable.

We chose dBase I11 because it not only satisfied the PC requirement but also be- cause it is typical of a whole group of what we classify as low-level fourthgeneration tools (others in the group include Rbase, Oracle, and Informix), which include a

IEEE Software

Page 2: Third-generation versus fourth-generation software development

procedural language and some simple nonprocedural features like a report gen- erator.

b'e also sought a higher level fourth-gen- eration tool to test. We had difficulty find- ing agreement on what tool might repre- sent this category best, but we finally selected PC/Foc~is based on the recom- mendation put forth hy Robert Tufts, an ACM national speaker, during aJanuary 1987 address to the Cleveland-area ACM chapter. Martin and Leben' also classified Focus (PC/Focus is its PC version) as a fourthgeneration tool.

For each solution, we examined devel- opment effort, code s i x , and perfor- rnancc characteristics. Even though the code sixs were smaller with both fourth- generation tools, the third-generation Cobol was clearly superior in perfor- mance. It took longer to develop the solu- tion in Cobol than in dBase Ill hut less time than in PC/Focus. Beinga fourthgenerat- ion language per se does not mean faster development.

The problem The problem we chose dealt with a

group of people, each of whom may have one to four children. There is a record for each individual that includesa name, a So- cial Security number, a birth date, date of death (may be zeros if still alive), and the Social Security number of each child (if any). The problem is to uTite a report that lists each person in the file and his child- ren with their names and ages, and the same for each of the grandchildren. Figure 1 shows the report format.

We chose this problem for several rea- sons:

Itismodestenough tobeimplemented three ways and does not call for any com- plicated calculation or formatting require- ments.

It involves substantial file access, both sequential and random.

It can be solved by someone with a

limitedcomputer-science training, such as disk to run all the programs. We tracked a sophomore computer-science under- development time for each tool and notcd graduate student. This makes the problem any special problrms. comparable to a development en\iron- We executed all pi-ograins scvcral times ment in an organization where people ' with input-file aiics of IO, 100, 200, 500, with a limited programming background 1 ,ooO, 2,000,5,000, and 10,000 records. We would be expected to develop their own created these IZS<:II input fileswith adata applications using fourth-generation generator that produces iw I T I ~ I I ~ records tools. as requested. We collected two prrfor-

inance measures for each cxeciition: the Experimental setup elapsed time to create the databases and

We wrote programsin dBase IIIPIus\'er- the elapsed time to grner-ate the r-eport. sion 1.0, PC/Focus Version 2.0, and Re- Because the Cobol implenlcntation did alia's PC Cobol Version 3.00. We used an not interact with any database, the first IBM PS/2 Model S O (which has an 80286 performance measurcw;ls for the creation processor) with MSDOS 3.3, I Mbyte of of an indexed sequcntial file M i t h the Str memory, and a standard 2@Mbyte hard cia1 Security number as the kcy field.

There is nostandarddefinitionforeitherthird-orfourth-generation languages, but perhaps the simplest definition is that third-generation languages are essentially procedural while fourth-generation ones are essentially nonprocedural.

According to Martin,',' third-generation languages are high-level languages. These are standardized and are largely independent of hardware. One important requirement in pro- gram development using any third-generation language is the specification of programming tasks in a step-by-step algorithmic fashion. Programming languages such as Cobol, Basic, C, and Pascal are commonly considered to be in the third generation.

Third-generation languages make programmers rely on procedures composed of tokens. These tokens (commands, types, and functions) work like human languages, and are limited only by the available tokens, grammar, and programmer's cleverness.

Martin defines fourth-generation languages as nonprocedural, end-user-oriented lan- guages. However, others associate various diverse features with these languages, including integrated database systems, ease of use, and screen- and report-generation capabilities. Examples of fourth-generation systems include dBase, Focus, Oracle, Mantis, and Ramis II.

Fourth-generation languages are nonprocedural, relying on predefined procedures that mayor may not be well-suited to a particular application. These nonprocedural facilities often workwell for an end user (a nonprogrammer) on a narrow set of problems. There are usually procedural tools to help theuser go beyond the nonproceduralfacilities'capabilities, but these tools are usually not as rich or expressive as third-generation tokens.

References 1. J. Martin, fourth-Generation Languages, Vol. 1 , Prentice-Hall. Englewocd Cliffs, N.J.. 1985, pp.

2. J. Martin and J. Leben, Fourth-Generafion Languages, Vol. 2, Prentice-Hall, Englewood Cltffs. 1-20.

N.J., 1986, pp. 139-184.

July 1988

Page 3: Third-generation versus fourth-generation software development

Table 1. Code size in lines.

Cobol dBase 111 Plus PC/Focus

Declarations 200 0 27

Executable statements 182 114 130

Total lines 3 82 114 157

Table 2. Elapsed time in seconds for f i e or database creation and report generation.

Number of records

~~

Cobol dBase 111 Plus PC/Focus File Report Database Report Database Report

1 1 5 26 37 47

2 3 7 293 63 108

2 8 11 592 117 180

4 23 20 1,535 42 1 500

7 42 50 3,387 2,087 1,110

15 111 104 6,980 9,268 2,100

39 445 393 18,584 50,430 34,229

108 719 disk full disk full diskfull disk full

ReYUb T h e results of ou r experiments

countered the common assumption that fourth-generation languages are better than their predecessors. While we realized some savings in code sizes and develop ment time, these savings were not substan- tial. The execution performance of fourthgeneration implementations were clearly much worse than that of Cobol.

Code size. Table 1 summarizes the code sizes for the three implementations. The total number of lines for both fourth-gen- eration tools is significantly less than that for Cobol. But when you consider the ex-

ecutable statements only, you may be sur- prised that PC/Focus is only 29 percent smaller than Cobol and that dBase I11 is 39 percent smaller than Cobol.

As Table 1 shows, each fourth-genera- tion tool requiresfewer (or no) datadecla- rations than the thirdgeneration Cobol, which is generally true offourthgenerat- ion languages. This is both an advantage and a disadvantage: Data declarations in thirdgeneration languages are important to define data types and sizes precisely. Dropping this precise specification tends to result in sloppier programming.

For example, dBase 111 memory vari- ables need not be declared and can

DATE 10/21/87

PAGE 1 LEGEND FOR AGE: 0 = AT DEATH 1 = CURRENT AGE

- GRAND CHILDREN - CHlLDREN . SSHUM NAME AGE SS HUM NAME AGE SS NUM NAME AGE

... . ...... ............. ... . ...... .................. . . . . . . . . . .

197441477 RANDOLPH HENRY J 31 1

198014071 RANDOLPH HCUER J 26 T 302110109 RAHDOLPH FRANK JR 4 1 * NOT I H FILE *

198278097 RAHDOLPH MYROY A 3 7 1

302119101 RANDOLPH HYRCU JR 11 1 * NOT I N F I L E *

302741918 RANDOLPH JOHN E 15 1 * NOT i n FILE - 198283017 RANDOLPH ARTHUR A 6 0 201119132 RAHDOLPH BRUCE E 2 8 1

301311190 RANDOLPH STEVE A 6 1 * NO1 I N F I L E *

Figure 1. Sample output report.

10

change data t)rpesdynaIriicallyduringexe- cution - which can lead to nasty errors. For example, during two iterations of a loop in a different project, the contents of a memory variable changed from integer values to character stringvahes, leading to debugging difficulties. Furthermore, dBase 111 memoryvariablesdo not have an explicit numeric precision, which can lead to sloppiness and the use of many explicit function calls to, for example, the Round statement. Itcan also lead to crude actions like Set Decimals, which sets the number of digits for all meniory variables, rather than specific ones. After looking at all these considerations, are fewer declara- tions really a good thing? Probably not, even though they do save time.

Development time. The development effort in hours for each implementation WaS

10 hours for Cobol, 8.5 hours for dBase 111 , and 19 hours for PC/Focus.

We spent an additional two hours with the Realia Cobol compiler and an addi- tional 12 hours with the PC/Focus system to familiarize ourselves with these lan- guages.

The development times are not an en- tirely fair comparison because the level of our knowledge of PC/Focus was much lower than for the other two (we had expe- rience with Cobol, although not with the Realia implementation). The 19 hours of development time for PC/Focus thus re- flectsa learning curve, aswell as severe dif- ficulties in applying the tool to the prob- lem (these are described in the box on pp.

But the cornparison of Cobol to dBase 111 is fairer. It shows that dBase I11 took 15 percent less time to develop code that was more compact (70 percent smaller overall and 37 percent smaller for executable code).

One loose definition of a fourthgener- ation tool isthatitcansolveaproblemwith one tenth the effort for any given problem when compared to earlier tools. We did not observe such adramatic improvement in either implementation. However, such gains may well be possible for simpler problems or for ones where the nonpro- cedural features are directly applicable.

12-13).

IEEE Software

Page 4: Third-generation versus fourth-generation software development

Performance. Table 2 lists the two per- formance measures for each imple- mentation. The big difference here is in performance: The dBase 111 solution is 1.5 to 70 times slower than Cobol’s. The PC/Focus solution compares favorably with dBase 111’s until the input-file size reaches 2,000 records -with one excep tion that is probably related to high start- up overhead for PC/Focus. However, the PC/Focussolution is29 to 174 timesslower than Cobol in the entire range of input re- cords tested. It is also more than four times as slow as dRase I11 at 5,OOO input records.

Just as disturbing, the relative perfor- mance ofboth dBase 111 andPC/Fociisget dramatically worse as input-file size in- creaes. Does this matter to users? Proh ably not a great deal if the number of re- cords is less than a few hundred, but when a report takes several hours to run, perfor- mance begins to be an issue.

Figure 2 shows a graph ofrelative perfor- mance during report generation. Clearly, the third-generation Cobol is more than I O times faster than the two fourthgener- ation implementations. Report genera- tion does not include the creation of the database itself, so, for PC/Focus, the load- ing of the data into the databases and the join of these databases is not included in the timing. The fourthgeneration tools come with a large performance penalty - a penalty that gets worse as the files grow larger.

How important is this to users? For databases of a few hundred records, it is not a big problem. But if the execution time exceeds an hour or so, performance is a problem, even if you just consider the wear and tear on the hard disk from the continuous I/O. Long execution times also hurt the turnaround time for report generation.

Program specifics We encountered problems during each

implementation, problems caused by each language’s abilities and approaches.

Cobol. The type of problem we used is typical of business applications, and solv- ing it in Cobol presented no special proh lems. Choosing Cobol as a representative

35

h

30 m 3 0

25 c 0 m ._ c

E 20 a (3,

r 0 15

z c v1 U

J v1 U al

m W

10

8 5 -

0 2 4 Number of input records

(thousands)

Figure 2. Relative performance during report generation.

third-generation language is also apprw priate because most business applications are implemented in Cobol.

The first part of the program reads the sequential ASCII input file, sorts it, and then creates an indexed sequential file with the Social Security number as the key. The second phase of the program then accesses this sorted, indexed sequential file scquentially. If a person has any child- ren, each child’s Social Security number is extracted and then the same indexed sequential file is accessed randomly to find out each child’s name and age. Similar random access is done for each grand- child.

Figure 3 shows the record arrangement of the indexedsequential file. It also shows an access chain traversingfrom a person’s

Social Security number up to that of his grandchild.

The biggest advaritage of this impk- mentation is that the (:ohol conipiler generatedseparate record pointers fir the sequential and the random accesses. So, the implementer does n o t have t o track the sequential position in the file when a random search ofadifferent record in the same file is made. The nerd foi-such track- ing caused us problems in the P<:/Focus iniplementation.

dBase ID. Fourth-genrratiori tools often look for nonprocedural solutions: The ini- plementer specifies the task and thr desired result; the knowledge of’ how t o

achieve the task is built into the fourth- generation tool itself. The only nonprtr

I - Child #1 SS no I t- Child #2 SS no. , Person #I S S no Person #1 SS no. Child #2 SS no. Child #3 SS no. Child #4 SS no.

I Person #2 SS no.

Person #4 SS no. ‘ I

Figure 3. Record arrangement for an indexed sequential Cobol file.

11

Page 5: Third-generation versus fourth-generation software development

PC/Focus: Databas4modeling woes We tried several database-modeling approaches to solve the prob-

lem and achieved the final solution only in a roundabout way. We believe that our experience in developing this solution highlights acorn- mon problem with fourth-generation tools: If the set of predefined so- lutions of afourth-generation tool does not fit your application, it is often very difficult to solve even simple problems with such tools.

Two-segment Focus database. Our initial strategy for solving the case-study problem was similar to the one used for both the Cobol and dBase 1 1 1 solutions - except for the database arrangement. The database was to be arranged in a hierarchical, two-segment Focus file sequenced on the ascending values of the Social Security numbers, as Figure A shows.

The only difference between this arrangement and that shown in Figure 3 in the main text is the placement of children Social Security numbers. Eachchild Social Security numberwastobestored as an in- stance of the second segment of the Focus file. The first segment of the focus file was to be defined with several fields to contain a person's genealogical information. The only field in the second segment was to bethechild'sSocialSecuritynumber. The Focusfilewastobesequen- tially accessed based on the Social Security numbers from the first (root) segment. Information for the children and grandchildren was to be dynamically retrieved from root segment instances with corre- sponding Social Security number values. Figure A shows such an access chain traversing from aperson's Social Security number up to that of a grandchild.

We abandoned a two-segment Focus-schema strategy, which seemed the most natural strategy, because PC/Focus's nonpro- cedural report generator cannot print output lines containing informa- tiondrawnfromdifferent root segment instancesof thesamefile. Such a facility would call for a dynamic seek to the child's root segment and return to the parent's root segment or would call for a dynamic seek to agrandchild's root segment and then return to the child's root segment. We considered the dynamic link to be essential because each person inthedatabase had to be reportedonasaparent, asachildif hisparent was in the database, and again as a grandchild if his grandfather was also in the database.

Threesegment Focus database. One basic rule of information storage is never to duplicate a piece of information in a database be-

cause you may accidentally update some copies but not all, resulting in unreliable information. Duplicate storage also results in redundant disk-space use. However, in the absence of a dynamic seek facility in PC/Focus, we also considered a database structure with duplicate in- formation.

This second strategy was to create a three-segment hierarchical structure where the root segment is the parent, next is the child, and the last is the grandchild. Each root-segment instance may be associ- ated to a maximum of four child instances in the second segment. Sim- ilarly, each instance in the second segment may be associated with a maximum of four instances (grandchikiren) in the third (and last) seg- ment.

Thus, at the minimum, a root-segment instance may be associated with zero second- and third-segment instances if a person never had anychildren. At the maximum, a root-segment instance may be asso- ciated with 16child instances through the bottom two levels of the hier- archy. Duplicated data storage occurs if a person is also a child or grandchild. Figure 6 shows this structure. If PC/Focus had been capable of creating such a database, its report generator could have easily produced the report in the format we wanted. But PC/Focus could not easily produce this database; we had to use a cumbersome and time-consuming method to get the desired report.

PC/Focus provides a Match/Nomatch statement to load databases from an inputscreenorfrom otherfiles.This Match/Nomatch statement acckpts the key value of the input record, compares the key value to the set of the key values in the segment, and then inserts or rejects the record depending on the action specified with the MatchINomatch statement.

In a multisegment database, loading any record must proceed through the parent segment: If you want to load a record (child record) into the second segment of a three-segment database, you must know the keyvalueofthe parent record forthatchiki record.TheMatchstate- ment first uses the parent's key value to position the entry to the sec- ond segment. A second Match statement then uses the child's key value to determine the child's location. Similar navigation is required to insert a record into the third segment

This data-loading navigation requirement kept us from creating the three-segment database straightforwardly. We encountered no prob- lem when loading the root (first) segment. However, we could not load the child-segment records (and consequently grandchild records) be-

cedural tool available in dBase 111 that can be considered for the type of problem we studied is the report generator.

The dBase I11 report generator is ade- quate for one to three levelsof subtotaling, but it is not nearly powerful enough to do the combination of sequential and ran- dom accesses required in this problem. There is also no way in this report gener- ator to print areport line with information drawn from different records ofone input file.Thatmeantwe had towriteadhse 111 program that sorts the input file, creates an index for it based on Social Security number, and then goes through a logic similar to Cobol to get the data and print 1inesofthereport.The schematicarrange- ment of the dBase 111 database is same as the Cobol arrangement shown in Figure 3.

The dBase 111 implementation itself did

not cause any difficulty because the proce- dural language available with dBase 111 was adequate for the problem.

PC/Focus. PC/Focus, like IBM's Infor- mation Management System, has a hierar- chical database structure and at first seems to be an appropriate vehicle to represent genealogical information that hierarchi- cally joins generations of people. One of PC/Focus's outstanding features is its table-based retrieval mechanism, which is both nonprocedural and a high-level fourth-generation facility with impressive reportgeneration and report-formatting features.

While we were generally familiar with several database structures (including IMS, Codasyl, and relational) andwith sev- eral PGbased systems (including dBase,

Rbase, and Oracle), we had never used PC/Focus. Learning i t took about 12 hours (including 7.5 hours for initial learning) and included frequent consul- tation with a local expertwho had recently implemented a large health-related sys- tem.

The learning was quite dificult because the reference materials generally describe features only with simple examples; the only complete "reference" was trial and error, and we were often left wondering whether the feature would work in a par- ticular case and whether our under- standing of a feature was enough to use it effectively.

We tried several database-modeling a p proacheswith PC/Focus to see which one would work best. The box above describes them.

12 IEEE Software

Page 6: Third-generation versus fourth-generation software development

cause such processing requires the parents' identity. Unless the parents' key values are first matched, there is no way you can directly go to a key value in the child segment.

Three-segment composite database. Our third approach con- sisted of creating three separate databases, one each for the parent, the child, and the grandchild. The parent and the child databases were both two-segmented structures similar to that shown in Figure A. The grandchild database was a one-segment structure.

The structure description of the child database had a built-in, static cross-reference to the grandchild database, using the grandchild So- cial Security number as the link-key field. Such astaticcross-reference results in permanent links between two records of two different databases over acornrnon key value when data records are loaded.

Each database was loaded with all the records from the input file - the same data was loaded three times to three separate databases. Afterthedataloading, weexecuted ajoin between the parent and child databases, using the child's Social Security number as the key field. Our objective was to create a virtual structure similar to the one shown in Figure B.

The resulting joined databases were accessed via a Table state- ment, which is PCiFocus's nonprocedural reporting facility. Because we still had somecomputation and formatting to doafterthe databases were joined, we stored the result of the Table retrieval in a temporary database. We then accessed this temporary database through another Table statement to generate the properly formatted output.

Butthisapproach failed becausetheTable statementdid not retrieve several data records. This failure occurred if a parent did not have at least one child because the join computation did not include this per- son, since the join-key value was null.

PCiFocus provides two switches, Set All=On and Set All=Pass, that can retrieve an entire short-path hierarchical structure (when a lower portion of the structure has no datavalues). These switches can work whether the database structure is one or many files. However, these switchesdo notworkforshort pathscreated by null join-keyvalues. We tried several other unions and intersections of different Focus files, but none worked.

Augmented three-segment composite database. Our fourth ap- proach is an extension of the composite strategy. We maintained the static cross-link between the child and the grandchild databases over the grandchild Social Security numbers and joined the parent and child

Root segment instance "55m I Secondsegment ,n l lanr?

e n o n s S S n o

Figure A. Two-segment Focus schema.

databases through thechildsocial Security number. As before, wesent thefirstTableretrieval toatemporaryfile. Wethen augmented the tem- porary (and incomplete) result file was by rereading the original ASCII input file (this is the fourth time the file was read) and inserting the miss- ing records. We then could produce a complete report from this aug- mented temporary file.

segment instance

Figure B. Three-segment Focus schema.

Date handling. Both dBase 111 and PC:/Foc~is have special data types for hand- ling dates, but neither was directly usable for ourproblem because computingaper- son's age was the only date-handling ne- cessity-and, byusingabirthdatewith the current date or date of death, this need could be easily managed by storing a four- digit year value instead of acomplete date. Dates in dBase I11 are restricted to the 20th century: Even though a full fourdigit year specification is allowed, the first two digits (the century) is ignored. PC/Focus has many date functions, but none was appli- cable to our problem.

Problems Both dBa5e I11 and PC/Focus are very

impressive fourth-generation tools, but if

you cannot solve your problem readily with nonprocedural features like the re- port generators, you must write programs in some very strange languages - some of which are actually a step backward from third-generation languages.

Language faults. Why are they a step backward? Because the procedural lan- guages provided in many fourthgenerat- ion toolsappear not to have beendesigned with the same care as some of the third- generation languages, like Cobol and Pas Cal.

The procedural components of the fourth-generation systems also do not show the benefits of the 30-plus years of computer-science experience in design- ing languages. Furthermore, these fourth-

generation procedural language compo- nents are often not complete (in terms of. needed facilities), not rigorously defined, and not standardized.

For example, dkase 111 is supposed to be a database system, yet you can normally manipulate only one file at a time, with clumsy switching between files using crude facilities. In PC/Focus, you must work with at least two different system fea- tures, the dialogue manager and the re- port generator, and create temporary files because the necessary facilities are not in- tegrated into a single, consistent, arid powerful environment.

As our subsequent studies with other fourthgeneration tools have shown, the lack of standards and concise definition in many fourth-generation tools is intolera-

July1988 13

Page 7: Third-generation versus fourth-generation software development

ble. The standardization of the third-gen- eration programming languages has brought us the benefit of having concise definitions that define explicitly how any given feature of a language is going to work.

Needed improvements. The general ar- chitecture of fourthgeneration tools also could stand improvement. For example, we had no problem with dBase I11 because we could fall back on its procedural fea- ture, but this option was not available to us for PC/Focus - even though its fourth- generation (nonprocedural) features are superior to that of dBase 111. We believe that every fourthgeneration language should have a procedural component so the user can decide what will work best for each problem. With a procedural facility, you need not depend on the developers of the fourthgeneration tool to anticipate all your needs.

Another improvement would be if the language’s nonprocedural reporting facil- ity did not assume that a report line is al- ways formed with information from ex- actly one record of a database or from information from one record of a joined data structure. Users often need reports where several records of a database may contribute segments of information for each report line. Fourthgeneration tools’ architecture must take thisintoaccount to improve the their general applicability.

A third improvement would be a solu-

tion to the null-key problem. Granted, computing joins where a key value is null remains a generally unsolved problem in database management. Because several fourthgeneration systems use some form of a relational database structure, their nonprocedural reporting features must reckon with the problem.

While we do not have a solution for the null key in joins, we do suggest that report generators be equipped with a userde- fined switch through which short paths created specifically as a result ofjoins can be retrieved only when such short paths start at the end of the record from the first database in the joined structure. For ex- ample, in our case study, the report gener- ation could have taken into account the virtualjoin structure, including all original records where join failed because of null keys.

ur experiments have taught us a lot about the problems of imple- 0 menting software with fourth-gen-

eration tools: The fourthgeneration tools we stud-

ied d o provide tremendous nonpro- cedural facilities not found in thirdgener- ation programming languages. For example, fairlycomplicated reports can be generated through a simple statement in many of the fourthgeneration systems. Other features include multiple sorting and screen generation. These facilities can save applicationdevelopment time.

In terms of executable statements, the fourthgeneration tools take about a third less statements. This is a significant differ- ence, but itnotasdramatic asweexpected.

The most dramatic difference in p r e gram size is the number of data declara- tions: They are more than half the code for Cobol but close to nothing for the fourth- generation tools.

The fourth-generation tools we used come with a performance degradation of 10 to 100timesfortheproblemwestudied.

Lack of precise definition and stan- dardization are major problems with fourth-generation tools; this may make their general use more difficult.

The architecture of fourth-generation tools should incorporate procedural lan- guages (or a bridge to a thirdgeneration language) to help the user develop solu- tions when the nonprocedural features do not meet his needs. However, such proce- dural components must show the same de- gree of precision and completeness avail- able in most third-generation languages.

Given the current architecture of fourth- generation tools, a user can save applica- tion-development time if the problem matches the assumptions in the tool’s pre- defined nonprocedural facilities. If the problem is not the kind the tool was de- signed for, the user may pay development and performance penalties. In these cases, conventional programming is a better al- ternative. .:.

References 1 . D. Leavitt, “More than End-User Tools,

4GIs Add to Productivity,” S o w r e Nm, April 1986, pp. 7@74.

2. G.E. Fischer, A Fundional Model for Fourth- Generalion I ~ n g u a p , Pub. 5M138, Nat’l Bureau of Standards, June 1986, p. 4.

3. J. Martin and J. Leben, Fourth-Genmatia Languages, Vd. 2, Prentice-Hall, Englewood Cliffs, N.J., 1986, pp. 139-184.

Santosh K. Misra is an assistant professor of computer and information science at Cleve- land State University. His research interests in- clude database conversion, CASE, and perfor- mance evaluation.

Misra received a doctorate in business ad- ministration in information systemsfrom Kent State University.

Paul J. Jalics is an associate professor of com- puter and information science at Cleveland State University. His research interests include operating systems, performance evaluation (especially in higher level languages), pro- gram transportability, and fourthgeneration tools.

Jalics received a PhD in computer science Case Western Reserve University.

Addressquestionsabout this article to the auth Cleveland State University, 2121 Euclid Ave., Cle

orsat Computer and Information Science Dept., weland, OH 441 15.

14 IEEE Software