Making scientific computations reproducible

MATTHIAS SCHWAB, MARTIN KARRENBACH, AND JON CLAERBOUT
Stanford University

Computing in Science & Engineering, November/December 2000
1521-9615/00/$10.00 © 2000 IEEE

To verify a research paper’s computational results, readers typically have to recreate them from scratch. ReDoc is a simple software filing system for authors that lets readers easily reproduce computational results using standardized rules and commands.

In the mid-1980s, we realized that our laboratory’s researchers often had difficulty reproducing their own computations without considerable agony. We also noticed that junior students, who typically build on the work of more advanced students, frequently spent a great deal of time and effort just to reproduce their colleagues’ computational results.

Reproducing computational research poses challenges in many environments. Indeed, the problem occurs wherever people use the traditional methods of scientific publication to describe computational research. For example, in a traditional article, the author simply outlines the relevant computations, because the limitations of a paper medium prohibit complete documentation, which would ideally include experimental data, parameter values, and the author’s programs. Readers who wish to use and verify the work must reimplement it, which is often a painful process. Even if readers have access to the author’s source files (a feasible assumption given recent progress in electronic publishing), they can only recompute the results by invoking the various programs exactly as the author invoked them; such information is usually undocumented and difficult to reconstruct.

To address these problems, we developed ReDoc, a system for reproducing scientific computations in electronic documents. Since we implemented it in the early 1990s, ReDoc has become our principal means for organizing and transferring our laboratory’s scientific computational research.

ReDocs are best defined operationally: After an author completes a ReDoc, readers can destroy all existing results (principally the illustrations) and rebuild them using the author’s underlying programs and raw data. Using ReDoc, authors describe their computations and preserve their details in fully functional examples. Authors can also test their archived research software by occasionally removing and regenerating the document’s results using ReDoc’s standardized interface commands. ReDoc also lets authors develop automatic scripts to verify any document’s completeness and reproducibility before its publication. Scientific journal publishers might also use ReDoc in the referee process to test the reproducibility of illustrations.

ReDoc benefits readers in several ways. Just as a driver wants to find the brake pedal in the same location in every car, readers benefit by having a few standard commands that let them explore any document’s scientific contents. Consistent, standard commands to remove and reproduce a document’s result files help readers access and study an unknown document.

ReDoc overview

In essence, ReDoc is a filing system that makes research cooperation effortless: Researchers can publish their results and colleagues can immediately build upon them, adding improvements and innovations. To create a ReDoc, authors combine project-specific rules with general, standard rules and naming conventions to offer readers a simple set of commands to remove and rebuild the author’s results. ReDocs become reservoirs of easily maintained, reusable software, and thus authors, along with readers, benefit from using the system.

In our laboratory, using ReDoc has tremendously increased the amount of software that is readily accessible to researchers. Students can now take up former students’ projects and easily remove and recompute the original result files. Students who graduate and leave our laboratory can seamlessly continue their research at new locations. Although the effort to generate reproducible research documents is typically directed at helping other people use existing work to further their own, authors also benefit. Because many researchers typically forget details of their own work, they are not unlike strangers when returning to projects after time away. Thus, efforts to communicate your work to strangers can actually help you communicate with yourself over time.

ReDoc’s current reader interface is implemented in the platform-independent GNU make, which excels in efficiently maintaining even complex software packages. Conceptually, the ReDoc reader interface is independent of both document format (TeX, HTML, and so on) and the underlying computational software, whether it be Matlab or Mathematica, or C and Fortran programs. Although we restrict our focus here to Unix systems and the make utility, the concept of using a reader interface to reproduce a document’s computational results should apply to electronic documents in other computer environments as well. The GNU make code that implements the ReDoc rules, this article, and the accompanying example are freely available on our Web site at http://sepwww.stanford.edu/research/redoc.

ReDoc features

In addition to a traditional article and the application’s source code, a ReDoc features three main components: makefiles, a small set of universal make rules (fewer than 100 lines), and naming conventions for files. ReDoc offers a reader four standard commands:

• burn the figures,
• build the figures,
• view the figures, and
• clean up and remove all intermediate results.

These four commands are centered around three types of files:

• Fundamental files are data sets, programs, scripts, parameter files, and makefiles (any file not generated by some computer process).
• Result files are typically plot files, which can come in various formats, including PostScript and GIF.
• Intermediate files are the files that a computer process generates when computing the result files from the corresponding fundamental files. Examples include object files, executables, and partially processed data.

ReDoc’s naming conventions let a community formulate universal rules that recognize a file’s type and thus permit the system to handle it appropriately. For example, a cleaning rule uses a community naming convention to identify and remove all intermediate files, such as those with the suffix .o (object files) or files with the name stem junk (temporary files). The clean rule leaves a tidy directory without any distracting clutter. We also use naming conventions for result files. We base our rules for displaying, removing, or recomputing a result file on the result file’s name. For example, our result files are always figure files and can have various formats, such as PostScript or GIF. Our naming conventions require that authors indicate the result file’s format by a suffix, such as .ps or .gif. Consequently, we can supply a universal format-independent rule for displaying result files: The rule identifies the result file’s suffix, infers the file’s format, and invokes the appropriate viewing program, such as Ghostview for PostScript or Xview for GIF files.
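The suffix-based recognition described above can be modeled outside of make. The following is a minimal sketch in Python, not the laboratory’s GNU make rules; the function names, the table of suffixes, and the lowercase viewer names are assumptions chosen for illustration from the examples in the text.

```python
from pathlib import PurePath

# Hypothetical tables modeled on the conventions named in the text;
# the real ReDoc rules are written in GNU make, not Python.
RESULT_VIEWERS = {".ps": "ghostview", ".gif": "xview"}  # result suffix -> viewer
INTERMEDIATE_SUFFIXES = {".o", ".x"}                    # object files, executables
INTERMEDIATE_STEMS = {"junk"}                           # temporary files


def classify(filename: str) -> str:
    """Return 'intermediate', 'result', or 'fundamental' based on the name alone."""
    path = PurePath(filename)
    if path.suffix in INTERMEDIATE_SUFFIXES or path.stem in INTERMEDIATE_STEMS:
        return "intermediate"
    if path.suffix in RESULT_VIEWERS:
        return "result"
    return "fundamental"


def viewer_for(filename: str) -> str:
    """Infer the result format from the suffix and pick a viewing program."""
    suffix = PurePath(filename).suffix
    if suffix not in RESULT_VIEWERS:
        raise ValueError(f"no viewer known for suffix {suffix!r}")
    return RESULT_VIEWERS[suffix]
```

Because every decision depends only on the file name, the same tables can serve all documents in a community, which is the point of agreeing on naming conventions in the first place.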

ReDoc rules are easy to implement. As the example in the box “Creating a ReDoc” explains, authors who already use makefiles need only adhere to the ReDoc naming conventions and include the ReDoc rules to make a traditional document reproducible. We now describe ReDoc makefiles and rules in more detail.

Makefile

Make is a standard Unix utility for software maintenance; a makefile contains the commands to build a software package. More powerful than a simple command script, a makefile lets make treat result files (targets) as up to date when they are younger than their corresponding source files (dependencies). (For more information, see the “Clean, up-to-date support” sidebar.) Because maintaining a reproducible research project resembles maintaining software, the make utility solves our problem elegantly. Fortunately, fine tutorial books exist on the make language [1, 2], and students can easily begin using make without a formal introduction.
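The timestamp comparison at the heart of make can be illustrated in a few lines. This is a simplified Python model under stated assumptions, not make’s actual implementation: it ignores intermediate targets and assumes every dependency already exists on disk. The names is_up_to_date and demo_up_to_date are hypothetical.

```python
import os
import tempfile
from typing import Iterable, Tuple


def is_up_to_date(target: str, dependencies: Iterable[str]) -> bool:
    """True if target exists and no dependency is newer than it.

    Assumes all dependencies exist; make itself would first try to build
    a missing dependency, which this sketch does not model.
    """
    if not os.path.exists(target):
        return False
    target_time = os.path.getmtime(target)
    return all(os.path.getmtime(dep) <= target_time for dep in dependencies)


def demo_up_to_date() -> Tuple[bool, bool]:
    """Create a source and a figure, then touch the source to outdate the figure."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "frog.r")
        fig = os.path.join(d, "frog.ps")
        for name in (src, fig):
            open(name, "w").close()
        os.utime(src, (1000, 1000))   # source older than figure
        os.utime(fig, (2000, 2000))
        fresh = is_up_to_date(fig, [src])
        os.utime(src, (3000, 3000))   # source edited after figure was made
        stale = is_up_to_date(fig, [src])
        return fresh, stale
```

Calling demo_up_to_date() yields one fresh and one stale verdict for the same figure, which is the behavior that lets a reader rebuild only what a changed source file affects.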

Box: Creating a ReDoc

Because an author’s rules deal with a document’s application, we can best illustrate ReDoc creation using an actual example. The electronic version of this article is accompanied by a subdirectory called Frog. Frog contains a complete, albeit small, ReDoc that discusses a finite-difference approximation of the 2D surface waves caused by a frog hopping around a rectangular pond. The files paper.latex and paper.ps contain two formats of the short scientific article describing the finite-difference approximation. Some Ratfor files implement the 2D wave propagation code (Ratfor is a Fortran preprocessor that provides control flow constructs similar to C; Ratfor is freely available at our Web site). The Fig directory contains the result files: a figure (PostScript and GIF versions) of the pond after some wild frog hops, and the output (two float numbers) of a dot-product test of the linear finite-difference operator and its adjoint. To organize the document’s files, the Frog example’s author wrote the makefile shown in Figure A.

The variable RESULTSER contains the list of the document’s easily reproducible results, frog and dot.

The next rule contains the commands to build the PostScript and GIF versions of the frog result. Such a rule is application-specific and cannot be supplied by existing default rules. The target names comprise the directory RESDIR, which contains the result files, and the file suffixes (.ps and .gif), which indicate the file formats. The rule depends on an executable frog.x, which it executes during result computations.

A shared include file, Prg.rules.std, supplies default rules for compilation and linking of executables such as frog.x. Compilation and link rules are compiler-dependent. In this example, we include some generic Fortran rules. At our laboratory, compilation rules depend on an environment variable that indicates compiler type. The author must define the executable’s dependency on its subroutine object files (as in the case of frog.x) because it depends on the application-specific file names.

In the case of the dot result file, the author-supplied rules reflect the reader-interface commands: dot.build creates the result file, dot.view displays it, and dot.burn removes it.

Finally, the target clean invokes the included default target jclean, which removes intermediate files based on our laboratory’s naming conventions.

SEPINC = ../rules
include ${SEPINC}/Doc.defs.top

# Easily reproducible results
RESULTSER = frog dot

# Application-specific rule for the frog figure (PostScript and GIF)
col = 0.,0.,0.-1.,1.,1.
${RESDIR}/frog.ps ${RESDIR}/frog.gif: frog.x
	frog.x > junk.pgm
	pgmtoppm ${col} junk.pgm > junk.ppm
	ppmtogif junk.ppm > ${RESDIR}/frog.gif
	pnmtops junk.pgm > ${RESDIR}/frog.ps

objs = copy.o adjnull.o rand01.o wavecat.o \
       pressure.o velocity.o viscosity.o wavesteps.o

frog.x: ${objs}

# Explicit reader-interface rules for the text result dot
dot.build ${RESDIR}/dot.txt : dot.x
	dot.x dummy > ${RESDIR}/dot.txt

dot.view: ${RESDIR}/dot.txt
	cat ${RESDIR}/dot.txt

dot.burn:
	rm ${RESDIR}/dot.txt

dot.x : ${objs}

clean: jclean

# Standard ReDoc and compilation rules
include ${SEPINC}/Doc.rules.red
include ${SEPINC}/Doc.rules.idoc
include ${SEPINC}/Prg.rules.std

Figure A. The makefile that organizes the Frog files.

Because our laboratory deals with computational problems of various sizes that use a diverse collection of software and hardware tools, not every reader will be able to easily reproduce all result files. Consequently, authors typically assign each result in their application makefiles to one of three result list variables (RESULTSER, RESULTSCR, and RESULTSNR), with the ending letters indicating the degree of reproducibility:

• ER: Easily reproducible result files that can be regenerated within 10 minutes on a standard workstation.
• NR: Nonreproducible result files that cannot be recalculated (examples include hand-drawn illustrations or scanned figures).
• CR: Conditionally reproducible result files that require proprietary data, licensed software, or more than 10 minutes for recomputation. The author nevertheless supplies a complete set of source files; this ensures that readers can reproduce the results if they possess the necessary resources. Authors list the required resources in a warning file (such as myresult.warning), which accompanies the CR result file (myresult.ps). In industrial-scale geophysical research, many interesting results are conditionally reproducible.

Following this usage, the standard make targets burn and build, which we describe below, are complemented by targets burnER, burnCR, burnNR, and buildER, buildCR, buildNR. For example, burnCR burns all conditionally reproducible result files. The target burn defaults to burnER to restrict the standard removal of result files to easily reproducible ones. The target build defaults to buildER to recompute the result files that make burn removes.
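How a goal such as burn or buildCR maps onto per-result targets can be sketched as a small lookup. The Python below is only a model of the defaulting behavior described above; the function expand_goal and the RESULT_LISTS table are hypothetical names, and the real mechanism is a set of GNU make rules.

```python
from typing import Dict, List

# Result lists keyed by reproducibility level. The ER entries come from the
# Frog example; the empty CR and NR lists are placeholders for illustration.
RESULT_LISTS: Dict[str, List[str]] = {
    "ER": ["frog", "dot"],
    "CR": [],
    "NR": [],
}


def expand_goal(goal: str, result_lists: Dict[str, List[str]] = RESULT_LISTS) -> List[str]:
    """Expand a goal such as 'burn' or 'buildCR' into targets like 'frog.burn'."""
    for command in ("burn", "build"):
        if goal.startswith(command):
            # A bare 'burn' or 'build' defaults to the easily reproducible list.
            level = goal[len(command):] or "ER"
            if level not in result_lists:
                raise ValueError(f"unknown reproducibility level {level!r}")
            return [f"{name}.{command}" for name in result_lists[level]]
    raise ValueError(f"unknown goal {goal!r}")
```

For the Frog example, expand_goal("burn") and expand_goal("burnER") name the same two targets, so the default command never touches results that a reader might be unable to regenerate.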

Make rules

ReDoc’s third component is a small, consistently available set of standard make rules. These rules let readers interact with the document without knowing the underlying application-specific commands or files. In the Frog example (see the box “Creating a ReDoc”), a reader invokes targets, such as build, that are not listed in the author’s application makefile. Standard laboratory rules, which the author includes in the document’s makefile (Doc.defs.top, Doc.rules.red, and Doc.rules.idoc), supply these targets.

Standard rules ensure a consistent reader interface and free authors from the need to reimplement the ReDoc rules in every makefile, letting them concentrate exclusively on the makefile’s application-specific aspects. Although we formulated the rules so that individual authors can override them, in our experience this is rarely necessary or desirable. Furthermore, a central rules set accumulates the community’s wisdom on how to organize a ReDoc.

Our laboratory offers about 100 lines of GNU make code that constitute our ReDoc rules. The ReDoc rules facilitate four commands: make burn, which removes the result files (usually figures); make build, which recomputes them; make view, which displays the figures; and make clean, which removes all files other than source or result files. Although authors and readers do not need to understand the implementation details of the ReDoc rules to use them (most researchers at our laboratory have never inspected them), we describe the rules in detail here so that you can understand their operation and adapt them to your own computational environment.

burn: burnER

burnER: ${addsuffix .burn, ${RESULTSER}}
burnCR: ${addsuffix .burn, ${RESULTSCR}}

# Remove every existing version (one per suffix) of the result $*
%.burn:
	${foreach sfx, ${RES_SUFFIXES} , \
	    if ${EXIST} ${RESDIR}/$*${sfx} ; then \
	        ${RM} ${RESDIR}/$*${sfx} ; fi ; \
	}

Figure 1. make burn finds and removes all easily reproducible result files.

build: buildER
buildER: ${addsuffix .build, ${RESULTSER}}
buildCR: ${addsuffix .build, ${RESULTSCR}}

%.build: ${RESDIR}/%.ps

Figure 2. Implemented much like the burn rule, the build rule updates easily reproducible result files.

view : ${addsuffix .view, ${RESULTSALL}}

# Pick the first format that can be built, then delegate to its viewer rule
%.view: FORCE
	if ${CANDO_GIF} ; then \
	    ${MAKE} $*.viewgif ; \
	elif ${CANDO_PS} ; then \
	    ${MAKE} $*.viewps ; \
	else \
	    echo "can't make $*.viewps $*.viewgif" ; \
	fi

(a)

%.viewgif : ${RESDIR}/%.gif FORCE
	${XVIEW} ${UXVIEWFLAGS} ${RESDIR}/$*.gif

%.viewps : ${RESDIR}/%.ps FORCE
	${GVIEW} ${UGVIEWFLAGS} ${RESDIR}/$*.ps

(b)

Figure 3. The view rule (a) updates and displays results. When the %.view rule finds a workable version, it invokes %.viewgif or %.viewps, which updates and displays the result file (b).

Burn. As Figure 1 shows, make burn invokes a chain of rules, which ultimately finds and removes all easily reproducible result files. Every application makefile contains the burn target as part of the ReDoc rules.

The burn target invokes its dependency burnER. The burnER rule selects the easily reproducible result files for removal. The burnER rule uses GNU make’s built-in function addsuffix to generate its dependency list. Each entry of burnER’s dependency list is a concatenation of the name of an easily reproducible file and the suffix .burn. In the Frog example, burnER depends on frog.burn and dot.burn. The dependency frog.burn invokes the pattern rule %.burn, which removes the result files corresponding to the result frog. At our laboratory, a single result name such as frog usually denotes several result files of identical contents but differing format, such as PostScript or GIF. The %.burn rule scans a list of possible suffixes (RES_SUFFIXES = .ps .gif) and removes all related result files: frog.ps and frog.gif.
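The suffix scan performed by the %.burn pattern rule can be mirrored in Python for readers less familiar with make’s foreach. This is a sketch of the logic only, assuming the suffix list quoted above; burn_result and demo_burn are hypothetical names, and the temporary directory stands in for RESDIR.

```python
import os
import tempfile
from typing import List, Sequence

RES_SUFFIXES = (".ps", ".gif")  # suffix list quoted in the text


def burn_result(name: str, resdir: str, suffixes: Sequence[str] = RES_SUFFIXES) -> List[str]:
    """Remove each existing file resdir/name+suffix and return the removed paths."""
    removed = []
    for sfx in suffixes:
        path = os.path.join(resdir, name + sfx)
        if os.path.exists(path):   # same guard as `if ${EXIST} ...` in Figure 1
            os.remove(path)
            removed.append(path)
    return removed


def demo_burn() -> List[str]:
    """Burn the frog result in a scratch directory and list what survives."""
    with tempfile.TemporaryDirectory() as resdir:
        for fname in ("frog.ps", "frog.gif", "dot.txt"):
            open(os.path.join(resdir, fname), "w").close()
        burn_result("frog", resdir)
        return sorted(os.listdir(resdir))
```

In the demo, dot.txt survives because .txt is not in the suffix list, which is why the Frog author has to supply an explicit dot.burn rule.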

Because text result files, such as dot.txt, are rare at our laboratory, our ReDoc rules do not contain laboratory-wide rules for handling them. Given this, the Frog document’s author supplies an explicit dot.burn rule in the makefile. This explicit dot.burn rule overrides the default %.burn pattern rule, which removes only result files with the standard figure suffixes.

The standard burn rule exclusively removes the easily reproducible result files. Readers can remove conditionally reproducible result files by invoking make burnCR and can exclusively remove the result files related to the frog result by invoking make frog.burn.

Build. Figure 2 shows the build rule, which updates the document’s easily reproducible result files. The build rule’s implementation resembles the burn rule’s implementation.

At our laboratory, almost every result file is a figure in PostScript format. Consequently, the ReDoc %.build rule updates the PostScript version of any easily reproducible result, such as ${RESDIR}/frog.ps. ReDoc typically generates additional versions of the result (such as frog.gif) as a side effect of the rule that computes the PostScript version (frog.ps).

As in the case of dot.burn, authors must supply an explicit dot.build rule for the nonstandard text result dot.

View. As Figure 3a shows, the view rule updates and displays the results. RESULTSALL lists all result files; we define it as the concatenation of RESULTSNR, RESULTSCR, and RESULTSER.

At our laboratory, the %.view rule checks for the various formats of a result and chooses the first version that the makefile can generate. A CANDO variable (CANDO_GIF or CANDO_PS in Figure 3a) contains the return value of a recursive gmake -n call. This return value indicates whether the system can build that particular version of the result. As Figure 3b shows, once it finds a workable version, the %.view rule invokes another rule (%.viewgif or %.viewps) that updates and displays the result file.
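The selection logic of the %.view rule amounts to trying formats in a fixed order and stopping at the first one that can be built. The Python sketch below models that order only; choose_view_target is a hypothetical name, and the can_build callback is an assumed stand-in for the recursive gmake -n dry run.

```python
from typing import Callable, Optional

# Formats in the order Figure 3a tries them, paired with their viewer rules.
VIEW_ORDER = (("gif", "viewgif"), ("ps", "viewps"))


def choose_view_target(name: str, can_build: Callable[[str], bool]) -> Optional[str]:
    """Return the viewer target for the first buildable format, or None.

    can_build receives a file name such as 'frog.gif' and reports whether
    that version of the result could be generated.
    """
    for fmt, view_suffix in VIEW_ORDER:
        if can_build(f"{name}.{fmt}"):
            return f"{name}.{view_suffix}"
    return None
```

A caller that can build only PostScript gets frog.viewps back for the frog result, while a result with no buildable figure format, such as dot, yields None and needs an explicit view rule.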

In the Frog example, the frog.view rule finds a rule for computing a .gif version of frog. It then invokes the GIF rule frog.viewgif. In turn, frog.viewgif executes Xview, a .gif viewer, to display the result file frog.gif. If your computer system does not support Xview, you must supplement the %.viewgif rule with your own display command. To display the result file frog.ps, frog.viewps executes Ghostview.

Sidebar: Clean, up-to-date support

Keeping documents clean and up to date has obvious benefits in terms of order and reader confidence. Furthermore, we found that only clean documents function on a CD-ROM; intermediate files stored in a CD-ROM’s read-only memory cannot be overwritten when readers later regenerate the files. Unfortunately, in early versions of ReDoc, we found that popular make dialects did not support rules to keep documents simultaneously clean and up to date. Such dialects considered a result file outdated when certain intermediate files were missing (for example, GNU make considered a result file outdated when a nonpattern rule formulated a missing file’s dependency). Treating intermediate files this way was convenient for software maintenance, but unsuitable for reproducible electronic documents.

Today, ReDoc rules maintain a clean, up-to-date document in the GNU make dialect. At our request, Richard Stallman enhanced GNU make to adequately handle the ReDoc rules’ intermediate targets (GNU make refers to intermediate files as secondary files, as it uses “intermediate” in a slightly different, more restrictive sense). Stallman added a special built-in target, .SECONDARY, that lets authors choose GNU make’s behavior with respect to its missing intermediate files. If a makefile includes a .SECONDARY target without dependencies (the default at our laboratory), every missing intermediate file is presumed up to date. The .SECONDARY target is implemented in GNU make versions higher than 3.74. The current GNU make version we use is available on our Web site, http://sepwww.stanford.edu/research/redoc.

Clean. A cleaning rule is important because a cleaned directory is more accessible and inviting to readers than a cluttered one. Furthermore, a cleaning rule saves resources, such as disk space, by removing superfluous files.

In theory, a community’s cleaning rule would remove the intermediate files and thereby isolate the source and result files; a universal cleaning rule would recognize the intermediate files according to the community’s naming conventions. Unfortunately, such a rule cannot possibly anticipate all the names authors might choose for intermediate files. Consequently, our laboratory does not supply a fixed, universal clean rule, but rather a Jon’s clean (jclean) rule. jclean removes the files that adhere to our laboratory’s naming convention for intermediate files. Authors are responsible for implementing their own clean rules, and most accept the default cleaning rule by defining clean as clean: jclean.

Some of our authors append the default jclean with a command to remove additional files that do not adhere to the naming conventions. Very few authors ignore the jclean target (and its communal wisdom) and design their own rule.

Because the author of the Frog example adheres strictly to the ReDoc file naming conventions, the default jclean mechanism suffices to remove the intermediate files, as Figure 4 shows.

The jclean target uses two methods to identify intermediate files. The first method, klean.usual, simply removes files whose names fit one of the rule’s name patterns, such as the executable frog.x, or the intermediate bitmap files junk.pgm and junk.ppm. The second method, klean.fort, removes Fortran files, such as frog.f, if Ratfor versions of the program, such as frog.r, exist.
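The two identification methods can be combined into one pass over a directory. The following Python sketch models them under stated assumptions: the pattern list is copied from the KLEANUSUAL variable in Figure 4 (plus junk.*), the function names jclean and demo_jclean are hypothetical, and only regular files in a single directory are considered.

```python
import fnmatch
import os
import tempfile
from typing import List, Tuple

# Name patterns taken from Figure 4 (KLEANUSUAL plus junk.*).
KLEAN_USUAL = ["core", "a.out", "paper.log", "*.o", "*.x", "*.H",
               "*.ps", "*.gif", "junk.*"]


def jclean(directory: str) -> List[str]:
    """Remove intermediate files from directory and return their names."""
    names = sorted(os.listdir(directory))
    removed = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # leave subdirectories such as the result directory alone
        stem, ext = os.path.splitext(name)
        # Method 1 (klean.usual): the name fits one of the usual patterns.
        usual = any(fnmatch.fnmatchcase(name, pat) for pat in KLEAN_USUAL)
        # Method 2 (klean.fort): a Fortran file that has a Ratfor source.
        generated_fortran = ext == ".f" and f"{stem}.r" in names
        if usual or generated_fortran:
            os.remove(path)
            removed.append(name)
    return removed


def demo_jclean() -> Tuple[List[str], List[str]]:
    """Clean a scratch directory; return (removed names, remaining names)."""
    with tempfile.TemporaryDirectory() as d:
        for fname in ("frog.r", "frog.f", "frog.x", "copy.o",
                      "junk.pgm", "legacy.f", "makefile"):
            open(os.path.join(d, fname), "w").close()
        removed = jclean(d)
        return removed, sorted(os.listdir(d))
```

In the demo, legacy.f is kept because no legacy.r exists, so a hand-written Fortran source is treated as a fundamental file rather than as preprocessor output.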

We have successfully used the ReDoc rules in our laboratory’s most recent sponsor report (14 articles by 15 authors) and three of Jon Claerbout’s textbooks on seismic imaging. These documents contain a total of 483 result files: 276 easily reproducible, 21 conditionally reproducible, and 186 nonreproducible figures. Before publication, automatic scripts removed and rebuilt all 276 easily reproducible result files (see Figure 5). We use the same scripts and documents to benchmark computer platforms. Our laboratory has also published 12 PhD theses using an earlier version of the user interface based on cake [3]. The theses are available on CD-ROM.
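A burn-and-rebuild test of the kind mentioned above could be driven by a short script. The sketch below is an assumption about what such a script might look like, not the laboratory’s own: reproducibility_test, run_make, and demo_reproducibility are hypothetical names, and the demo substitutes a fake make so that it runs without a real ReDoc.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def run_make(goal: str, directory: str) -> bool:
    """Invoke `make <goal>` in directory; True if make exits with status 0."""
    return subprocess.run(["make", goal], cwd=directory).returncode == 0


def reproducibility_test(directory: str, expected_files: Sequence[str],
                         make: Callable[[str, str], bool] = run_make
                         ) -> Tuple[List[str], List[str]]:
    """Burn, then build; return (reproduced, not reproduced) result files."""
    make("burn", directory)
    # A file that survives the burn cannot count as reproduced later.
    survivors = {f for f in expected_files if (Path(directory) / f).exists()}
    make("build", directory)
    reproduced, failed = [], []
    for f in expected_files:
        rebuilt = (Path(directory) / f).exists() and f not in survivors
        (reproduced if rebuilt else failed).append(f)
    return reproduced, failed


def demo_reproducibility() -> Tuple[List[str], List[str]]:
    """Run the test against a fake make that can rebuild only frog.ps."""
    with tempfile.TemporaryDirectory() as d:
        fig = Path(d) / "Fig"
        fig.mkdir()
        (fig / "frog.ps").touch()
        (fig / "frog.gif").touch()

        def fake_make(goal: str, directory: str) -> bool:
            if goal == "burn":
                for p in fig.iterdir():
                    p.unlink()
            else:
                (fig / "frog.ps").touch()
            return True

        return reproducibility_test(d, ["Fig/frog.ps", "Fig/frog.gif"], make=fake_make)
```

Because the script only issues the standard burn and build commands and checks for files by name, it needs no knowledge of what any particular document computes.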

Researchers in our laboratory greatly enjoy and benefit from using ReDoc, and we continue to employ reproducibility to check the quality of all our laboratory’s publications. Although we believe that reproducible documents represent promising opportunities for Web publications, we are not currently pursuing concrete solutions to this end.

Acknowledgments

We appreciate Richard Stallman’s advice and his implementation of the special built-in target .SECONDARY. Joel Schroeder conceived the three result lists and understood precedence of GNU make definitions. Dave Nichols discovered cake and taught our laboratory how to use it. Steve Cole and Dave Nichols wrote xtpanel (http://sepwww.stanford.edu/oldsep/dave/xtpanel) and helped wrap our tools into an interactive, electronic book.

jclean : klean.usual klean.fort ;

# Method 1: remove files whose names fit the usual intermediate patterns
KLEANUSUAL := core a.out paper.log *.o *.x *.H *.ps *.gif
klean.usual :
	@-${TOUCH} ${KLEANUSUAL} junk.quiet
	@-${RM} ${KLEANUSUAL} junk.*

# Method 2: remove Fortran files for which a Ratfor version exists
FORT_FILES = $(patsubst %.f,%,$(wildcard *.f)) junk.quiet
klean.fort:
	@\
	for name in ${FORT_FILES} ; do \
	    if ${EXIST} $${name}.r ; then \
	        ${TOUCH} $${name}.f ; \
	        ${RM} $${name}.f ; \
	    fi ; \
	done

Figure 4. The default jclean mechanism removes intermediate files from the Frog example.


References

1. R. Stallman and R. McGrath, GNU Make, Free Software Foundation, Boston, 1991.

2. A. Oram and S. Talbott, Managing Projects with Make, O’Reilly & Associates, Sebastopol, Calif., 1991.

3. Z. Somogyi, “Cake: A Fifth Generation Version of Make,” Australian Unix System User Group Newsletter, Vol. 7, No. 6, April 1987, pp. 22–31; http://www.cs.mu.oz.au/~zs/papers/papers.html (current Oct. 2000).

Matthias Schwab works in the Frankfurt office of the Boston Consulting Group. He received his Vordiplom in mathematics and geophysics from the Technical University Clausthal-Zellerfeld, following which he studied on a Fulbright Scholarship with Gerry Gardner at the University of Houston and Rice University. He received a PhD in geophysics from the Stanford Exploration Project. He is a member of the Deutsche Studienstiftung, Society of Exploration Geophysicists, American Geophysical Union, and European Association of Geoscientists and Engineers. Contact him at [email protected].

Martin Karrenbach is vice president of seismic modeling at Paulsson Geophysical Services, where he works in large-scale seismic modeling and full waveform modeling for preacquisition, interpretation, and feasibility studies for high-resolution seismic imaging. His other professional interests are in software development and signal processing. He received his MS in geophysics from the University of Houston and his PhD in geophysics from Stanford University, where he was a member of the Stanford Exploration Project. He is a member of the Society of Exploration Geophysicists, the European Assn. of Geoscientists & Engineers, and the American Geophysical Union. Contact him at [email protected].

Jon Claerbout is a professor of geophysics at Stanford University, where he founded the Stanford Exploration Project. He has consulted for Chevron and was elected to the National Academy of Engineering. He is an honorary member of the Society of Exploration Geophysicists, which awarded him both its highest award, the Maurice Ewing Medal, and the SEG Fessenden Award for outstanding and original work in seismic wave analysis. He is also an honorary member of the European Assn. of Geoscientists & Engineers and recipient of its Erasmus Award. He received his BS in physics and his MS and PhD in geophysics from MIT. Contact him at [email protected].

[Figure 5: chart of reproduced figures and total figures. Horizontal axis: test iterations (1st test, 2 May; 2nd test, 4 May; 3rd test, 6 May; 4th test, 9 May). Vertical axis: number of figures reproduced (20 to 120).]

Figure 5. Successfully reproduced figures in a sample document containing 14 articles with 112 easily reproducible figures. To test a document’s reproducibility, we use a cycle of burning and rebuilding its results. A simple script can implement such a reproducibility test by invoking the ReDoc rules, which remove and regenerate the document’s results independent of the document’s content. After each test, the authors had time to make corrections. After the first test, only 60% of the document’s easily reproducible figures were in fact reproducible. After the fourth test, almost all figures were reproducible and we published the document.
