
Non-programmers identifying functionality in unfamiliar code: strategies and barriers

Paul Gross, Caitlin Kelleher

Department of Computer Science and Engineering—Washington University in St. Louis, USA


Keywords:

Non-programmer

End-user

Code search

Strategy

Barrier

Comprehension

Navigation

Graphic output

Storytelling Alice


Abstract

Source code on the web is a widely available and potentially rich learning resource for non-programmers. However, unfamiliar code can be daunting to end-users without programming experience. This paper describes the results of an exploratory study in which we asked non-programmers to find and modify the code responsible for specific functionality within unfamiliar programs. We present two interacting models of how non-programmers approach this problem: the Task Process Model and the Landmark-Mapping model. Using these models, we describe code search strategies non-programmers employed and the barriers they encountered. Finally, we propose guidelines for future programming environments that support non-programmers in finding functionality in unfamiliar programs.

1. Introduction

Some research predicts that as many as 25 million US workers will perform some job-related computer programming tasks by 2012 [46]. The Bureau of Labor Statistics expects fewer than 3 million of these workers to be professional programmers [46]. There is also debate about whether enough formally trained programmers will be available to fill professional programmer positions in the US [4]. Hence there could be about 22 million end-user programmers, or programmers without formal training, in the near future.

In addition to the large community of workers performing some programming, there are rapidly growing user communities exploring programming in recreational contexts. For instance, end-users are exploring mashups [53], stories and games [34], and image tool scripting [8].

To learn programming and to extend their skills as needed, many end-users learn from resources available on the web [1,9,44]. These online resources include API documentation, tutorials, and code examples. Assuming an end-user has a specific question, tutorials and examples crafted to illustrate that specific concept or technique may currently be the best learning resources available. However, the number of carefully crafted examples and tutorials for specific questions is limited. In many cases, a user may be unable to find a tutorial or example that addresses their specific question or understanding level.

Source code repositories on the web contain many more examples that could be potentially useful for learning programmers [1]. Some end-user programming systems, such as CoScripter [30], Scratch [37], and Greasemonkey [49], have affiliated example source code repositories intended for users to reuse and learn from. For more general programming, code search engines (e.g., [13,25,26]) index source code repositories and snippets (e.g., API documentation) that are available on the web. A learning programmer can potentially find an appropriate example program by searching these repositories; but for novice and non-programmers, the example may be unusable. Factors such as the distribution of code, program concurrency, and lack of informational cues (e.g., code comments) may affect a program's readability. Further, non-programmers' lack of experience suggests they may not recognize these factors or implicit program execution models, inhibiting their use of the example.

If a user finds a candidate program, he or she can run the program to determine whether or not the program exhibits relevant output functionality. For instance, a user can evaluate a web page with JavaScript rollovers or an Excel macro highlighting unique cells. Through searching code repositories and evaluating a program's graphical output, a learning programmer can potentially find a relevant example program that they could reuse and learn from.

However, finding the code responsible for the output functionality in the relevant program is challenging for the same reasons an example program is difficult to interpret: a user's search for the responsible code may be inhibited by his or her ability to read and understand a program's source code. Thus finding the code responsible for observed behavior is a difficult task for an intermediate end-user and a nearly impossible one for a non-programmer.

Our belief is that non-programmers can overcome these difficulties with adequate support from software tools. Such support can enable them to reuse and learn from arbitrary programs that they find on the web. To design this support, we must first understand how non-programmers naturally approach finding code that is responsible for an observed program behavior.

To understand non-programmers' natural search behavior, this paper describes an exploratory study in which we asked non-programmers to find, and in some cases modify, code responsible for specific functionality within unfamiliar programs. In completing the code search tasks, users leveraged landmarks, verbally identified points of interest, in both the output and code. They used these landmarks to build mappings between code and output and to determine code relevance. We describe this process using two interacting models: the Task Process model and the Landmark-Mapping model. Using these models, we contextualize the strategies and barriers non-programmers encountered while searching code. Based on the barriers our users encountered, we suggest guidelines for designing programming environments that support new programmers in finding particular code sections in unfamiliar programs.

2. Related work

Our work relates to several areas: code navigation, code comprehension, and novice debugging. We are not aware of other work focused exclusively on code search by non-programmers.

2.1. Code navigation

Code navigation studies the strategies programmers use to find relevant areas of concern in code. Most of this research focuses on professional programmers.

Recent code navigation studies (e.g., [23,27]) suggest the navigation process users employ relates to Information Foraging theory [42]. Information Foraging was introduced in the context of web navigation and posits that when we, the predators, search for information, the prey, we rely upon information scent to estimate the probability of finding relevant information by following a particular link. Links lead us to information patches with potentially lower exploration cost. Lawrance et al. [27] used information foraging theory to explain the navigation practices of experts completing maintenance tasks and further suggested it can be used to model experts' foraging behavior for these tasks [28,29]. Ko et al. [23] similarly investigated professional programmers' navigation practices for code maintenance tasks and illustrated a seek, relate, and collect model, grounded in information foraging, for how experts created and used information relevant to their task.

Littman et al. [33] observed experienced programmers attempting code maintenance tasks and suggested two navigation strategies: systematic and as-needed. Systematic users try to understand the program's whole code, and its causal relationships, before making changes. In contrast, as-needed users find specific areas of code necessary to modify and make changes without considering their effect on other program functions. Studies have indicated systematic users perform better than as-needed users on maintenance tasks [33,43].

Cox et al. [5] theorized a relationship between code navigation and real-world spatial navigation through the use of landmarks. A study by Fisher et al. [11] further suggests users' gender may influence their spatial navigation strategies and consequently their code navigation strategies.

2.2. Code comprehension

Code comprehension research explores the mental models programmers use to represent code and how they construct these models. Studies in this area are typically concerned with what program information programmers comprehend and recall. Our work considers short-term program comprehension and its use by non-programmers in code navigation.

Two fundamental code comprehension models are generally accepted: top-down [2], where users work to relate program goals to code, and bottom-up [41], where users focus on understanding code elements and then relate these to program goals. Other work suggested experts mix these models in making inquiries [31], and opportunistically choose a model [35].

Brooks [2] suggested the idea of beacons as stereotypical code snippets that imply a specific, larger functionality (e.g., a variable swap implies a sort function). These beacons can help programmers to quickly identify common functions. Further work investigated the existence of beacons and suggested experts and novices do recognize a sort algorithm beacon [50,51], while others suggested novices do not reliably detect beacons [6].


Beacons and landmarks (from code navigation) are similar concepts, but Cox et al. [5] distinguished them by suggesting that "beacons are a component of a landmark." For instance, a big outdoor hamburger sign may indicate a restaurant. The sign is a beacon indicating the function of a building. Having found the restaurant, it can be used as a navigational landmark.

Novice code comprehension studies observed that novices tend to read code sequentially, line by line, in a bottom-up fashion that ignores control flow information [3,16,40]. Other research showed that novices' comprehension strategies differ with familiarity and domain knowledge [24,52]. Mosemann and Wiedenbeck [38] used software to control novices' program navigation strategies. In essence, the available actions in the user interface forced users to adopt a pre-determined navigation strategy. They found that no navigation strategy significantly impacted novices' comprehension across all comprehension categories.

2.3. Novice debugging

Novice code debugging investigates the strategies employed and weaknesses exhibited by novices attempting debugging tasks. Novice debugging research focuses on users who have a working knowledge of programming models (e.g., sequential execution) and program construction. While still novices, they are more skilled than non-programmers. McCauley et al. [36] provide a recent survey of the area.

Katz and Anderson [17] studied novice debugging and observed two general search strategies: forward reasoning, where "search stems from the actual, written code", and backward reasoning, where "search starts from incorrect behavior of the program". One example of a backward reasoning strategy is simple mapping, where a novice tries to correlate a specific output result to a line of code. Like Katz and Anderson, Fitzgerald et al. [12] found novices preferred forward reasoning and observed users tracing code (e.g., mental execution) and attempting to match learned syntactic patterns against program code to identify incorrect code sections.

Ko and Myers [21] observed end-users' inclination towards interrogative debugging and created the Whyline interface to support it [22]. Whyline allows users to relate observed output errors to lines of code by asking "why" or "why not" questions. Asking "why" and "why not" questions relies on a user's expectations of a program's output. These expectations are likely based on a user's familiarity with a program, which cannot be expected of non-programmers.

Much work has also considered the debugging strategies of end-users in spreadsheet applications. Kissinger et al. [20] investigated the gaps between the information users seek when testing and the information spreadsheet interfaces provide. Ruthruff et al. [45] identified spreadsheet debugging strategies end-users employ, and Subrahmaniyan et al. [48] focused on differences in strategies based on a user's gender. Grigoreanu et al. [14] created StratCel to support end-users' debugging strategies with interactive, automatically generated to-do lists.

3. Methods

Our goal is to understand how non-programmers naturally approach finding the code responsible for output behavior. We conducted an exploratory study in which we asked non-programmers to identify, and in some cases modify, code responsible for specific functionality in the output of unfamiliar programs.

3.1. Storytelling Alice

For this study we used the Storytelling Alice programming environment [19]. Storytelling Alice allows users to create interactive 3D animated stories by writing programs that invoke methods (e.g., turn, say, walk) on objects (e.g., fairies, trees, people). Users construct programs using a drag-and-drop interface that prevents syntax errors. The environment supports most programming constructs taught to beginning programmers (e.g., if statements, loops) and event handling for user interactions. Fig. 1 illustrates adding a line of code in Storytelling Alice.
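To make this programming model concrete, the following minimal Python sketch approximates the behavior of a few Alice-style statements. The class, object, and method names are hypothetical stand-ins for illustration; they are not Storytelling Alice's actual syntax or API.

    # A rough Python analogue of an Alice-style program (hypothetical names).
    class Fairy:
        def say(self, text):
            # In Storytelling Alice, a say statement produces a dialog bubble.
            print(f"fairy says: {text}")

        def turn(self, direction, revolutions):
            print(f"fairy turns {direction} by {revolutions} revolutions")

    def my_first_method():
        fairy = Fairy()
        fairy.say("Hello!")           # invoke a method on an object
        for _ in range(3):            # a loop construct, as taught to beginners
            fairy.turn("left", 0.25)

    my_first_method()

In the real environment, each of these statements corresponds to a tile the user drags into the code pane rather than text the user types.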

Storytelling Alice has no explicit support for searching program code (e.g., find, debugger, etc.). Other environments may offer these search affordances, but they may or may not map to how a user, most notably a non-programmer, conceives of searching a program. In this work, we explore non-programmers' natural search processes to guide the design of new tools that will explicitly support natural search processes.

3.2. Instruments

To provide realistic task scenarios, we investigated programs which users are likely to find on the web and analyzed them to extract a list of relevant properties. We derived dimensions describing these properties and considered these dimensions in our design of the programs and tasks used in our study.

3.2.1. Program properties

We randomly selected 15 programs submitted to the Alice.org user forums and reviewed their properties. We summarize the programs' properties in Table 1 and show examples from two of the programs in Figs. 2 and 3. From these properties we derive six dimensions, relevant to finding code in unfamiliar programs, along which these programs vary. We discuss them in the following paragraphs.

3.2.1.1. Program length. The selected programs varied in length from 27 to 2137 lines of code, and contained a median of 437 lines of code. We define the dimension Program Length to describe the relative size of a program. We consider programs to be larger if they contain more lines of code than the median.

Further, not all lines of code in the program executed, or were executable through events, during a program run. Programs with non-executing code create the potential for users to search in code which cannot relate to observed output features. Four programs executed less than 36% of the code they contained, meaning the majority of the program's code was irrelevant to observable output functionality. Because static analysis could demarcate or remove unused code, we chose not to focus on this property.

Fig. 1. Storytelling Alice, where a user programs by (1) dragging a method, (2) dropping it into the code pane, and (3) selecting parameters.

Table 1. Properties of programs randomly selected from the Alice.org forums.

Program name | Lines of code (LOC) | LOC executed (%) | Executed LOC in main method (%) | Interactive events | Duplicate object names | Duplicate methods | Concurrency | Dialog bubbles
A LiFe Of A bIrD | 685 | 18.4 | 2.4 | 11 | Yes | No | Yes | Yes
Acrobatic Seaplane | 48 | 100.0 | 56.3 | 0 | No | No | Yes | No
Bug swatter | 515 | 96.3 | 6.7 | 4 | Yes | Yes | Yes | No
C.A.R. | 437 | 92.4 | 1.0 | 36 | Yes | Yes | Yes | No
Create A Park! | 408 | 99.8 | 0.2 | 33 | Yes | Yes | Yes | No
Deserted | 299 | 99.7 | 2.0 | 4 | Yes | Yes | Yes | No
Dragon and Beetle | 336 | 99.1 | 53.5 | 0 | Yes | Yes | Yes | Yes
Eski-Massacre Ver. 1.0 | 1392 | 35.6 | 8.5 | 15 | Yes | Yes | Yes | Yes
MyLife | 2137 | 8.3 | 20.3 | 0 | Yes | Yes | Yes | Yes
Orsega Solar System | 27 | 100.0 | 88.9 | 0 | No | No | Yes | No
Skater's Puzzle | 1084 | 23.8 | 4.3 | 15 | Yes | Yes | Yes | Yes
Snow Border | 695 | 89.5 | 0.5 | 2 | Yes | Yes | Yes | No
Splash | 58 | 100.0 | 39.7 | 0 | No | No | Yes | Yes
War Zone | 1505 | 99.5 | 63.2 | 0 | Yes | Yes | Yes | Yes
Zombie Fin | 112 | 98.2 | 12.7 | 11 | Yes | No | Yes | No

3.2.1.2. Modular design. The programs' main methods did not contain all code that executed during a program run (see Fig. 2). All programs began in a main method, the most likely search starting point, but the median program had only 8.5% of its executing code in the main method. The rest of the code was distributed to user-created methods that were, or were not, referenced in the main method or any other methods invoked by main (e.g., methods invoked by interactive events). With the code distributed to other methods, users are forced to search outside the main method. Understanding program execution models and program structure may be necessary to effectively search programs with distributed code. We define the dimension Modular Design to describe the relative amount of a program's executing code that is in the program's main method. A program is deemed modular if it has interactive events or has no more than 8.5% of its executing code in the main method.

3.2.1.3. Interactivity. Nine selected programs contained interactive event handlers. In Storytelling Alice, these event handlers are declared outside the main method, the most likely search starting point (see Fig. 2). Thus, mentally tracing the execution of a program's main method will not directly identify the code responsible for output that depends on an interactive event. Further, events can occur at any time during the execution of a program, even concurrently with other code, nullifying most temporal cues which may aid in search. We define the dimension Interactivity to describe whether a program has interactive events.

Fig. 2. Program SnowBorder, randomly selected from the Alice.org user forum. The program is modular because it contains few lines of code in its "main" method (i.e., world.myfirstmethod()) (1) and has more code in other user-created methods such as maleBalletDancer5.1st trick() (1). The program is interactive because it contains interactive events moving the ballet dancer (2). Multiple copies of a maleBalletDancer create ambiguous object/method names (3) and (4), as all maleBalletDancer copies have the same methods. The program uses concurrency in its "main" method with the Do Together block (5), and is constantly concurrent because of its interactive events (2).

Fig. 3. A sample from the Dragon and Beetle program, randomly selected from the Alice.org user forum. The program contains a say statement which causes a dialog bubble to appear when the program executes.

3.2.1.4. Ambiguous object/method names. The majority of the programs we collected contained duplicate object names and methods. Users of Storytelling Alice tend to duplicate objects, and consequently their methods, because Storytelling Alice does not implement dynamic object creation and polymorphism (see Fig. 2). When a user duplicates an object, the duplicate has the same name with an incremented index. For instance, the program Bug Swatter had users press the space bar to swat at scarab beetles. There were nine scarabs, ambiguously named scarab1 through scarab9. Each scarab had a copy of user-created methods such as chase1 and face, duplicating their functionality in multiple locations. Such duplicates can create ambiguity when a user tries to attribute lines of code to specific objects. We define the dimension Ambiguous Object/Method Names to describe when a program has an ambiguous naming convention for either objects or methods.

3.2.1.5. Constant concurrency. All the programs used concurrency to some degree. Four programs used concurrency in shorter blocks, while eleven had concurrent threads occurring throughout the whole program (e.g., five programs had flames which constantly flickered) or available through interactive events (see Fig. 2). We informally observed that non-programmers relied on sequential execution in searching for actions and struggled with actions occurring simultaneously. We expect all programs to have some concurrency, and define the dimension Constant Concurrency to describe if a program has continuously concurrent threads throughout the length of the program. We also consider programs with interactive events constantly concurrent because an event concurrent with other running threads can occur at any time.
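As a rough illustration of why constant concurrency defeats sequential reasoning, the following Python sketch emulates the semantics of a Do Together block using threads. The action names are hypothetical stand-ins for behaviors like the constantly flickering flames described above.

    import threading
    import time

    def flicker_flames():
        # A background behavior that repeats while other actions run.
        for _ in range(3):
            print("flame flickers")
            time.sleep(0.1)

    def pig_points():
        print("pig raises right arm")

    # A Do Together block runs its statements concurrently, so the order in
    # which actions appear on screen need not match the top-to-bottom order
    # of the code, which is the cue non-programmers relied on.
    threads = [threading.Thread(target=flicker_flames),
               threading.Thread(target=pig_points)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()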

3.2.1.6. Dialog bubbles. Almost half the programs we selected used dialog bubbles. In our informal observations, non-programmers focused on dialog as a helpful cue because it provided both a unique temporal marker in the running program and a unique line in the program code (see Fig. 3). We define the dimension Dialog Bubbles to describe if a program uses dialog bubbles in its executing code.

3.2.2. Study programs

Using the program dimensions as guidance, we constructed four Storytelling Alice programs for use in our study (see Table 2). We did not use the programs sampled from the web forum in the study because no subset adequately covered Storytelling Alice language constructs. The programs are available online at [47].

For each program we composed a series of five tasks of varying complexity. The tasks asked users to identify, and in some cases modify, code responsible for specific functionality in a program's output. Each task caused users to confront these dimensions and/or language constructs, either by design or because of its associated program's design. To illustrate these confrontations, we provide an example task for each dimension and explain how the task exercises that dimension in the following paragraphs.

3.2.2.1. Program length. Both Magic Trees and Race World are larger programs. Consequently, all tasks associated with these programs take place in a larger executing code base and afford more irrelevant, non-executing code to search.

3.2.2.2. Modular design. The task "Coach Declare Winner" for Race World requires a user to find where a coach declares the winner of a race. This code executes close to the end of the program's run, but is not at the end of the main method. Users need to navigate through two user-editable methods to find the target code.

3.2.2.3. Interactivity. The task "Banana Pick Up" for the program Race World asked users to find the code causing a character to pick up a banana. The character picks up a banana when another character is clicked; thus the user must investigate a method called by an interactive event rather than searching sequentially through the main method.

3.2.2.4. Ambiguous object/method names. The Magic Trees program contains five trees: one named bonsai and four named artsyTree1 through artsyTree4. The task "Fairy Appear, Fly Away" had users search for where artsyTree2 began shrinking. The correct starting point occurred near a similar method invocation for artsyTree1, requiring the user to manage object name ambiguity.

Table 2. A description of the four programs used in the tasks and their properties.

Program name: description | Program length | Modular design | Interactivity | Ambig. object/method names | Constant concurrency | Dialog bubbles
Fish World: three fish swim around and make motions at one another | Smaller | Not modular | Passive | Descriptive | No | No
Woods World: creatures argue about teddy bear, three main methods concurrently execute | Smaller | Modular | Passive | Descriptive | Yes | Yes
Magic Trees: two kids discover fairies hidden in trees, large main code block | Larger | Not modular | Passive | Ambiguous | Yes | Yes
Race World: two students race, winner is randomly determined, user throws bananas | Larger | Modular | Interactive | Ambiguous | Yes | Yes

3.2.2.5. Constant concurrency. The Woods World program has four user-editable methods running constantly concurrently: three operating on the main characters of the scene simultaneously and one flapping a fairy's wings. The task "Pig Point" asked users to find the code causing a pig to raise its right arm. This action occurs concurrently with a centaur turning and the pig saying something. Users must recognize the program's concurrent nature to narrow their search space and not attribute output actions to the incorrect, concurrent lines of code.

3.2.2.6. Dialog bubbles. For Fish World, the task "Bob Chase" asked users to find where a spiky fish turns toward a little fish and then chases the little fish. The solution is found in the main method inside a concurrency block. However, very similar method invocations and concurrency blocks appear at different points in the main method, and there are no dialog bubbles to assist with temporal disambiguation.

3.3. Study sessions

The study took place in single, two-hour-long, independent participant sessions. At the beginning of a session, a participant filled out a short computing experience survey and completed the in-software tutorial provided with Storytelling Alice. The in-software tutorial includes three chapters that introduce users to sequential execution, navigation, program construction and editing, creation of new methods, and the use of events.

3.3.1. Study task types

To avoid providing linguistic cues that might bias participants' search strategies, we presented tasks using short video clips of a given program's output. In each video, we highlighted target object(s) and actions using a red box. We faded all other objects in the world, as seen in Fig. 4. We referred to each task by an abbreviation (e.g., we referred to the task "Debrah Say Bippity-Boo" as DSBB).


The study included two task types: bounding tasks and modification tasks. We describe each task type and provide an example.

Bounding tasks required participants to mark the beginning and end of the code responsible for the functionality identified in the video. We refer to these markers as beginning bounds and ending bounds. This type of task simulates a user who has found a program with an interesting output feature and wants to find the code implementing that feature.

As an example bounding task, the task "Debrah Say Bippity-Boo" for the program Magic Trees asked users to find code responsible for a character saying "Bippity-boo!" three times. When a user believed he or she found the responsible code, he or she would place beginning and ending markers just before and after the code causing the speech. The user could then run the program to verify the markers displayed text just before the character said "Bippity-boo!" the first time and just after the character said "Bippity-boo!" the third time.
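The sketch below is a minimal Python stand-in for this task; the Character class and the marker printouts are hypothetical approximations of Alice's bound-marker animations. It shows where a correct pair of bounds sits relative to the target code.

    class Character:
        def __init__(self, name):
            self.name = name

        def say(self, text):
            # A dialog bubble in the real environment.
            print(f"{self.name}: {text}")

    debrah = Character("Debrah")

    def magic_trees_main():
        print("[bound marker: BEGIN]")  # beginning bound, just before the target code
        for _ in range(3):
            debrah.say("Bippity-boo!")  # code responsible for the observed speech
        print("[bound marker: END]")    # ending bound, just after

    magic_trees_main()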

Modification tasks asked participants to make a very specific change to the code, affecting the output functionality as indicated to the user. Modification task videos included titles indicating a modification was required, the initial output, an intentionally minimal change description, and the target output.

For an example modification task, the Fish World task "Angela Head Pat" requested that users change the number of times a fish "patted" its head from two to four. The task video showed the fish "patting" twice, then displayed the text "Modify the fish world so that the fish does this four times, like this:", and then showed four "pats". The correct solution required a user to change a loop bound from two to four.
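In code, the required edit amounts to changing a single loop bound. A minimal Python analogue (the Fish class is a hypothetical stand-in) looks like this:

    class Fish:
        def pat_head(self):
            print("fish pats its head")

    fish = Fish()

    # Original program: the fish pats its head twice.
    for _ in range(2):
        fish.pat_head()

    # Correct solution: change the loop bound so the fish pats four times.
    for _ in range(4):
        fish.pat_head()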

3.3.2. Task completion

We instructed participants to tell us when they believed they had correctly completed a task. We intended to simulate a participant discovering interesting functionality and then searching for the corresponding code. Under this assumption, a participant must decide if he or she has found the correct code. Participants received feedback if they ran the program after placing a bound marker or making a modification. Bound markers are animations that display a text message for one second in the scene. If a participant played a program containing a bound marker, then he or she could observe when that message appeared. Neither the software nor the researchers provided any direct solution correctness feedback during a task or after a task's completion.

Fig. 4. A screenshot of the Magic Trees task "Debrah Say Bippity-Boo", which asked users to find the code responsible for the character saying "Bippity-boo!" three times.

To ensure that subjects understood bounding and modification tasks, we asked each subject to complete one task of each type in a practice Storytelling Alice program. After completing the two practice tasks, participants completed a series of experimental tasks. We generated the task series by randomizing the presentation order of the four programs and the five tasks for each program. The randomization was intended to prevent any ordering effects. Each participant completed as many tasks as he or she could during the allotted time for the study.

For both bounding and modification tasks, we embedded the target code sections within much larger programs. Participants searched the code and watched both the video and the running program to identify target actions to search for. We asked participants to think aloud while completing these tasks.

3.4. Data

We collected a pre-study demographics and computer experience survey, video recordings of participants as they used Storytelling Alice, screen captures of participants' Storytelling Alice interactions, and participants' modified programs.

3.5. Participants

We solicited participants with on-campus fliers. The fliers advertised for volunteers with little to no programming experience who were interested in making animations and games with computers. This description coincides with the definition of an end-user programmer as an individual who programs computers as a means to accomplish other goals. Participants received $20 in recognition of their participation.

Fourteen adults (university students or employees) participated in the study. Twelve had no prior programming experience. Two participants had previous exposure to programming, one "at least five years ago" and the other more than 20 years before. We considered these two participants to be non-programmers because they communicated no active knowledge of computer programming.

Participants reported using computers for an average of 23 hours per week, primarily for web browsing, email, and office productivity.


3.6. Analysis

The two authors independently coded each session video. The coding scheme consisted of two types of information: searches and landmarks.

3.6.1. Searches

For each search, we coded beginning and ending times for the search, the search space, and the participant's search target. Searches could occur in four spaces: the video, the running program, the Storytelling Alice code pane, and other Storytelling Alice panes (e.g., object tree, object details, events).

3.6.2. Landmarks

As users searched for specific functionality within an unfamiliar program, they often verbally referenced specific features in the output (the video or running program) or the program itself (code pane or other panes). For example, a participant might say, "The fish gets bigger and turns" while watching the output. We call these features landmarks, as suggested by Cox [5], because the verbalizations are often coupled with code navigational logic (e.g., "the fish spins before he turns to face the camera"). We further delineate landmarks by their source. Code landmarks are points of interest a user identifies in program code, and output landmarks are features a user identifies in a task video or an executing program.

For each landmark, we coded the landmark's content, the data type (e.g., object, action, text), and the source (video, running program, code pane, or other panes). Additionally, we recorded a specific reason for the usage of each landmark. A landmark might be used as a temporal comparison or identified as included in or excluded from the participant's search targets. We used these records to identify what information participants used in their searches and how they used that information.

Table 3Users’ task performance results.

Task type Taskattempts

Percentcorrect

Avg.time

Max.time

Bounding 84 33% 4:42 26:20Modification 39 72% 5:27 27:29All 123 41% 4:54 –

Fig. 5. The Task Process Model represents the typical task workflow when a subject attempted a task. The model is broken into five transition sections indicated by the numbers in parentheses.

3.6.3. Other data

We also transcribed participants' statements about their progress or mental models and noted any solutions they generated.

3.7. Error analysis

To ensure coding consistency, the two authors independently coded two 10-minute sections of two user sessions. The authors reviewed the codings to establish coding guidelines and then independently coded all the remaining sessions. The completed codings have an 82% agreement rate.
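Assuming simple percent agreement over aligned coding decisions (rather than a chance-corrected measure such as Cohen's kappa), the computation reduces to a sketch like the following, with made-up example codings:

    # Hypothetical aligned codings of the same events by the two authors.
    coder_a = ["code search", "output search", "code search", "landmark"]
    coder_b = ["code search", "output search", "landmark", "landmark"]

    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    agreement = matches / len(coder_a)
    print(f"agreement rate = {agreement:.0%}")  # 75% for this toy example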

4. Results

Finding target code in a program is difficult for non-programmers. Overall, participants generated correct solutions for only 41% of their tasks (33% of bounding tasks' two bounds and 72% of modification tasks). See Table 3 for a summary of participants' performance. We considered a bound correct if a participant placed a bound marker in a position which caused the marker to execute and display the marker text immediately before (or after) the target functionality. We considered any other location incorrect. Participants completing modification tasks frequently tested and changed their answers, leading to their higher success rate and longer average task time. Some participants spent more than 20 minutes on a single task.

We present two models that describe how non-programmers approach finding target code in unfamiliar programs. The Task Process model (see Fig. 5) represents the task workflow participants used when attempting a task. To account for the information created and used by subjects within the Task Process model, we created the Landmark-Mapping model (see Fig. 6). This model contains both code landmarks and output landmarks. As participants work through tasks, they develop mappings between code and output landmarks.

The Task Process model is broken into a series of numbered transitions (see Fig. 5) discussed below. With each transition we present common search barriers (see Table 4) and strategies (see Table 5).


4.1. Task process model section (1)

Participants began a task along path (1) by watching the task video. While watching the task video for the first time, 45% of the time participants verbally noted output landmarks (e.g., "the centaur turns" or "she says 'Thank you, I'm free'"). Denoting these landmarks added them to the participant's output landmark set as indicated in the Landmark-Mapping model.

Fig. 6. The Landmark-Mapping Model organizes landmarks identified by subjects into two sets that correspond to landmark identification space.

Table 4. Barriers non-programmers encountered when attempting tasks.

Barrier name | Num. users hitting barrier | Num. tasks observed in | Text section | Description
Object and action encoding | 12/14 | 12/20 | 4.1 | User description of observed output not in code text
Memory failure | 7/14 | 8/20 | 4.1 | User incorrectly remembers target output
Method interpretation | 13/14 | 12/20 | 4.3 | User misinterprets method invocations' cues
Lack of temporal reasoning | 10/14 | 10/20 | 4.3 | User does not use sequential execution information
Temporal reasoning overuse and ignoring constructs | 13/14 | 12/20 | 4.3 | User interprets sequential execution of code when inappropriate because of language construct
Magic code | 7/14 | 15/20 | 4.5 | User associates incorrect functionality with code running concurrently

Table 5. Strategies non-programmers employed when attempting tasks.

Strategy name | Num. users observed using | Num. tasks observed in | Pct. of all searches | Text section | Target gen. | Description
Text and semantic | 14/14 | 20/20 | 20% | 4.2 | No | User searches code for text similar to their search target description
Temporal | 14/14 | 19/20 | 14% | 4.2 | No | User leverages timing to narrow search space
Comprehensive | 14/14 | 17/20 | 7% | 4.2 | No | User locally focuses search around "known" code
Exhaustive | 11/14 | 10/20 | 2% | 4.2 | No | User attempts to search entire code space
API | 7/14 | 10/20 | 2% | 4.2 | Yes | User views objects' interfaces to find search targets
Explorative | 8/14 | 8/20 | 3% | 4.2 | Yes | User randomly explores interface for new information to use as search target
Context | 9/14 | 8/20 | 1% | 4.4 | Yes | User identifies output immediately around target actions for new search targets
Uncategorized | – | – | 51% | – | – | Users did not verbalize a strategy for all observed searches

Two common failures can begin in this early section:

Object and Action Encoding (12/14 users, 12/20 tasks): when a user identified a landmark, he or she encoded that landmark using a description (e.g., "[the pig is] pointing at the cage" or "[the] pig raises [his] right arm"). When users searched for these actions in the code, they often did so by looking for key phrases such as "pointing" or "right arm". If they failed to find these phrases, the search was never resolved.

Memory Failure (7/14 users, 8/20 tasks): sometimes a participant misremembered actions in the video, which could lead him or her to use landmarks incorrectly.

4.2. Task process model section (2)

Having registered one or more landmarks from the initial video viewing, participants transitioned to program code and began a "Code Search". In 72% of initial code searches, participants verbalized a landmark as the search target. As they navigated the code, 57% of participants identified additional code landmarks to search for in the task video or the running program. These code landmarks were added to the code landmark set in the Landmark-Mapping model. When participants successfully identified a code section they believed accounted for a landmark, they formed a mapping [17]. In the Landmark-Mapping model, mappings are in the intersection of the output landmark and code landmark sets.

Participants cycled between "Code Search" and "Output Search" while they added to and refined their landmarks and mappings. This continued until a participant believed he or she had enough mappings to generate a solution. As the landmark sets grew, participants organized them into subsets. In the Landmark-Mapping model, these subsets are denoted as the included and excluded landmark subsets. The user perceived included landmarks as part of the solution and excluded landmarks as extraneous.

In this Task Process model section, participants often used the following strategies to build mappings:

Text and Semantic Search (14/14 users, 20/20 tasks, 20% of searches): in a text and semantic search, the participant identified a target and scanned either for specific text or for text semantically similar to their target. This type of search frequently failed when the participant could not reconcile their description of an output landmark (e.g., "[the pig is] pointing at the cage", "I don't see the hand thing" when there is no hand) with a specific line or lines of code.

Temporal Search (14/14 users, 19/20 tasks, 14% of searches): a temporal search occurred when a participant used temporal information to reason about where the functionality identified in the video was located relative to another code landmark. This helped users narrow the code search space. For instance, in the statement "so it's gotta be somewhere in the part where basketball3 is in front of her, before [Melly] turns," the participant identified two code landmarks and used them to reason about where the functionality identified in the video should lie.

Comprehensive Search (14/14 users, 17/20 tasks, 7% of searches): participants' focus switched from global to local when they identified a mapping with high confidence. Comprehensive searches typically occurred in a small code section anchored on a specific code landmark that was part of a mapping. If the participant believed that the anchor code landmark was relevant to the solution, he or she may have used this strategy to find more supportive temporal landmarks. If the participant did not believe the anchor was relevant, he or she could have used the strategy to exclude the current region from the solution. In one comprehensive search, a participant began by identifying an anchor: "So I'm looking for DewdropWillowwind. So here's DewdropWillowwind turning to face the camera." Next, the participant mapped nearby lines of code: "And [CordFlamewand] turn to face the camera. They turn to face the camera and then they all move forward. So this is the moving forward thing [in the video]." This second mapping helped the participant validate the original mapping.

Exhaustive Search (11/14 users, 10/20 tasks, 2% of searches): if the previously discussed strategies were unsuccessful, participants may have turned to less structured and more desperate strategies. In an exhaustive search, the participant searched the entire recognized code space (note: participants may not have searched some method implementations because they did not recognize they could). We observed two stages of exhaustive search. In the first stage, participants searched in any editable method associated with a target character (e.g., "maybe I need to open up all these user methods and try to find something. That would be cumbersome. Find another, find something that has the BurleyPig in it."). Failing the first stage, a participant searched all editable methods available regardless of whether they related to any landmarks or targets they were looking to find (e.g., "I can't find this horse. I'll just look in everything I guess.").

Not all search strategies were intended to generate a solution. Participants employed two common fallback strategies intended to generate more potential search targets. API Search (7/14 users, 8/20 tasks, 1% of searches) occurred when a participant selected an object and scanned that object's list of methods to identify new search targets similar to any of their landmarks. Explorative Search (8/14 users, 8/20 tasks, 3% of searches) was a last-resort search in which participants appeared to randomly click through the interface. Sometimes these random explorations led the participant to a piece of information that helped the participant formulate a new (productive) search.

4.3. Task process model section (3)

The process of cycling between code and output searches continued until a subject believed his or her mappings correctly identified a reasonable approximation of the responsible code region. As previously indicated, most solutions were incorrect. Although there were many reasons for incorrect solutions, three failures appeared frequently in this Task Process Model section:

Method Interpretation (13/14 users, 12/20 tasks): participants' abilities to form correct mappings were fundamentally tied to their interpretations of a method's behavior given its name and parameters. In some instances, methods provided too many cues, too few cues, or inappropriate cues about their function. Missing or misleading cues could have caused a participant to inappropriately store a code landmark in the included or excluded set of the Landmark-Mapping model.

Lack of Temporal Reasoning (10/14 users, 10/20 tasks): failure to use temporal reasoning caused participants to search more code than necessary. They also may have failed to utilize operations that could have increased the size of their excluded landmark sets (thus reducing the number of landmarks to map). By having searched excess code and kept irrelevant code landmarks, participants may have created false mappings. Finally, without temporal reasoning, a subject may not have identified nearby code landmarks to verify the correctness of their initial mappings.

Temporal Reasoning Overuse and Ignoring Constructs (13/14 users, 12/20 tasks): participants naively applied temporal reasoning to programs containing constructs such as loops, concurrent blocks, or multiple threads of execution. Failure to recognize the changing execution model caused participants to arrive at faulty solutions by incorrectly placing code landmarks and mappings into the excluded or included sets of the Landmark-Mapping model.

4.4. Task process model section (4)

For a bounding task, finding a solution required mappings for the first and last action observed, hence the transitions back from "Solution" to either "Code Search" or "Output Search". Additionally, participants frequently verified modification task solutions, leading to a higher success rate for modification tasks.

4.5. Task process model section (5)

Not all searches or series of searches led to a clear solution. In response to finding no mappings to their output landmarks, some subjects turned to Context Search (9/14 users, 8/20 tasks, 1% of searches). In a context search, the participant searched the output for actions that happened shortly before or after the target functionality. In one case, a participant stated "I was just gonna look again and see … what part in the movie corresponds to … where the Horse is highlighted." The participant then identified output landmarks immediately before and after the indicated functionality.

Context search usage occasionally gave rise to a common failure we call Magic Code (7/14 users, 15/20 tasks). Many participants correctly mapped temporally related output landmarks identified through context search. However, participants then failed to find the original target near these newly identified mappings and concluded: "it is in there, but I can't see it". This conclusion produced an incomplete set of mappings, as users may not have mapped the target functionality.

5. Discussion

As our results indicate that non-programmers struggled to find and identify the code responsible for observable output, we suggest design guidelines for software supporting non-programmers in this task. We then discuss how our model corresponds with other proposed models for program navigation, comprehension, and debugging. Finally, we discuss the limitations of our study.

5.1. Design guidelines for supporting non-programmers' natural search processes

Our goal is to enable non-programmers to utilize arbitrary source code examples from the web as learning materials. In programs similar to those found on the web, we asked non-programmers to find the code responsible for a given, observable output functionality. Our results indicate that non-programmers struggled to find and identify the responsible code.

We believe there are many approaches that can successfully transform arbitrary source examples into useful learning materials for inexperienced programmers. This work intends to support these approaches by informing how non-programmers search for output feature code in unfamiliar programs, a potential problem in any such approach.

While this study focused on participants using Storytelling Alice, we believe the models, strategies, and barriers discussed apply to other domains, in particular domains where most program execution is externally observable, such as web sites, user interfaces, games, and scriptable media authoring environments. To this end, we offer the following design guidelines.

5.1.1. Connect code to observable output

When users search code for observed output functionality, it is essential to help them interpret code in terms of the observed functionality. We could have alleviated our participants' struggles with interpreting code by showing how the output changed when a line of code executed. To support arbitrary code use by non-programmers, we need to explore how best to provide support in the programming environment that enables users to correctly and quickly form mappings between the code and output.

Previous research on novice and end-user programming tries to help users connect code and observable output to ease the process of authoring. Some novice programming environments provide an interactive mode to directly invoke methods on objects and view the graphical result (e.g., [7,10]). Other environments graphically represent the before and after states of an operation (e.g., [32,39]).

5.1.2. Help users reconstruct execution flow

When our participants encountered programs containing programming constructs such as loops, Do Together blocks, and method calls, they tended to either interpret all statements as executing sequentially or declare the execution flow incomprehensible. Enabling users to correctly reason about the execution flow can help them to employ temporal reasoning effectively. This has the potential to drastically improve users' search efficiency. Students often learn new vocabulary words through contextual clues as they read. As non-programmers explore unfamiliar code, there is an opportunity for programming environments to scaffold users' mental models and reasoning about unfamiliar programming constructs' behavior.

One possibility is to highlight lines of code executing at a given point in time. Knowledge of constructs and their execution consequences may not be necessary if the focus is on what precisely is executing. Another idea is to allow users to ask questions about execution, such as "what happens before/during/after this happens", based on a previous execution or on static analysis for a deterministic system, similar to the Whyline [21].
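As a rough sketch of the first possibility, many language runtimes expose line-level execution events that an environment could surface as highlighting. In Python, for example, sys.settrace delivers such events (the demo function here is purely illustrative):

    import sys

    def trace_lines(frame, event, arg):
        # Report each line as it executes: the raw information an
        # environment could use to highlight what is running right now.
        if event == "line":
            print(f"executing line {frame.f_lineno} in {frame.f_code.co_name}()")
        return trace_lines

    def demo():
        total = 0
        for i in range(2):  # loop iterations show up as repeated line events
            total += i
        return total

    sys.settrace(trace_lines)
    demo()
    sys.settrace(None)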

5.1.3. Provide interactions to fully navigate code

Participants in our study frequently struggled to find all code relevant to a particular search. Incomplete exhaustive searches and participants' magic code creation provide evidence of this struggle. Lacking code navigation affordances is particularly disabling when users will be utilizing code they did not create.

Fully navigating code depends on the usability of the environment and the code itself. Offering intuitive navigational cues may help new users to recognize that code can be further explored and to understand code relationships. Also, providing an interface that allows a user to trace through the lines of code that executed may help.


5.1.4. Help users use poorly constructed code

Programming environments have no control over the properties of code that users find on the internet. Yet, lacking other supports, the structure and clarity of the code that users download can have a profound impact on their success. Programming environments enabling non-programmers to utilize unfamiliar code must help overcome difficulties associated with poorly designed and written code. With an understanding of typical usability problems in user-created code, we can build supports into programming environments that help users to successfully navigate imperfect code. Users are particularly affected by poorly chosen method names. Interfaces enabling users to view details about a method's behavior at the point where that method is invoked can increase the method's information scent and help users decide whether to explore it.

A critical component of overcoming the usability problems of other people's code is to show meta-information about the code that non-programmers find useful. Exploring the natural information interests of non-programmers in a particular domain (e.g., end-users with spreadsheets [20]) can help in designing interfaces that correspond to what information they expect.

5.2. Relationship to other models

Our results relate to other models of programmers exploring program code for different purposes. In the following sections, we compare our results to Lawrance et al.'s [27] use of information foraging theory to explain and model programmers' program navigation, and to Ko et al.'s [23] seek, relate, and collect model of experts' program understanding. We also consider other results identifying similar search strategies.

5.2.1. Information foraging theory

Previous work suggests that programmers' information foraging behavior is characterized by searchers using proximal cues to identify high-value patches of code to explore for their prey [23,27]. Lawrance et al. [28,29] have shown that information foraging theory can predict programmers' code navigation behavior when the proximal cues are assumed to be textual. While we observed textual information foraging, we also observed information foraging in non-textual spaces.

Our participants used textual cues to determine information scent and find low-risk patches, behavior similar to that of expert programmers [27]. However, unlike experts, they did not always understand or reason correctly about control-flow constructs. As non-programmers, our participants lacked experience with constructs such as loops and conditional statements. This lack of experience inhibited some participants' ability to determine information scent for code patches within construct blocks. Additionally, these Storytelling Alice language constructs are graphically represented using a combination of text, color, and containment (i.e., programming statements appear inside larger construct blocks).

In addition to using textual cues, our participants frequently used temporal information in determining prey, a behavior that has not been observed among expert programmers [27]. The non-programmers in our study expected temporal proximity in the output to correlate with spatial proximity in the code. When some participants could not find their prey through textual search, they identified output landmarks just before and after their target functionality. These output landmarks became new prey, and participants foraged for patches corresponding to them. Participants occasionally overestimated the value of these patches because they assumed that code producing output near their target functionality would be located near the code for the target functionality itself, which is not always true.
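
A hypothetical example illustrates why the assumption fails. In the Python sketch below (our illustration, not a study program), the drumroll and the backflip are adjacent in the output, yet the backflip code lives in a method defined far from the code that triggers the drumroll.

    class Performer:
        def __init__(self, name):
            self.name = name
        def __getattr__(self, action):
            # Stub: every action simply prints an output event.
            return lambda *args: print(self.name + ": " + action)

    def circus_act(clown, band):
        band.play_fanfare()
        band.play_drumroll()   # output landmark just before the target
        finale(clown)          # the next output comes from distant code

    # ...imagine many unrelated methods defined in between...

    def finale(clown):
        clown.do_backflip()    # adjacent in the output, distant in the code

    circus_act(Performer("Clown"), Performer("Band"))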

These results suggest that predicting non-programmers' foraging activity requires accounting for more than textual cues.

5.2.2. Seek, relate, and collect model

The Landmark-Mapping model relates to Ko et al.'s model [23] of how experienced programmers seek, relate, and collect information during maintenance tasks. Ko et al. proposed a model in which developers seek relevant task information, relate this information to previous knowledge to decide their next step, and continue collecting relevant information until they feel they have enough to implement a solution. Although this model applies to experienced programmers, we found that non-programmers use a similar high-level process, seeking, relating, and collecting landmarks to decide on a solution. The Landmark-Mapping model relates to this task process by suggesting how non-programmers organize and relate the landmarks they collect. It is not known whether a similar model also exists for experienced programmers.

5.2.3. Similar search strategies

As presented here, non-programmers' natural search processes are similar to those observed for novices and experts. Anecdotally, our subjects tended to approach the code sequentially, line by line, like novices [3,16,40]. They exhibited both forward and backward reasoning [17], evidenced by their creation of both code and output landmarks, but, similar to novices debugging [12], showed no apparent preference for forward reasoning.

5.3. Study limitations

The applicability of this study's results may be limited by Storytelling Alice's language design, program visualization, and absence of search support features.

Storytelling Alice provides an API that is based on common actions users want when creating stories [18]. These API actions are named according to how users described the actions in user studies. As a result, the API itself may provide richer cues than other languages. If users find it more difficult to reason about whether observed behavior corresponds to a particular method call, they may develop different search strategies. Further, Storytelling Alice programs tend to refer to objects using unique names throughout an entire program. These object names also provide cues that may be absent in code where the use of variables and parameters is more prevalent.

Because Storytelling Alice programs are animations, users see a visualization of the program execution. Lines of code typically correspond to graphical feedback in the running program. This visualization enables users to observe changes and link them to the code running at that point. However, not all programs have a built-in run-time visualization. Because they typically have actions that result in visual changes, end-user programming domains with graphical visualizations (e.g., web user interfaces [9,44] and image tool scripting [8]) can provide feedback similar to Storytelling Alice. However, users' approaches to searching for code in less visual program domains may differ significantly.

Storytelling Alice has no explicit support for searching program code (e.g., a find feature or a debugger). The presence of such features in other programming environments may alter non-programmers' search strategies and create new barriers.

6. Current and future work

The long-term goal of our research is to enable non-programmers to independently learn basic computer programming. To illustrate how that could happen, consider the following scenario: a user wants to build a Storytelling Alice program in which a gymnast does a backflip, but does not know how to create the animation. However, the user has found a circus-themed program in which a clown does a backflip as part of his act. Using the circus-themed program, the user identifies the code responsible for the backflip using code navigation support tools. Once the user identifies the relevant code, the software extracts that code and constructs a tutorial that guides the user through rebuilding the backflip in his or her own program. A key component of this code reuse process is supporting the user in finding the code responsible for the functionality they wish to reuse. By helping to identify the process non-programmers follow and the barriers they encounter, this study suggests directions for research on support tools that help non-programmers find and select the code responsible for target functionality.

Based on the results presented in this paper, we have implemented a tool that helps non-programmers identify, extract, and reuse functionality from unfamiliar programs [15]. The tool uses a wizard-like interface to guide users through the code selection process. To help non-programmers identify the code responsible for target functionality, the tool correlates screenshots of program output with the code executing at that time, one of the primary activities in making mappings. By emphasizing the currently executing code and providing affordances to locate that code in the program, the tool alleviates some program navigation difficulties. The tool asks a user to identify the beginning and ending lines of his or her target functionality, extracts them from the original program, and integrates them into the user's program. We are currently working on generating tutorials that will guide users through the process of building the selected code within the context of their own program.
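
The screenshot-to-code correlation at the heart of this interface can be sketched simply (a simplified Python reconstruction under assumed data structures, not the published implementation): record a timestamped trace of executed lines during playback, then map each screenshot's timestamp to the line executing at that instant.

    import bisect

    # Timestamped execution trace recorded during playback:
    # (seconds since start, source line number). Values are illustrative.
    trace = [(0.0, 12), (0.8, 13), (1.5, 27), (2.9, 28), (4.0, 14)]

    def line_at(trace, screenshot_time):
        # Map a screenshot's timestamp to the line executing at that
        # instant: the last trace event at or before the timestamp.
        times = [t for t, _ in trace]
        i = bisect.bisect_right(times, screenshot_time) - 1
        return trace[i][1] if i >= 0 else None

    print(line_at(trace, 3.1))  # a screenshot at 3.1 s maps to line 28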

While we are currently focusing on tools that enable non-programmers to learn programming by reusing code in an animation context, code reuse tools for learning may also be valuable for other end-user programming audiences, such as web development [9,44] and image tool scripting [8]. Domains in which the output is visually observable are particularly appropriate for output-driven reuse. Future research can (1) develop models of the search behavior of non-programmers and end-users across a broad spectrum of domains, (2) build tools that enable these users to effectively identify code corresponding to target functionality, (3) build tools that enable users to flexibly reuse code, and (4) build tools that transform web code snippets into executable examples for non-programmers. This research can create more opportunities for new users in these communities to learn programming independently.

7. Conclusion

There is a large audience of potential programmers, including both children and adult end-user programmers, who do not have access to formal education in programming. Tools that enable these populations to learn effectively from freely available source code could open computer programming to these audiences. The Task Process Model and the Landmark-Mapping model describe how non-programmers approach finding code. These models can inform the design of new tools that will enable non-programmers to more quickly and accurately find the code responsible for functionality they would like to learn from or reuse.

References

[1] J. Brandt, P.J. Guo, J. Lewenstein, M. Dontcheva, S.R. Klemmer, Two studies of opportunistic programming: interleaving web foraging, learning, and writing code, in: Proceedings of the 27th International Conference on Human Factors in Computing Systems, ACM, 2009, pp. 1589–1598.

[2] R. Brooks, Towards a theory of the comprehension of computer programs, International Journal of Man-Machine Studies 18 (1983) 543–554.

[3] M.S. Carver, S.C. Risinger, Improving children's debugging skills, in: Empirical Studies of Programmers: Second Workshop, Ablex Publishing Corp., 1987, pp. 147–171.

[4] Committee on Prospering in the Global Economy of the 21st Century: An Agenda for American Science and Technology, National Academy of Sciences, National Academy of Engineering, Institute of Medicine, Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Economic Future, The National Academies Press, 2007.

[5] A. Cox, M. Fisher, P. O'Brien, Theoretical considerations on navigating codespace with spatial cognition, in: Proceedings of the PPIG, 2005, pp. 92–105.

[6] M. Crosby, J. Scholtz, S. Wiedenbeck, The roles beacons play in comprehension for novice and expert programmers, in: Proceedings of the PPIG, 2002, pp. 58–73.

[7] A.A. diSessa, H. Abelson, Boxer: a reconstructible computational medium, Communications of the ACM 29 (1986) 859–868.

[8] B. Dorn, M. Guzdial, Graphic designers who program as informal computer science learners, in: Proceedings of the Second International Workshop on Computing Education Research, ACM, 2006, pp. 127–134.

[9] B. Dorn, M. Guzdial, Learning on the job: characterizing the programming knowledge and learning strategies of web designers, in: Proceedings of the 28th International Conference on Human Factors in Computing Systems, ACM, 2010, pp. 703–712.

[10] W.F. Finzer, L. Gould, Rehearsal World: programming by rehearsal, in: Watch What I Do: Programming by Demonstration, MIT Press, 1993, pp. 79–100.

[11] M. Fisher, A. Cox, L. Zhao, Using sex differences to link spatial cognition and program comprehension, in: Proceedings of the IEEE International Conference on Software Maintenance, IEEE Computer Society, 2006, pp. 289–298.

[12] S. Fitzgerald, G. Lewandowski, R. McCauley, L. Murphy, B. Simon, L. Thomas, C. Zander, Debugging: finding, fixing, and flailing, a multi-institutional study of novice debuggers, Computer Science Education 18 (2008) 93–116.

[13] Google Code Search, http://www.google.com/codesearch.

[14] V.I. Grigoreanu, M.M. Burnett, G.G. Robertson, A strategy-centric approach to the design of end-user debugging tools, in: Proceedings of the 28th International Conference on Human Factors in Computing Systems, ACM, 2010, pp. 713–722.

[15] P.A. Gross, M.S. Herstand, J.W. Hodges, C.L. Kelleher, A code reuse interface for non-programmer middle school students, in: Proceedings of the 14th International Conference on Intelligent User Interfaces, ACM, 2010, pp. 219–228.

[16] R. Jeffries, A comparison of the debugging behavior of expert and novice programmers, in: Proceedings of the AERA Annual Meeting, 1982.

[17] I.R. Katz, J.R. Anderson, Debugging: an analysis of bug-location strategies, Human-Computer Interaction 3 (1987) 351–399.

[18] C. Kelleher, R. Pausch, Lessons learned from designing a programming system to support middle school girls creating animated stories, in: Visual Languages and Human-Centric Computing, IEEE Computer Society, 2006, pp. 165–172.

[19] C. Kelleher, R. Pausch, S. Kiesler, Storytelling Alice motivates middle school girls to learn computer programming, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2007, pp. 1455–1464.

[20] C. Kissinger, M. Burnett, S. Stumpf, N. Subrahmaniyan, L. Beckwith, S. Yang, M.B. Rosson, Supporting end-user debugging: what do users want to know?, in: Proceedings of the Working Conference on Advanced Visual Interfaces, ACM, 2006, pp. 135–142.

[21] A.J. Ko, B.A. Myers, Designing the Whyline: a debugging interface for asking questions about program behavior, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2004, pp. 151–158.

[22] A.J. Ko, B.A. Myers, Finding causes of program output with the Java Whyline, in: Proceedings of the 27th International Conference on Human Factors in Computing Systems, ACM, 2009, pp. 1569–1578.

[23] A.J. Ko, B.A. Myers, M.J. Coblenz, H.H. Aung, An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks, IEEE Transactions on Software Engineering 32 (2006) 971–987.

[24] A.J. Ko, B. Uttl, Individual differences in program comprehension strategies in unfamiliar programming systems, in: Proceedings of the International Conference on Program Comprehension, IEEE Computer Society, 2003, p. 175.

[25] Koders Open Source Code Search Engine, http://koders.com/.

[26] Krugle, http://www.krugle.com/.

[27] J. Lawrance, R. Bellamy, M. Burnett, Scents in programs: does information foraging theory apply to program maintenance?, in: Visual Languages and Human-Centric Computing, IEEE Computer Society, 2007, pp. 15–22.

[28] J. Lawrance, R. Bellamy, M. Burnett, K. Rector, Using information scent to model the dynamic foraging behavior of programmers in maintenance tasks, in: Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, 2008, pp. 1323–1332.

[29] J. Lawrance, M. Burnett, R. Bellamy, C. Bogart, C. Swart, Reactive information foraging for evolving goals, in: Proceedings of the 28th International Conference on Human Factors in Computing Systems, ACM, 2010, pp. 25–34.

[30] G. Leshed, E.M. Haber, T. Matthews, T. Lau, CoScripter: automating & sharing how-to knowledge in the enterprise, in: Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, 2008, pp. 1719–1728.

[31] S. Letovsky, Cognitive processes in program comprehension, in: Proceedings of the First Workshop on Empirical Studies of Programmers, Ablex Publishing Corp., 1986, pp. 58–79.

[32] H. Lieberman, Mondrian: a teachable graphical editor, in: Watch What I Do: Programming by Demonstration, MIT Press, 1993, pp. 341–358.

[33] D.C. Littman, J. Pinto, S. Letovsky, E. Soloway, Mental models and software maintenance, in: Proceedings of the First Workshop on Empirical Studies of Programmers, Ablex Publishing Corp., 1986, pp. 80–98.

[34] J. Maloney, L. Burd, Y. Kafai, N. Rusk, B. Silverman, M. Resnick, Scratch: a sneak preview, in: Proceedings of the Second International Conference on Creating, Connecting and Collaborating through Computing, 2004, pp. 104–109.

[35] A.V. Mayrhauser, A.M. Vans, Hypothesis-driven understanding processes during corrective maintenance of large scale software, in: Proceedings of the IEEE International Conference on Software Maintenance, IEEE Computer Society, 1997, p. 12.

[36] R. McCauley, S. Fitzgerald, G. Lewandowski, L. Murphy, B. Simon, L. Thomas, C. Zander, Debugging: a review of the literature from an educational perspective, Computer Science Education 18 (2008) 67–92.

[37] A. Monroy-Hernandez, M. Resnick, Empowering kids to create and share programmable media, Interactions 15 (2008) 50–53.

[38] R. Mosemann, S. Wiedenbeck, Navigation and comprehension of programs by novice programmers, in: Proceedings of the International Conference on Program Comprehension, IEEE Computer Society, 2001, p. 79.

[39] B.A. Myers, R. McDaniel, D. Wolber, Programming by example: intelligence in demonstrational interfaces, Communications of the ACM 43 (2000) 82–89.

[40] M. Nanja, C.R. Cook, An analysis of the on-line debugging process, in: Empirical Studies of Programmers: Second Workshop, Ablex Publishing Corp., 1987, pp. 172–184.

[41] N. Pennington, Stimulus structures and mental representations in expert comprehension of computer programs, Cognitive Psychology 19 (1987) 295–341.

[42] P. Pirolli, S. Card, Information foraging, Psychological Review 106 (1999) 643–675.

[43] M.P. Robillard, W. Coelho, G.C. Murphy, How effective developers investigate source code: an exploratory study, IEEE Transactions on Software Engineering 30 (2004) 889–903.

[44] M.B. Rosson, J. Ballin, J. Rode, Who, what, and how: a survey of informal and professional web developers, in: Visual Languages and Human-Centric Computing, IEEE Computer Society, 2005, pp. 199–206.

[45] J.R. Ruthruff, S. Prabhakararao, J. Reichwein, C. Cook, E. Creswick, M. Burnett, Interactive, visual fault localization support for end-user programmers, Journal of Visual Languages & Computing 16 (1–2) (2005) 3–40.

[46] C. Scaffidi, M. Shaw, B. Myers, Estimating the numbers of end users and end user programmers, in: Visual Languages and Human-Centric Computing, IEEE Computer Society, 2005, pp. 207–214.

[47] Study Programs, http://www.cse.wustl.edu/~grosspa/jvlc/.

[48] N. Subrahmaniyan, L. Beckwith, V. Grigoreanu, M. Burnett, S. Wiedenbeck, V. Narayanan, K. Bucht, R. Drummond, X. Fern, Testing vs. code inspection vs. what else?: male and female end users' debugging strategies, in: Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, 2008, pp. 617–626.

[49] Userscripts.org: Power-ups for your browser, http://userscripts.org/.

[50] S. Wiedenbeck, Beacons in computer program comprehension, International Journal of Man-Machine Studies 25 (1986) 697–709.

[51] S. Wiedenbeck, The initial stage of program comprehension, International Journal of Man-Machine Studies 35 (1991) 517–540.

[52] S. Wiedenbeck, A. Engebretson, Comprehension strategies of end-user programmers in an event-driven application, in: Visual Languages and Human-Centric Computing, IEEE Computer Society, 2004, pp. 207–214.

[53] J. Wong, J.I. Hong, Making mashups with Marmite: towards end-user programming for the web, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2007, pp. 1435–1444.