

Education
Editors: Matt Bishop, [email protected]
Cynthia Irvine, [email protected]

MAY/JUNE 2009 ■ 1540-7993/09/$25.00 © 2009 IEEE ■ Copublished by the IEEE Computer and Reliability Societies ■ 53

Helping Students 0wn Their Own Code

Michael E. Locasto, George Mason University

Preparing computer science students for careers as software engineering professionals has long been a de facto aim of undergraduate CS programs. Nevertheless, many students rarely seem to absorb the skill set necessary to be critical enough of their own code to systematically and proactively discover vulnerabilities, flaws, or other weaknesses. Unfortunately, the structure of most undergraduate curricula, graduate training, and certification courses rarely provides the time and flexibility for students to experiment with and nurture this skill.

Similar to work at other institutions, at George Mason University we've started to require that our students devise and submit attack scripts as part of their coursework and homework assignments. We see these scripts as one possible mechanism to help students unlearn the bad habits stemming from the "it seems to compile and run" mindset, which often precludes the level of engagement and experimentation necessary to develop an insight into a software system's limits and failure modes.

Encouraging a Dual Frame of Mind
Simultaneously envisioning how your system could be forced to fail while you're busy designing how it's supposed to operate is an extremely difficult mental exercise. Writing attack scripts as a formal part of classroom and homework assignments gives students an opportunity to practice exactly this skill.

Most undergraduate CS programs—and many graduate ones—seem ill-prepared to teach students how to think about breaking their programs as they write them. This isn't a blanket criticism of the effort that already goes into teaching the multitude of topics that compose the CS curriculum; rather, it's a reflection of the fact that in a young and growing discipline such as computer science, the time pressure to cover these topics is extreme. Even teaching students to gracefully handle anticipated error conditions (such as checking function return values or writing handlers for common Java exceptions) can present a challenge when designing a course.

Understandably, then, security concerns routinely receive short shrift in such an environment. Moreover, CS programs seem geared toward teaching students about the normal, average, or success cases: students learn APIs, not implementation details.1,2 They learn integrated development environments and frameworks, not the guts of runtime bindings, binary formats, and language translation. They see toy operating systems and threading models written in high-level languages rather than the compromises made to get real multitasking code running on commodity hardware. They learn the mechanics of algorithms that "just work" without considering input from a malicious adversary. Indeed, techniques such as randomized algorithms and Byzantine robustness are considered advanced topics for undergraduates.

Nurturing the ability to understand the limitations of algorithms, libraries, APIs, tools, and software systems requires a significant time investment that seems difficult to carve out because it means leaving other topics uncovered. Nevertheless, without explicit attention paid to allowing and encouraging students to exercise this freedom within the formal course structure, we run the risk of continuing to produce students who lack an appreciation for software security issues.3

Attack Scripts
To help students learn to develop more robust, attack-tolerant code, we require them to submit as part of their homework and project assignments an executable program or script that expresses an attack plan to passively monitor or actively probe the solution that they've encoded. Although the concept still has some practical obstacles to overcome, we're encouraged by the experiences and creativity we've witnessed so far, and we believe that this pattern provides a way of structuring assignments that the wider academic community can adopt. Note that attack scripts serve as a complement to—rather than as a replacement for—standard exercises such as "capture the flag" competitions or the assignments in a network security lab course.4

An attack script, roughly defined, expresses an instance of a particular threat model that loosely encodes several things:

• attacker intent (a statement of goals);
• attacker capabilities (a set of reasonable constraints on attacker knowledge, tool sets, and initial system access);
• nature of the target surface (input points, presence of countermeasures, system details, and so on); and
• impact on the target (a particular event's positive value with respect to the attacker's intent and negative value for the target's owner).

Students have a great deal of freedom to combine these elements into an active process of probing and attacking their particular code. Our main intent is to refrain from artificially constraining solution creativity: we hope to encourage this type of thinking as a habit rather than impart a specific, formulaic procedure as dogma.

When we introduce this notion to our students, we impress on them several things that attack scripts are not: they aren't regression tests, unit tests, or exhaustive enumerations of corner input cases. One of our assumptions is that the process of software attack differs drastically from most software testing activities. Exploitable vulnerabilities exist because several weaknesses are composed together in a running system. Attackers rarely care about an isolated weakness in one layer; rather, they value vertical knowledge about how a system's parts work together and seek to understand how to compose any such weaknesses.1

Attack scripts are meant to reflect a live process of either passive or active surveillance and a multistep plan of attack that sets up the target program for exploit or undesired behavior. We tell students that attack scripts should encode a sequence of steps (written mainly as C programs or Unix shell scripts) that attempt to discover and violate the implicit assumptions that the target program's designer made.

We can suggest several approaches for constructing attack scripts:

• Earlier assignments are both less complex than later ones and give students the chance to practice creating basic scripts. Later homework and class projects let them apply this basic experience and grow their skills.
• A good place to start building a script is to think about how to exercise all input points to the system, including command-line arguments, environment variables, sources of pseudorandom numbers, files, sockets, and so on.
• Successful attack scripts can be those that simply discover some information about a program's internal structure—for example, in which order does it detect or calculate error conditions? What kinds of deductions can students make about relationships between decision control points and looping structures?
• We advise students to build up state within the program's memory and data structures because this state is a prime target for later extraction or for observing what effect this data might have on execution time, data structure properties, and persistent storage.
• Students should imagine that the algorithm or solution they encoded is a black box resulting from a reasonable engineering effort. What features might help distinguish between several alternative solutions? Examples include determining differences between iterative or recursive versions of algorithms or discovering if a particular solution is optimized for specific example input.
• We encourage the use of debuggers or other program instrumentation frameworks and remind students to balance the need to create an attack script with the need to accomplish the rest of the assignment.

We also encourage students to express their creativity both in their attack script's goals as well as its structure—not all attack scripts need to end in a root shell. Causing subtle behavior shifts, finding out the order of evaluation of certain conditions, building internal state, discovering dependencies between resource allocations, or revealing details about an API's internal workings are all valuable results that real hackers (in the nonpejorative sense) prize.

Highlights
Our students took a variety of approaches to building attack scripts. Because the first course in which we tested this concept was a systems programming fundamentals course (with a focus on security), assignments ranged in difficulty over the semester from simple programs meant to help students gain familiarity with C to more realistic software systems such as command shells and library interposition. The results for certain exercises were sometimes uneven. We believe that some students had trouble simultaneously learning C from scratch and knowing enough to target significant program behaviors, whereas other students seemed to have trouble with the exercise's free-form nature. Despite these challenges, almost all our students identified programming errors and shortcomings that they otherwise wouldn't have found, and many of them came up with creative ways of manipulating their solutions. In some cases, their attack scripts even revealed where they had forgotten about their successful handling of an error!

Some early assignments included command-line utilities for producing a series of pseudorandom numbers, a version of the bin-packing problem, and a naturally recursive calculation. Many students attempted to manipulate the command-line arguments to these programs, whereas others looked at chaining and piping input or modifying environment variables. Students also performed timing comparisons to reveal differences between execution modes, particularly to reveal discrepancies between iterative and recursive implementations.

For the bin-packing problem, some students devised attack scripts that revealed how their programs parsed configuration input, whereas others created scripts to find configuration parameters that produced suboptimal or wrong results if the program didn't use the general algorithm for completely exploring the solution space. In attacking the process of generating pseudorandom numbers, students found various ways to obtain the same seed value as the legitimate program in order to generate a shadow stream of exactly the same pseudorandom data.

One exercise asked students to write malloc(3) and free(3) wrappers that periodically recorded and leaked the data stored in pointers obtained via this interface. Students counterattacked this utility by writing test programs that made sure to overwrite data they had finished using; they also wrote programs that deliberately injected random values or decoy data.

Although finding and plugging minor holes or input-handling weaknesses might seem like a small victory in terms of overall code quality, we believe this exercise's value—particularly for students learning the fundamentals of systems programming—is in the doing because it gets students to shift their perspective and understand how an attacker might anticipate a critical piece of information's value. This type of self-imposed mental duality can be quite difficult to attain, especially when a solution is half-designed, software is half-written, or a student is struggling to gain familiarity with a language. We plan to interview our students to learn what strategies they adopted to facilitate this internal dialogue.

Stumbling Blocks
The failure to teach students how to avoid critical mistakes leading to security vulnerabilities is somewhat understandable: learning to design and write good software is already a complicated task. The job isn't made any easier by the plethora of stumbling blocks and pitfalls present in real-world programming. Software engineers must modify existing systems with extensive legacy design decisions, learn to cope with many threads of control and locking primitives, adopt different programming customs and idioms, understand third-party libraries and modules, deal with abysmal comments and scant documentation, and comply with legal or regulatory design requirements that can be difficult to translate to bits and bytes.

It's no wonder that even moderately complex software systems contain so many latent errors, despite all the object-oriented design principles, data-flow diagrams, Unified Modeling Language courses, and unit test systems that we inflict on CS students and software engineering professionals over the course of their careers.

Indeed, many students require additional (re)training to discover the principles of creating highly reliable, fault-tolerant code—not to mention how to enable systems to actively defend against an adversary capable of adapting and learning. Such training often occurs on the job, when new employees learn the skills of the craft from senior, experienced engineers. Sometimes this training comes from graduate programs or certificate courses in secure software development. Some organizations might even undertake a strategic retraining effort to address vulnerabilities in legacy code in hopes of teaching their engineers to avoid repeating those same mistakes. Unfortunately, large-scale security reviews, such as Microsoft's 2002 effort,5 entail a large investment of resources, time, and money that smaller vendors and businesses might not have.

Although fixing ingrained habits seems to be a costly but necessary remedy, it remains unclear whether such limited efforts alone can reduce the seemingly pervasive presence of software vulnerabilities. We suggest that redesigning parts of the initial learning process can help rectify a major shortcoming by enabling students to become relentless self-critics about their code's security qualities.

But besides concepts such as attack scripts, what other kinds of methods can we adopt early on in formal undergraduate and graduate education to help students not only avoid bad habits but learn to proactively vet their own code? Such skills are valuable, particularly when students go to work in environments without the luxury of a dedicated software tester or red team. Our hope is that when given the chance to explore and experiment as an integral part of their assignments, students will develop their own strategies and intuition about how their code might be broken even as they write it.

References
1. S. Bratus, "Hacker Curriculum: How Hackers Learn Networking," IEEE Distributed Systems Online, vol. 8, no. 10, 2007; http://dsonline.computer.org/portal/pages/dsonline/2007/10/ox002edu.html.
2. S. Bratus, "What Hackers Learn that the Rest of Us Don't: Notes on Hacker Curriculum," IEEE Security & Privacy, vol. 5, no. 4, 2007, pp. 72–75.
3. G. White and G. Nordstrom, "Security across the Curriculum: Using Computer Security to Teach Computer Science Principles," Proc. 19th Nat'l Information Systems Security Conf., US Nat'l Inst. Standards and Tech., 1996, pp. 483–488.
4. G. Vigna, "Teaching Network Security through Live Exercises," Proc. 3rd Ann. World Conf. Information Security Education (WISE 03), C. Irvine and H. Armstrong, eds., Kluwer Academic, 2003, pp. 3–18.
5. S. Swoyer, "Users Enthusiastic about Microsoft Security Initiative," Jan. 2002; http://redmondmag.com/news/article.asp?EditorialsID=5168.

Michael E. Locasto is a research assistant professor at George Mason University. His research interests include intrusion detection, debugging, and automated software repair. Locasto has a PhD in computer science from Columbia University. Contact him at [email protected].