

Education
Editors: Matt Bishop, [email protected]
Cynthia Irvine, [email protected]

MAY/JUNE 2009 ■ 1540-7993/09/$25.00 © 2009 IEEE ■ Copublished by the IEEE Computer and Reliability Societies ■ 53

Helping Students 0wn Their Own Code

Michael E. Locasto, George Mason University

Preparing computer science students for careers as software engineering professionals has long been a de facto aim of undergraduate CS programs. Nevertheless, many students rarely seem to absorb the skill set necessary to be critical enough of their own code to systematically and proactively discover vulnerabilities, flaws, or other weaknesses. Unfortunately, the structure of most undergraduate curricula, graduate training, and certification courses rarely provides the time and flexibility for students to experiment with and nurture this skill.

Similar to work at other institutions, at George Mason University we've started to require that our students devise and submit attack scripts as part of their coursework and homework assignments. We see these scripts as one possible mechanism to help students unlearn the bad habits stemming from the "it seems to compile and run" mindset, which often precludes the level of engagement and experimentation necessary to develop an insight into a software system's limits and failure modes.

Encouraging a Dual Frame of Mind
Simultaneously envisioning how your system could be forced to fail while you're busy designing how it's supposed to operate is an extremely difficult mental exercise. Writing attack scripts as a formal part of classroom and homework assignments gives students an opportunity to practice exactly this skill.

Most undergraduate CS programs—and many graduate ones—seem ill-prepared to teach students how to think about breaking their programs as they write them. This isn't a blanket criticism of the effort that already goes into teaching the multitude of topics that compose the CS curriculum; rather, it's a reflection of the fact that in a young and growing discipline such as computer science, the time pressure to cover these topics is extreme. Even teaching students to gracefully handle anticipated error conditions (such as checking function return values or writing handlers for common Java exceptions) can present a challenge when designing a course.

Understandably, then, security concerns routinely receive short shrift in such an environment. Moreover, CS programs seem geared toward teaching students about the normal, average, or success cases: students learn APIs, not implementation details.1,2 They learn integrated development environments and frameworks, not the guts of runtime bindings, binary formats, and language translation. They see toy operating systems and threading models written in high-level languages rather than the compromises made to get real multitasking code running on commodity hardware. They learn the mechanics of algorithms that "just work" without considering input from a malicious adversary. Indeed, techniques such as randomized algorithms and Byzantine robustness are considered advanced topics for undergraduates.

Nurturing the ability to understand the limitations of algorithms, libraries, APIs, tools, and software systems requires a significant time investment that seems difficult to carve out because it means leaving other topics uncovered. Nevertheless, without explicit attention paid to allowing and encouraging students to exercise this freedom within the formal course structure, we run the risk of continuing to produce students who lack an appreciation for software security issues.3

Attack Scripts
To help students learn to develop more robust, attack-tolerant code, we require them to submit as part of their homework and project assignments an executable program or script that expresses an attack plan to passively monitor or actively probe the solution that they've encoded. Although the concept still has some practical obstacles to overcome, we're encouraged by the experiences and creativity we've witnessed so far, and we believe that this pattern provides a way of structuring assignments that the wider academic community can adopt. Note that attack scripts serve as a complement to—rather than as a replacement for—standard exercises such as "capture the flag" competitions or the assignments in a network security lab course.4

An attack script, roughly defined, expresses an instance of a particular threat model that loosely encodes several things:

• attacker intent (a statement of goals);
• attacker capabilities (a set of reasonable constraints on attacker knowledge, tool sets, and initial system access);
• nature of the target surface (input points, presence of countermeasures, system details, and so on); and
• impact on the target (a particular event's positive value with respect to the attacker's intent and negative value for the target's owner).

Students have a great deal of freedom to combine these elements into an active process of probing and attacking their particular code. Our main intent is to refrain from artificially constraining solution creativity: we hope to encourage this type of thinking as a habit rather than impart a specific, formulaic procedure as dogma.

When we introduce this notion to our students, we impress on them several things that attack scripts are not: they aren't regression tests, unit tests, or exhaustive enumerations of corner input cases. One of our assumptions is that the process of software attack differs drastically from most software testing activities. Exploitable vulnerabilities exist because several weaknesses are composed together in a running system. Attackers rarely care about an isolated weakness in one layer; rather, they value vertical knowledge about how a system's parts work together and seek to understand how to compose any such weaknesses.1

Attack scripts are meant to reflect a live process of either passive or active surveillance and a multistep plan of attack that sets up the target program for exploit or undesired behavior. We tell students that attack scripts should encode a sequence of steps (written mainly as C programs or Unix shell scripts) that attempt to discover and violate the implicit assumptions that the target program's designer made.

We can suggest several approaches for constructing attack scripts:

• Earlier assignments are both less complex than later ones and give students the chance to practice creating basic scripts. Later homework and class projects let them apply this basic experience and grow their skills.
• A good place to start building a script is to think about how to exercise all input points to the system, including command-line arguments, environment variables, sources of pseudorandom numbers, files, sockets, and so on.
• Successful attack scripts can be those that simply discover some information about a program's internal structure—for example, in which order does it detect or calculate error conditions? What kinds of deductions can students make about relationships between decision control points and looping structures?
• We advise students to build up state within the program's memory and data structures because this state is a prime target for later extraction or for observing what effect this data might have on execution time, data structure properties, and persistent storage.
• Students should imagine that the algorithm or solution they encoded is a black box resulting from a reasonable engineering effort. What features might help distinguish between several alternative solutions? Examples include determining differences between iterative or recursive versions of algorithms or discovering if a particular solution is optimized for specific example input.
• We encourage the use of debuggers or other program instrumentation frameworks and remind students to balance the need to create an attack script with the need to accomplish the rest of the assignment.

We also encourage students to express their creativity both in their attack script's goals as well as its structure—not all attack scripts need to end in a root shell. Causing subtle behavior shifts, finding out the order of evaluation of certain conditions, building internal state, discovering dependencies between resource allocations, or revealing details about an API's internal workings are all valuable results that real hackers (in the nonpejorative sense) prize.

Highlights
Our students took a variety of approaches to building attack scripts. Because the first course in which we tested this concept was a systems programming fundamentals course (with a focus on security), assignments ranged in difficulty over the semester from simple programs meant to help students gain familiarity with C to more realistic software systems such as command shells and library interposition. The results for certain exercises were sometimes uneven. We believe that some students had trouble simultaneously learning C from scratch and knowing enough to target significant program behaviors, whereas other students seemed to have trouble with the exercise's free-form nature. Despite these challenges, almost all our students identified programming errors and shortcomings that they otherwise wouldn't have found, and many of them came up with creative ways of manipulating their solutions. In some cases, their attack scripts even revealed where they had forgotten about their successful handling of an error!

Some early assignments included command-line utilities for producing a series of pseudorandom numbers, a version of the bin-packing problem, and a naturally recursive calculation. Many students attempted to manipulate the command-line arguments to these programs, whereas others looked at chaining and piping input or modifying environment variables. Students also performed timing comparisons to reveal differences between execution modes, particularly to reveal discrepancies between iterative and recursive implementations.

For the bin-packing problem, some students devised attack scripts that revealed how their programs parsed configuration input, whereas others created scripts to find configuration parameters that produced suboptimal or wrong results if the program didn't use the general algorithm for completely exploring the solution space. In attacking the process of generating pseudorandom numbers, students found various ways to obtain the same seed value as the legitimate program in order to generate a shadow stream of exactly the same pseudorandom data.

One exercise asked students to write malloc(3) and free(3) wrappers that periodically recorded and leaked the data stored in pointers obtained via this interface. Students counterattacked this utility by writing test programs that made sure to overwrite data they had finished using; they also wrote programs that deliberately injected random values or decoy data.

Although finding and plugging minor holes or input-handling weaknesses might seem like a small victory in terms of overall code quality, we believe this exercise's value—particularly for students learning the fundamentals of systems programming—is in the doing because it gets students to shift their perspective and understand how an attacker might anticipate a critical piece of information's value. This type of self-imposed mental duality can be quite difficult to attain, especially when a solution is half-designed, software is half-written, or a student is struggling to gain familiarity with a language. We plan to interview our students to learn what strategies they adopted to facilitate this internal dialogue.

Stumbling Blocks
The failure to teach students how to avoid critical mistakes leading to security vulnerabilities is somewhat understandable: learning to design and write good software is already a complicated task. The job isn't made any easier by the plethora of stumbling blocks and pitfalls present in real-world programming. Software engineers must modify existing systems with extensive legacy design decisions, learn to cope with many threads of control and locking primitives, adopt different programming customs and idioms, understand third-party libraries and modules, deal with abysmal comments and scant documentation, and comply with legal or regulatory design requirements that can be difficult to translate to bits and bytes.

It's no wonder that even moderately complex software systems contain so many latent errors, despite all the object-oriented design principles, data-flow diagrams, Unified Modeling Language courses, and unit test systems that we inflict on CS students and software engineering professionals over the course of their careers.

Indeed, many students require additional (re)training to discover the principles of creating highly reliable, fault-tolerant code—not to mention how to enable systems to actively defend against an adversary capable of adapting and learning. Such training often occurs on the job, when new employees learn the skills of the craft from senior, experienced engineers. Sometimes this training comes from graduate programs or certificate courses in secure software development. Some organizations might even undertake a strategic retraining effort to address vulnerabilities in legacy code in hopes of teaching their engineers to avoid repeating those same mistakes. Unfortunately, large-scale security reviews, such as Microsoft's 2002 effort,5 entail a large investment of resources, time, and money that smaller vendors and businesses might not have.

Although fixing ingrained habits seems to be a costly but necessary remedy, it remains unclear whether such limited efforts alone can reduce the seemingly pervasive presence of software vulnerabilities. We suggest that redesigning parts of the initial learning process can help rectify a major shortcoming by enabling students to become relentless self-critics about their code's security qualities.

But besides concepts such as attack scripts, what other kinds of methods can we adopt early on in formal undergraduate and graduate education to help students not only avoid bad habits but learn to proactively vet their own code? Such skills are valuable, particularly when students go to work in environments without the luxury of a dedicated software tester or red team. Our hope is that when given the chance to explore and experiment as an integral part of their assignments, students will develop their own strategies and intuition about how their code might be broken even as they write it.

References
1. S. Bratus, "Hacker Curriculum: How Hackers Learn Networking," IEEE Distributed Systems Online, vol. 8, no. 10, 2007; http://dsonline.computer.org/portal/pages/dsonline/2007/10/ox002edu.html.
2. S. Bratus, "What Hackers Learn that the Rest of Us Don't: Notes on Hacker Curriculum," IEEE Security & Privacy, vol. 5, no. 4, 2007, pp. 72–75.
3. G. White and G. Nordstrom, "Security across the Curriculum: Using Computer Security to Teach Computer Science Principles," Proc. 19th Nat'l Information Systems Security Conf., US Nat'l Inst. Standards and Tech., 1996, pp. 483–488.
4. G. Vigna, "Teaching Network Security through Live Exercises," Proc. 3rd Ann. World Conf. Information Security Education (WISE 03), C. Irvine and H. Armstrong, eds., Kluwer Academic, 2003, pp. 3–18.
5. S. Swoyer, "Users Enthusiastic about Microsoft Security Initiative," Jan. 2002; http://redmondmag.com/news/article.asp?EditorialsID=5168.

Michael E. Locasto is a research assistant professor at George Mason University. His research interests include intrusion detection, debugging, and automated software repair. Locasto has a PhD in computer science from Columbia University. Contact him at [email protected].