
TightLip: Keeping Applications from Spilling the Beans
Aydan R. Yumerefendi, Benjamin Mickle, and Landon P. Cox

{aydan, bam11, lpcox}@cs.duke.edu
Duke University, Durham, NC

Abstract

Access control misconfigurations are widespread and can result in damaging breaches of confidentiality. This paper presents TightLip, a privacy management system that helps users define what data is sensitive and who is trusted to see it rather than forcing them to understand or predict how the interactions of their software packages can leak data.

The key mechanism used by TightLip to detect and prevent breaches is the doppelganger process. Doppelgangers are sandboxed copy processes that inherit most, but not all, of the state of an original process. The operating system runs a doppelganger and its original in parallel and uses divergent process outputs to detect potential privacy leaks.

Support for doppelgangers is compatible with legacy code, requires minor modifications to existing operating systems, and imposes negligible overhead for common workloads. SpecWeb99 results show that Apache running on a TightLip prototype exhibits a 5% slowdown in request rate and response time compared to an unmodified server environment.

1 Introduction

Email, the web, and peer-to-peer file sharing have created countless opportunities for users to exchange data with each other. However, managing the permissions of the shared spaces that these applications create is challenging, even for highly skilled system administrators [15]. For untrained PC users, access control errors are routine and can lead to damaging privacy leaks. A 2003 usability study of the Kazaa peer-to-peer file-sharing network found that many users share their entire hard drive with the rest of the Internet, including email inboxes and credit card information [12]. Over 12 hours, the study found 156 distinct users who were sharing their email inboxes. Not only were these files available for download, but other users could be observed downloading them. Examples of similar leaks abound [16, 17, 22, 31].

Secure communication channels [3, 9] and intrusion detection systems [7, 11] would not have prevented these exposures. Furthermore, the impact of these leaks extends beyond the negligent users themselves, since leaked sensitive data often consists of previous communication and transaction records involving others. No matter how careful any individual is, her privacy will only be as secure as her least competent confidant. Prior approaches to similar problems are either incompatible with legacy code [10, 14, 19, 26], rely on expensive binary rewriting and emulation [5, 20], or require changes to the underlying architecture [8, 28, 29].

We are exploring new approaches to preventing leaks due to access control misconfigurations through a privacy management system called TightLip. TightLip's goal is to allow organizations and users to better manage their shared spaces by helping them define what data is important and who is trusted, rather than requiring an understanding of the complex dynamics of how data flows among software components. Realizing this goal requires addressing three challenges: 1) creating file and host meta-data to identify sensitive files and trusted hosts, 2) tracking the propagation of sensitive data through a system and identifying potential leaks, and 3) developing policies for dealing with potential leaks. This paper focuses on a new operating system object we have developed to deal with the second challenge: doppelganger processes.

Doppelgangers are sandboxed copy processes that inherit most, but not all, of the state of an original process. In TightLip, doppelgangers are spawned when a process tries to read sensitive data. The kernel returns sensitive data to the original and scrubbed data to the doppelganger. The doppelganger and original then run in parallel while the operating system monitors the sequence and arguments of their system calls. As long as the outputs of the two processes are the same, the original's output does not depend on the sensitive input with very high probability. However, if the operating system detects divergent outputs, then the original's output is likely descended from the sensitive input.

A breach arises when such an output is destined for an endpoint that falls outside of TightLip's control, such as a socket connected to an untrusted host. When potential breaches are detected, TightLip invokes a policy module, which can direct the operating system to fail the output, ignore the alert, or even swap in the doppelganger for the original. Using doppelgangers to infer the sensitivity of processes' outputs is attractive because it requires only minor changes to existing operating systems and no modifications to the underlying architecture or legacy applications.

We have added support for doppelgangers to the Linux kernel and currently support their use of the file system, UNIX domain sockets, pipes, network sockets, and GUIs. Early experience with this prototype has shown that doppelgangers are useful for an important subset of applications: servers that read files, encode the files' content, and then write the resulting data to the network. Micro-benchmarks of several common file transfer applications as well as the SpecWeb99 benchmark demonstrate that doppelgangers impose negligible performance overhead under moderate server workloads. For example, SpecWeb99 results show that Apache running on TightLip exhibits only a 5% slowdown in request rate and response time compared to an unmodified server environment.

2 Overview

Access control misconfigurations are common and potentially damaging: peer-to-peer users often inadvertently share emails and credit card information [12], computer science department faculty have been found to set the permissions of their email files to all-readable [31], professors have inadvertently left students' grade information in their public web space [22], a database of 20,000 Hong Kong police complainants' personal information was accidentally published on the web and ended up in Google's cache [16], and UK employees unintentionally copied sensitive internal documents to remote Google servers via Google Desktop [17]. Because these breaches were not the result of buggy or malicious software, they present a different threat model than is normally assumed by the privacy and security literature.

TightLip addresses this problem in three phases: 1) help users identify sensitive files, 2) track the propagation of sensitivity through a running system and detect when sensitive data may leave the system, and 3) enable policies for handling potential breaches. The focus of this paper is on the mechanisms used in phase two, but the rest of this section provides an overview of all three.

2.1 Identifying Sensitive Files

To identify sensitive data, TightLip periodically scans each file in a file system and applies a series of diagnostics, each corresponding to a different sensitive data type. These diagnostics use heuristics about a file's path, name, and content to infer whether or not it is of a particular sensitive type. For example, the email diagnostic checks for a ".pst" file extension, placement below a "mail" directory, and the ASCII string "Message-ID" in the file.
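As a rough sketch of how such a diagnostic might be structured (the function name and the "any match wins" rule are illustrative assumptions, not TightLip's actual implementation):

```python
import os

def looks_like_email(path):
    # Hypothetical email diagnostic: flag a file if its name, location,
    # or content matches the heuristics described above. Treating any
    # single match as sufficient is an assumption made for this sketch.
    name = os.path.basename(path).lower()
    parents = [p.lower() for p in path.split(os.sep)[:-1]]
    if name.endswith(".pst"):
        return True
    if "mail" in parents:
        return True
    try:
        with open(path, "rb") as f:
            # Only inspect a bounded prefix of the file's content.
            return b"Message-ID" in f.read(64 * 1024)
    except OSError:
        return False
```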

This scanning process is similar to anti-virus software that uses a periodically updated library of definitions to scan for infected files. The difference is that rather than prompting users when they find a positive match, diagnostics silently mark the file as sensitive and invoke the type's associated scrubber.

Scrubbers use a file's content to produce a non-sensitive shadow version of the file. For example, the email scrubber outputs a properly formatted shadow email file of the same size as the input file, but marks out each message's sender, recipient, subject, and body fields. Attachments are handled by recursively invoking other format-preserving, MIME-specific scrubbers. When the system cannot determine a data source's type, it reverts to the default scrubber, which replaces each character from the sensitive data source with the "x" character.
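A minimal sketch of the default scrubber's behavior (the function names and the ".shadow" naming convention are assumptions for illustration):

```python
def default_scrub(data: bytes) -> bytes:
    # Default scrubber sketch: replace every byte of sensitive content
    # with the ASCII character 'x', preserving length so the shadow
    # file is exactly the same size as the sensitive original.
    return b"x" * len(data)

def write_shadow(path: str) -> str:
    # Illustrative helper: write the scrubbed shadow copy alongside the
    # sensitive file. The '.shadow' suffix is our convention, not TightLip's.
    with open(path, "rb") as f:
        shadow = default_scrub(f.read())
    shadow_path = path + ".shadow"
    with open(shadow_path, "wb") as f:
        f.write(shadow)
    return shadow_path
```

Because the shadow preserves the original's size, a doppelganger reading it sees plausible, if meaningless, data of the right length.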

2.2 Sensitivity Tracking and Breach Detection

Once files have been labeled, TightLip must track how sensitive information propagates through executing processes and prevent it from being copied to an untrusted destination. This problem is an instance of information-flow tracking, which has most commonly been used to protect systems from malicious exploits such as buffer overflows and format string attacks. Unfortunately, these solutions either suffer from incompatibility with legacy applications [10, 14, 19, 26, 32], require expensive binary rewriting [5, 6, 8, 20, 28], or rely on hardware support [29].

Instead, TightLip offers a new point in the design space of information-flow secure systems based on doppelganger processes. Doppelgangers are sandboxed copy processes that inherit most, but not all, of the state of an original process. Figure 1 shows a simple example of how doppelgangers can be used to track sensitive information. Initially, an original process runs without touching sensitive data. At some point, the original attempts to read a sensitive file, which prompts the TightLip kernel to spawn a doppelganger. The kernel returns the sensitive file's content to the original and the scrubbed content of the shadow file to the doppelganger.

Once the reads have been satisfied, the original and doppelganger are both placed on the CPU ready queue and, when scheduled, modify their private memory objects. The operating system subsequently tracks sensitivity at the granularity of a system call. If the doppelganger and original generate the same system call sequence with the same arguments, then these outputs do not depend on either the sensitive or scrubbed input with high probability, and the operating system does nothing. This might happen when an application such as a virus scanner handles sensitive files, but does not act on their content.
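The per-call comparison can be pictured as follows; this is a sketch over recorded (name, arguments) pairs rather than live processes, and the three-way outcome labels are our own:

```python
def compare_calls(original_call, doppel_call):
    # Each call is modeled as a (name, args) tuple.
    o_name, o_args = original_call
    d_name, d_args = doppel_call
    if o_name != d_name:
        # Different calls: the doppelganger has diverged and is no
        # longer a useful point of comparison (see Section 3).
        return "diverged"
    if o_args == d_args:
        # Same call, same arguments: with high probability the output
        # does not depend on the sensitive input.
        return "clean"
    # Same call, different arguments: the original's output likely
    # depends on sensitive data; mark the modified objects sensitive.
    return "sensitive"
```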

However, if the doppelganger and original make the same system call with different arguments, then the original's output likely depends on sensitive data and the objects the call modifies are marked as sensitive. As long as updated objects are within the operating system's control, such as files and pipes, they can be transitively labeled. However, if the system call modifies an object that is outside the control of the system, such as a socket connected to an untrusted host, then allowing the original's system call may compromise confidentiality.

[Figure 1: Using doppelgangers to avoid a breach. The figure depicts six steps: 1) the original process reads a non-sensitive file; 2) the original tries to read a sensitive file, and the kernel creates a copy doppelganger; 3) the kernel returns sensitive data to the original and scrubbed data to the doppelganger; 4) the original and doppelganger run in parallel, tainting other objects; 5) the original and doppelganger try to write different buffers to the network; 6) the doppelganger is swapped in for the original, and the network write is allowed.]

By tracking information flow at a relatively coarse granularity, TightLip avoids many of the drawbacks of previous approaches. First, because TightLip does not depend on any language-level mechanisms, it is compatible with legacy applications. Second, comparing the sequence and arguments of system calls does not require hardware support and needs only minor changes to existing operating systems. Third, the performance penalty of introducing doppelgangers is modest; the overhead of scheduling an additional process is negligible for most workloads.

Finally, an important limitation of existing information-flow tracking solutions is that they cannot gracefully transition a process from a tainted (i.e., having accessed sensitive data) to an untainted state. A list of tainted memory locations or variables is not enough to infer what a clean alternative would look like. Bridging this semantic gap requires understanding a process's execution logic and data structure invariants. Because of this, once a breach has been detected, prior solutions require all tainted processes associated with the breach to be rebooted. While rebooting purges taint, it also wipes out untainted connections and data structures.

Doppelgangers provide TightLip with an internally consistent, clean alternative to the tainted process; as long as shadow files are generated properly, doppelgangers will not contain any sensitive information. This allows TightLip to swap doppelgangers in for their original processes, preserving continuous execution without compromising confidentiality.

2.3 Disclosure Policies

Once the operating system detects that sensitive data is about to be copied to a destination outside of TightLip's control, it invokes the disclosure policy module. Module policies specify how the kernel should handle attempts to copy sensitive data to untrusted destinations. Our current prototype supports actions such as disabling a process's write permissions, terminating the process, scrubbing output buffers, or swapping the doppelganger in for the original.

TightLip provides default policies, but also notifies users of potential breaches so that they can define their own policies. Query answers can be delivered synchronously or asynchronously (e.g., via pop-up windows or emails). Answers can also be cached to minimize future interactions with the user.

3 Limitations

Though TightLip is attractive for its low overhead, compatibility with legacy applications and hardware, and support for continuous execution, it is not without its limitations. First, in TightLip the operating system is completely trusted. TightLip is helpless to stop the kernel from maliciously or unintentionally compromising confidentiality. For example, TightLip cannot prevent an in-kernel NFS server from leaking sensitive data.

Second, TightLip relies on scrubbers to produce valid data. An incorrectly formatted shadow file could crash the doppelganger. In addition, swapping in a doppelganger is only safe if scrubbers can remove all sensitive information. While feasible for many data types, it may not be possible to meet these requirements for all data sources.

Third, scrubbed data can lead to false negatives in some pathological cases. For example, an original process may accept a network query asking whether a sensitive variable is even or odd. TightLip could generate a scrubbed value that is different from the sensitive variable, but of the same parity. The output for the doppelganger and original would be the same, despite the fact that the original is leaking information. The problem is that it is possible to generate "unlucky" scrubbed data that can lead to a false negative. Such false negatives are unlikely to arise in practice since the probability of a collision decreases geometrically with the number of bits required to encode a response.
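The parity example can be made concrete in a few lines (the query handler and the specific values are hypothetical):

```python
def parity_reply(value: int) -> str:
    # Hypothetical network query handler: reveals one bit of its input.
    return "even" if value % 2 == 0 else "odd"

secret = 41          # sensitive value read by the original
unlucky_scrub = 7    # scrubbed value that happens to share the secret's parity
lucky_scrub = 10     # scrubbed value with the opposite parity

# Identical outputs despite a genuine one-bit leak: a false negative.
assert parity_reply(secret) == parity_reply(unlucky_scrub)

# With opposite parity, the outputs diverge and the write is flagged.
assert parity_reply(secret) != parity_reply(lucky_scrub)
```

Because only one bit is leaked here, a randomly chosen scrubbed value collides half the time; responses that encode more bits collide geometrically less often, as noted above.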

Fourth, TightLip avoids the overhead of previous approaches by focusing on system calls, rather than individual memory locations. Unfortunately, if a process reads sensitive data from multiple sources, TightLip cannot compute the exact provenance of a sensitive output. While this loss of information makes more fine-grained confidentiality policies impossible, it allows us to provide practical application performance.

Fifth, TightLip does not address the problem of covert channels. An application can use a variety of covert channels to transmit sensitive information [24, 29]. Since it is unlikely that systems can close all possible covert channels [29], dealing with covert channels is beyond the scope of this paper.

Finally, TightLip relies on comparisons of process outputs to track sensitivity and transitively label objects. If the doppelganger generates a different system call than its original, it has entered a different execution state and may no longer provide information about the relationship between the sensitive input and the original's output. Such divergence might happen if scrubbed input induced a different control flow in the doppelganger. Without the doppelganger as a point of comparison, any object subsequently modified by the original must be marked sensitive; this can lead to incorrectly labeled objects and false positives.

This limitation of doppelgangers is similar to those faced by taint-flow analysis of "implicit flow." Consider the following code fragment, in which variable x is tainted: if (x) { y = 1; } else { y = 0; }. Variable y should be flagged since its value depends on the value of x. Tainting each variable written inside a conditional block captures all dependencies, but can also implicate innocent variables and raise false positives. In practice, following dependencies across conditionals is extremely difficult without carefully placed programmer annotations [32]. Every taint checker for legacy code that we are aware of ignores implicit flow to avoid false positives.
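The fragment can be turned into a small simulation of explicit-flow-only tainting (the propagation rule here is a deliberate caricature of such checkers, not any particular tool):

```python
def run_fragment(x_value, x_tainted):
    # Simulate: if (x) { y = 1; } else { y = 0; }
    # under a checker that only propagates taint on direct data movement.
    if x_value:
        y_value = 1
    else:
        y_value = 0
    # y is assigned from a constant, so no explicit flow from x is seen
    # and y is left untainted -- even though its value reveals whether
    # x was nonzero. This is the implicit-flow blind spot.
    y_tainted = False
    return y_value, y_tainted
```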

Despite the challenges of conditionals, for an important subset of applications, it is reasonable to assume that scrubbed input will not affect control flow. Web servers, peer-to-peer clients, distributed file systems, and the sharing features of Google Desktop blindly copy data into buffers without interpreting it. Early experience with our prototype confirms such behavior, and the rest of this paper is focused on scenarios in which scrubbed data does not affect control flow.

Much of the rest of our discussion of TightLip describes how to eliminate sources of divergence between an original and doppelganger process so that differences only emerge from the initial scrubbed input. If any other input or interaction with the system causes a doppelganger to enter an alternate execution state, TightLip may generate additional false positives.

4 Design

There are two primary challenges in designing support for doppelganger processes. First, because doppelgangers may run for extended periods and compete with other processes for CPU time and physical memory, they must be as resource-efficient as possible. Second, since TightLip relies on divergence to detect breaches, all doppelganger inputs and outputs must be carefully regulated to minimize false positives.

4.1 Reducing Doppelganger Overhead

Our first challenge was limiting the resources consumed by doppelgangers. A doppelganger can be spawned at any point in the original's execution. One option is to create the doppelganger concurrently with the original, but doing so would incur the cost of monitoring in the common case when taint is absent.

Instead, TightLip only creates a doppelganger when a process attempts to read from a sensitive file. For the vast majority of processes, reading sensitive files will occur rarely, if ever. However, some long-lived processes that frequently handle sensitive data, such as virus scanners and file search tools, may require a doppelganger throughout their execution. For these applications, it is important that doppelgangers be as resource-efficient as possible.

Type               Example       Processing description
Kernel update      bind          Apply original, return result to both.
Kernel read        getuid        Verify identical system call sequences.
Non-kernel update  send          Synchronize, compare buffers.
Non-kernel read    gettimeofday  Buffer original results, return to both.

Table 1: Doppelganger-kernel interactions.

Once created, doppelgangers are inserted into the same CPU ready queue as other processes. This imposes a modest scheduling overhead and adds processor load. However, unlike taint-checkers, the fact that doppelgangers have a separate execution context enables a degree of parallelization with other processes, including the original. Though we assume a uni-processor environment throughout this paper, TightLip should be able to take advantage of emerging multi-core architectures.

To limit memory consumption, doppelgangers are forked from the original with their memory marked copy-on-write. In addition, nearly all of the doppelganger's kernel-maintained process state is shared read-only with the original, including its file object table and associated file descriptor namespace. The only separate, writable objects maintained for the doppelganger are its execution context, file offsets, and modified memory pages.

4.2 Doppelganger Inputs and Outputs

In TightLip the kernel must manage doppelganger inputs and outputs to perform three functions: prevent external effects, limit the sources of divergence to the initial scrubbed input, and contain sensitive data. To perform these functions, the kernel must regulate information that passes between the doppelganger and kernel through system calls, signals, and thread schedules.

Kernel-doppelganger interactions fall into one of the following categories: kernel updates, kernel reads, non-kernel updates, and non-kernel reads. Table 1 lists each type, provides an example system call, and briefly describes how TightLip regulates the interaction.

4.2.1 Updates to Kernel State

As with speculative execution environments [4, 21], TightLip must prevent doppelgangers from producing any external effects so that it remains unintrusive. As long as an application does not try to leak sensitive information, it should behave no differently than in the case when there is no doppelganger.

This is why original processes must share their kernel state with the doppelganger read-only. If the doppelganger were allowed to update the original's objects, it could alter the original's execution. Thus, system calls that modify the shared kernel state must be strictly ordered so that only the original process can apply updates.

System calls that update kernel state include, but are not limited to, exit, fork, time, lseek, alarm, sigaction, gettimeofday, settimeofday, select, poll, llseek, fcntl, bind, connect, listen, accept, shutdown, and setsockopt.

TightLip uses barriers and condition variables to implement these system calls. A barrier is placed at the entry of each kernel-modifying call. After both processes have entered, TightLip checks their call arguments to verify that they are the same. If the arguments match, then the original process executes the update, while the doppelganger waits. Once the original finishes, TightLip notifies the doppelganger of the result before allowing it to continue executing.
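The barrier protocol might be sketched as follows, using threads to stand in for the two processes; the class name and bookkeeping are ours, and a real kernel implementation would also need the timeout handling described later in this section:

```python
import threading

class KernelUpdatePoint:
    # Sketch of the barrier at the entry of one kernel-modifying system
    # call (names and structure are illustrative, not TightLip's code).
    def __init__(self, apply_update):
        self.apply_update = apply_update   # performs the real kernel update
        self.barrier = threading.Barrier(2)
        self.args = {}
        self.result = None
        self.sensitive = False             # set on divergent arguments

    def enter(self, who, args):
        # 'who' is "original" or "doppelganger".
        self.args[who] = args
        if self.barrier.wait() == 0:       # exactly one side runs this arm
            if self.args["original"] != self.args["doppelganger"]:
                self.sensitive = True      # record a transfer of sensitivity
            # Only the original's arguments are ever applied.
            self.result = self.apply_update(self.args["original"])
        self.barrier.wait()                # both sides see the same result
        return self.result
```

Both sides block at the first wait; one thread performs the comparison and the update, and the second wait releases the doppelganger only after the result is available.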

If the processes generate different updates and the modified objects are under the kernel's control, TightLip applies the original's update and records a transfer of sensitivity. For example, the kernel transitively marks as sensitive objects such as pipes, UNIX domain sockets, and files. Subsequent reads of these objects by other processes may spawn doppelgangers.

It is important to note that processes will never block indefinitely. If one process times out waiting for the other to reach the barrier, TightLip assumes that the processes have diverged and discards the doppelganger. The kernel will then have to either mark any subsequently modified objects sensitive or invoke the policy module.

Signals are a special kernel-doppelganger interaction since they involve two phases: signal handler registration, which modifies kernel data, and signal delivery, which injects data into the process. Handler registration is managed using barriers and condition variables as other kernel state updates are; only requests from the original are actually registered. However, whenever signals are delivered, both processes must receive the same signals in the same order at the same points in their execution. We discuss signal delivery in Section 4.2.2.

Of course, doppelgangers must also be prevented from modifying non-kernel state such as writing to files or network sockets. Because it may not be possible to proceed with these writes without invoking a disclosure policy and potentially involving the user, modifications of non-kernel state are treated differently. We discuss updates to non-kernel state in Section 4.2.3.

4.2.2 Doppelganger Inputs

To reduce the false positive rate, TightLip must ensure that sources of divergence are limited to the scrubbed input. For example, both processes must receive the same values for time-of-day requests, receive the same network data, and experience the same signals. Ensuring that reads from kernel state are the same is trivial, given that updates are synchronized. However, preventing non-kernel reads, signal delivery, and thread interleavings from generating divergence is more challenging.

Non-kernel reads

The values returned by non-kernel reads, such as from a file, a network socket, and the processor's clock, can change over time. For example, consecutive calls to gettimeofday or consecutive reads from a socket will each return different data. TightLip must ensure that paired accesses to non-kernel state return the same value to both the original and doppelganger. This requirement is similar to the Environment Instruction Assumption ensured by hypervisor-based fault tolerance [2].

To prevent the original from getting one input and the doppelganger another, TightLip assigns a producer-consumer buffer to each data source. For each buffer, the original process is the producer and the doppelganger is the consumer. System calls that use such queues include read, readv, recv, recvfrom, and gettimeofday.

If the original (producer) makes a read request first, it is satisfied by the external source and the result is copied into the buffer. If the buffer is full, the original must block until the doppelganger (consumer) performs a read from the same non-kernel source and consumes the same amount of data from the buffer. Similarly, if the doppelganger attempts to read from a non-kernel source and the buffer is empty, it must wait for the original to add data.
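The producer-consumer arrangement for one data source can be sketched with a bounded queue (the class and method names are illustrative):

```python
import queue

class SourceBuffer:
    # One replay buffer per non-kernel data source: the original
    # (producer) reads the real source and deposits a copy; the
    # doppelganger (consumer) replays the identical bytes instead of
    # re-reading the source, so both processes see the same input.
    def __init__(self, read_source, capacity=16):
        self.read_source = read_source     # e.g. the real read() or gettimeofday()
        self.buf = queue.Queue(maxsize=capacity)

    def original_read(self):
        value = self.read_source()         # satisfied by the external source
        self.buf.put(value)                # blocks when full until consumed
        return value

    def doppelganger_read(self):
        return self.buf.get()              # blocks when empty until produced
```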

The mechanism is altered slightly if the read is from another sensitive source. In this case, the kernel returns scrubbed buffers to the doppelganger and updates a list of sensitive inputs to the process. Otherwise, the producer-consumer queue is handled exactly the same as for a non-sensitive source. As before, neither process will block indefinitely.

Signals

In Section 4.2.1, we explained that signals are a two-phase interaction: a process registers a handler and the kernel may later deliver a signal. We treat the first phase as a kernel update. Since modifications to kernel state are synchronized, any signal handler that the original successfully registers is also registered for the doppelganger.

The TightLip kernel delivers signals to a process as it transitions into user mode. Any signals intended for a process are added to its signal queue and then moved from the queue to the process's stack as it exits kernel space. A process can exit kernel space either because it has finished a system call or because it had been preempted and is scheduled to start executing again.

To prevent divergence, any signals delivered to the doppelganger and original must have the same content, be delivered in the same order, and be delivered at the same point in their execution. If any of these conditions are violated, the processes could stray. TightLip ensures that signal content and order are identical by copying any signal intended for the original to both the original's and doppelganger's signal queues.

Before jumping back into user space, the kernel places pending signals on the first process's stack. Conceptually, when the process re-enters user space, it handles the signals in order before returning from its system call. The same is true when the second process (whether the doppelganger or original) re-enters user space. For the second process, a further check is needed to ensure that only signals that were delivered to the first are delivered to the second.
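The mirroring of signal queues can be illustrated with a small sketch. The array-backed queues and all names here are illustrative stand-ins for the kernel's real per-process pending-signal state:

```c
#include <assert.h>

/* Sketch: to keep signal content and order identical, any signal
 * destined for the original is appended to both processes' pending
 * queues. */

#define MAXSIG 32

struct sigqueue { int sig[MAXSIG]; int n; };

struct pair {
    struct sigqueue orig_q;   /* original's pending signals */
    struct sigqueue dopp_q;   /* doppelganger's pending signals */
};

/* Append the signal to both queues in one step. */
void deliver_to_pair(struct pair *p, int signum)
{
    if (p->orig_q.n < MAXSIG) p->orig_q.sig[p->orig_q.n++] = signum;
    if (p->dopp_q.n < MAXSIG) p->dopp_q.sig[p->dopp_q.n++] = signum;
}

/* Returns 1 iff both queues hold the same signals in the same order. */
int queues_match(const struct pair *p)
{
    if (p->orig_q.n != p->dopp_q.n) return 0;
    for (int i = 0; i < p->orig_q.n; i++)
        if (p->orig_q.sig[i] != p->dopp_q.sig[i]) return 0;
    return 1;
}
```

Identical queue contents satisfy the first two divergence conditions (same content, same order); the delivery-point condition requires the synchronization described next.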

In previous sections, we have described how the original and doppelganger must be synchronized when entering system call code in the kernel so that TightLip can detect divergence. Unfortunately, simply synchronizing the entry to system calls between processes is insufficient to ensure that signals are delivered at the same execution state.

This is because some system calls can be interrupted by a signal arriving while the kernel is blocked waiting for an external event to complete the call. In such cases, the kernel delivers the signal to the process and returns an "interrupted" error code (e.g., EINTR in Linux). Interrupting the system call allows the kernel to deliver signals without waiting (potentially forever) for the external event to occur.

Properly written user code that receives an interrupted error code will retry the system call. If TightLip only synchronizes on system call entry-points, retrying an interrupted system call can lead to different system call sequences. Consider the following example taken from the execution of the SSH daemon, sshd, where Process 1 and Process 2 could be either the doppelganger or the original:

• Process 1 (P1) calls write and waits for Process 2 (P2).
• P2 calls write, wakes up P1, completes write, returns to user mode, calls select, and waits for P1 to call select.
• P1 wakes up and begins to complete write.
• A signal arrives for the original process.
• The kernel puts the signal handler on P1's stack and sets the return value of P1's write to EINTR.


Figure 2: Signaling that leads to divergence.

• P1 handles the signal, sees a return code of EINTR for write, retries write, and waits for P2 to call write.

In this example, divergence arose because P1's and P2's calls to write generated different return values, which led P1 to call write twice. To prevent this, TightLip must ensure that paired system calls generate the same return values. Thus, system call exit-points must be synchronized as well as entry-points. In our example, paired exit-points prevent P2's write from returning a different value than P1's: both P1 and P2 are returned either EINTR or the number of written bytes.
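A paired exit-point can be sketched as a tiny shared record: whichever process reaches the exit first records the call's outcome, and the second process adopts that value instead of its own. The struct and function names are hypothetical, and the real kernel also synchronizes the two arrivals with a barrier, which this single-threaded sketch omits:

```c
#include <assert.h>

/* One shared record per paired system call exit. */
struct exit_point {
    int  have_result;   /* has the first process arrived yet? */
    long result;        /* return value both processes will see */
};

/* Called by each process as it leaves the system call with the
 * value it would have returned on its own. The first arrival's
 * value wins; the second arrival is forced to match, so retry
 * logic (e.g. on EINTR) stays in lockstep. */
long exit_syscall(struct exit_point *ep, long my_result)
{
    if (!ep->have_result) {
        ep->result = my_result;
        ep->have_result = 1;
    }
    return ep->result;
}
```

With this in place, either both processes see EINTR and both retry, or neither does.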

Parallel control flows, together with lock-step system call entry and exit points, make it likely that signals will be delivered at the same point in the processes' execution, but they are still not a guarantee. To see why, consider the processes in Figure 2. In the example, a user-level thread library uses an alarm signal to pre-empt an application's threads. When the signal is handled determines how much progress the user-level thread makes; in this case, it determines the order in which threads acquire a lock. The problem is that the doppelganger and original have been pre-empted at different instructions, which forces them to handle the same signal in different states. Ideally, the processor would provide a recovery register, which is decremented each time an instruction is retired; the processor then generates an interrupt once it becomes negative. Unfortunately, the x86 architecture does not support such a register.

Even without a recovery register, TightLip can still limit the likelihood of divergence by deferring signal delivery until the processes reach a synchronization point. Most programs make system calls throughout their execution, providing many opportunities to handle signals. However, for the rare program that does not make any system calls, the kernel cannot wait indefinitely without compromising program correctness. Thus, the kernel can defer delivering signals on pre-emption re-entry only a finite number of times. In our limited experience with our prototype kernel, we have not seen process divergence due to signal delivery.

Threads

Managing multi-threaded processes requires two additional mechanisms. First, the kernel must pair doppelganger and original threads entering and exiting the kernel. Second, the kernel must ensure that synchronization resources are acquired in the same order by both processes. Assuming parallel control flows, if control is transferred between threads along system calls and thread primitives such as lock/unlock pairs, then TightLip can guarantee that the original and doppelganger threads will enter and exit the kernel at the same points.

4.2.3 Updates to Non-kernel State

The last process interactions to be regulated are updates to non-kernel state. As with other system calls, these updates are synchronized between the processes using barriers and condition variables. The difference between these modifications and those to kernel state is that TightLip does not automatically apply the original's update and return the result to both processes. TightLip's behavior depends on whether the original and the doppelganger have generated the same updates.

Handling Potential Leaks

If both processes generate the same update, then TightLip assumes that the update does not depend on the sensitive input and that releasing it will not compromise confidentiality. The kernel applies the update, returns the result, and takes no further action.

If the updates differ and are to an object outside of the kernel's control, TightLip assumes that a breach is about to occur and queries the disclosure policy module. Our prototype currently supports several disclosure policies: do nothing (allow the potentially sensitive data to pass), disable writes to the network (the system call returns an error), send the doppelganger's output instead of the original's, terminate the process, and swap the doppelganger for the original process.
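The policy decision on a divergent write can be sketched as a simple dispatch. The enum values paraphrase the prototype's policy options; the function itself and its calling convention are illustrative assumptions, not the real module interface:

```c
#include <assert.h>
#include <string.h>

/* Disclosure policies, paraphrasing the prototype's options. */
enum policy { ALLOW, DENY_WRITE, SEND_DOPPEL, KILL, SWAP };

/* Decide which buffer (if any) reaches the external object.
 * Returns the bytes to emit, or NULL when the write is suppressed
 * (call fails or the process is terminated). */
const char *resolve_write(enum policy p, const char *orig,
                          const char *dopp)
{
    if (strcmp(orig, dopp) == 0)   /* identical updates: no leak suspected */
        return orig;
    switch (p) {
    case ALLOW:       return orig;   /* do nothing */
    case SEND_DOPPEL:
    case SWAP:        return dopp;   /* emit the scrubbed output */
    case DENY_WRITE:
    case KILL:
    default:          return 0;
    }
}
```

Note that the policy module is only consulted on divergence; matching outputs bypass it entirely.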

Swapping

If the user chooses to swap in the doppelganger, the kernel sets the original's child processes' parent to the doppelganger, discards the original, and associates the original's process identifier with the doppelganger's process state. While the swap is in progress, both processes must be removed from the CPU ready queue. This allows related helper processes to make more progress than they might have otherwise, which can affect the execution of the swapped-in process in subtle but not incorrect ways. We describe an example of such behavior in Section 6.1.


Swapped-in processes require an extra mechanism to run the doppelganger efficiently and safely. For each swapped-in process, TightLip maintains a fixed-size list of open files inherited from the doppelganger. Anytime the swapped-in process attempts to read from a sensitive file, the kernel checks whether the file is on the list. If it is, TightLip knows that the process had previously received scrubbed data from the file and returns more scrubbed data. If the file is not on the list and the file is sensitive, TightLip spawns a new doppelganger.

These lists are an optimization to avoid spawning doppelgangers unnecessarily. Particularly for large files that require multiple reads, spawning a new doppelganger for every sensitive read can lead to poor performance. Importantly, leaving files off of the list can only hurt performance and will never affect correctness or compromise confidentiality. Because of this guarantee, TightLip can remove any write restrictions on the swapped-in process, since its internal state is guaranteed to be untainted.
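This per-process list can be sketched as a small fixed array. The bound of 10 entries mirrors the limit stated in Section 5.2; the struct and function names are illustrative:

```c
#include <assert.h>

/* Fixed-size list of open sensitive files inherited from the
 * doppelganger. Reads from a listed file return scrubbed data
 * directly; reads from an unlisted sensitive file would spawn a
 * fresh doppelganger instead. */

#define MAX_SCRUBBED 10

struct scrub_list { int fd[MAX_SCRUBBED]; int n; };

int scrub_list_has(const struct scrub_list *l, int fd)
{
    for (int i = 0; i < l->n; i++)
        if (l->fd[i] == fd) return 1;
    return 0;
}

/* Returns 0 when the list is full. Leaving a file off the list only
 * costs performance (an extra doppelganger), never correctness, so
 * dropping entries on overflow is safe. */
int scrub_list_add(struct scrub_list *l, int fd)
{
    if (l->n == MAX_SCRUBBED) return 0;
    l->fd[l->n++] = fd;
    return 1;
}
```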

Unfortunately, swapping is not without risk. In some cases, writing the doppelganger's buffer to the network and keeping the doppelganger around to monitor the original may be the best option. For example, the user may want the original to write sensitive data to a local file even if it should not write it to the network. However, maintaining both processes incurs some overhead, and non-sensitive writes would still be identical for both the original and the swapped-in process with very high probability.

Furthermore, the doppelganger can stray from the original in unpredictable ways. This is similar to the uncertainty generated by failure-oblivious computing [23]. To reduce this risk, TightLip can monitor internal divergence in addition to external divergence. External symptoms of straying are obvious: the doppelganger generates different sequences of system calls or uses different arguments. Less obvious is when the scrubbed data or some other input silently shifts the process's control flow. Straying of this form may not generate external symptoms, but can still leave the doppelganger in a different execution state than the original.

We believe that this kind of divergence will be rare for applications such as file servers, web servers, and peer-to-peer clients; these processes read a sensitive file, encode its contents, and write the result to the network, after which the doppelganger and original return to the same state. In other cases, divergence will likely manifest itself as a different sequence of system calls or a crash [18].

For additional safety, TightLip can take advantage of common processor performance counters, such as those offered by the Pentium 4 [27], to detect internal divergence. If the number of instructions, number of branches taken, and mix of loads and stores are sufficiently similar, then it is unlikely that the scrubbed input affected the doppelganger's control flow. TightLip can use these values to measure the likelihood that the doppelganger and original are in the same execution state and relay this information to the user.
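As a rough illustration of this heuristic, the comparison might look like the sketch below. The chosen counters, the 1% threshold, and all names are assumptions for illustration; the paper does not specify a calibrated metric:

```c
#include <assert.h>

/* Hypothetical snapshot of per-process hardware counters. */
struct hw_counters { long insns, branches, loads, stores; };

/* True when a and b differ by at most pct percent of the larger. */
static int close_enough(long a, long b, long pct)
{
    long diff = a > b ? a - b : b - a;
    long big  = a > b ? a : b;
    return diff * 100 <= big * pct;
}

/* Heuristic: if every counter is within 1%, the scrubbed input
 * probably did not shift the doppelganger's control flow, so the
 * pair is likely in the same execution state. */
int likely_same_state(const struct hw_counters *o,
                      const struct hw_counters *d)
{
    return close_enough(o->insns,    d->insns,    1) &&
           close_enough(o->branches, d->branches, 1) &&
           close_enough(o->loads,    d->loads,    1) &&
           close_enough(o->stores,   d->stores,   1);
}
```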

4.3 Example: Secure Copy (scp)

To demonstrate the design of the TightLip kernel, it is useful to step through an example of copying a sensitive file from a TightLip-enabled remote host via the secure copy utility, scp.

Secure copy requests are accepted by an SSH daemon, sshd, running on the remote host. After authenticating the requester, sshd forks a child process, shell, which runs under the uid of the authenticated user and will transfer encrypted file data directly to the requester via a network socket, nsock. shell creates a child process of its own, worker, which reads the requested data from the file system and writes it to a UNIX domain socket, dsock, connecting shell and worker.

As soon as worker attempts to read a sensitive file, the kernel spawns a doppelganger, D(worker). Once worker and D(worker) have returned from their respective reads, they both try to write to dsock. Since dsock is under the kernel's control, the actual file data from worker is buffered and dsock is transitively marked sensitive. shell, meanwhile, selects on dsock and is woken up when there is data available for reading.

When shell attempts to read from dsock (which is now sensitive), the kernel forks another doppelganger, D(shell), and returns the actual buffer content (sensitive file data) to shell and scrubbed data to D(shell). shell and D(shell) both encrypt their data and attempt to write the result to nsock. Since their output buffers are different, the breach is detected. By default, the kernel writes D(shell)'s encrypted scrubbed data to nsock, sets the parent process of worker and D(worker) to D(shell), and swaps in D(shell) for shell.

4.4 Future Work

Though TightLip supports most interactions between doppelgangers and the operating system, there is still some work to be done. For example, we currently do not support communication over shared memory. TightLip could interpose on individual loads and stores to shared memory by setting the page permissions to read-only. Though this prevents sensitive data from passing freely, it also generates a page fault on every access of the shared pages.

In addition, TightLip currently lacks a mechanism to prevent a misconfigured process from overwriting sensitive data. Our design targets data confidentiality, but does not address data integrity. However, it is easy to imagine integrating integrity checks with our current design. For example, anytime a process attempts to write to a sensitive file, TightLip could invoke the policy module, as it currently does for network socket writes.

Finally, we believe that it will be possible to reduce the memory consumed by a long-lived doppelganger by periodically comparing its memory pages to the original's. This would make using the doppelganger solely to generate untainted network traffic, as opposed to swapping it in for the original, more attractive.

Though doppelgangers will copy-on-write memory pages as they execute, many of those pages may still be identical to the original's. This would be true for pages that only receive updates that are independent of the scrubbed input. These pages could be re-marked copy-on-write and shared anew by the two processes.

Furthermore, even if a page initially contained bytes that depended on the scrubbed input, over time those bytes may be overwritten with non-sensitive values. These pages could also be recovered. Carried to its logical conclusion, if all memory pages of the original and doppelganger converged, then the doppelganger could be discarded altogether. We may be able to apply the memory consolidation techniques used in the VMware hypervisor [30] to this problem and intend to explore these mechanisms and others in future work.

5 Implementation

Our TightLip prototype consists of several hundred lines of C code scattered throughout the Linux 2.6.13 kernel. We currently support signals, inter-process communication via pipes, UNIX domain sockets, and graphical user interfaces. Most of the code deals with monitoring doppelganger execution, but we also made minor modifications to the ext3 file system to store persistent sensitivity labels.

5.1 File Systems

Sensitivity is currently represented as a single bit co-located on disk with file objects. If more complex classifications become necessary, this could be extended to multiple bits. To query sensitivity, we added a predicate to the kernel file object that returns the sensitivity status of any file, socket, or pipe. TightLip currently only supports sensitivity in the ext3 file system, though this implementation is backwards-compatible with existing ext3 partitions. Adding sensitivity to future file systems should be straightforward, since manipulating the sensitivity bit in on-disk ext3 inodes required only an extra three lines of code.

Our prototype also provides a new privileged system call to manage sensitivity from user space. The system call can be used to read, set, or clear the sensitivity of a given file. It is used by TightLip diagnostics and by a utility for setting sensitivity by hand.
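The shape of this interface can be sketched as follows. The command names, the flag constant, and the in-memory inode stand-in are all hypothetical; the real prototype manipulates the bit in the on-disk ext3 inode and enforces privilege checks that this sketch omits:

```c
#include <assert.h>

/* Hypothetical commands for the sensitivity management call. */
enum { SENS_GET, SENS_SET, SENS_CLEAR };

/* Minimal stand-in for the on-disk inode's flag word. */
struct inode_stub { unsigned flags; };
#define S_SENSITIVE 0x1u

/* Read, set, or clear the sensitivity bit of a file object.
 * For SENS_GET, returns 1 if sensitive and 0 otherwise;
 * for the mutating commands, returns 0 on success. */
int sys_sensitivity(struct inode_stub *ino, int cmd)
{
    switch (cmd) {
    case SENS_SET:   ino->flags |= S_SENSITIVE;  return 0;
    case SENS_CLEAR: ino->flags &= ~S_SENSITIVE; return 0;
    case SENS_GET:   return (ino->flags & S_SENSITIVE) != 0;
    default:         return -1;
    }
}
```

A one-bit flag keeps the on-disk format change minimal, which is consistent with the three-line ext3 modification mentioned above.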

5.2 Data Structures

Our prototype augments several existing Linux data structures and adds one new one, called a completion structure. Completion structures buffer the results of an invoked kernel function. This allows TightLip to apply an update or receive a value from a non-kernel source once, but pass the result on to both the original and doppelganger. Minimally, completion structures consist of the arguments to a function and its return value. They may also contain instructions for the receiving process, such as a divergence notification or instructions to terminate.
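A completion structure might look like the sketch below. The field and function names are illustrative, and the real kernel synchronizes the two callers with barriers that this single-threaded sketch leaves out:

```c
#include <assert.h>

/* Buffers the result of one kernel function invocation so the
 * function runs once but both processes receive its result. */
struct completion {
    int  filled;        /* has the first caller run the function? */
    long args[4];       /* recorded arguments */
    long retval;        /* buffered return value */
    int  directive;     /* e.g. 0 = none, 1 = diverged, 2 = terminate */
};

/* Example non-kernel source: a monotonically increasing clock stub. */
static long ticks;
long fake_gettime(const long *args) { (void)args; return ++ticks; }

/* First caller applies the function and records the result; the
 * second caller gets the buffered value without re-invoking it. */
long complete_once(struct completion *c, long (*fn)(const long *),
                   const long *args)
{
    if (!c->filled) {
        for (int i = 0; i < 4; i++) c->args[i] = args[i];
        c->retval = fn(args);
        c->filled = 1;
    }
    return c->retval;
}
```

This is the same apply-once, deliver-twice discipline used for the producer-consumer queues, packaged per function call rather than per data source.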

TightLip also required several modifications to the Linux task structure. These additions allow the kernel to map doppelgangers to and from originals, synchronize their actions, and pass messages between them. The task structure of the original process also stores a list of buffers corresponding to kernel function calls such as bind, accept, and read. Finally, all process structures contain a list of at most 10 open sensitive files from which scrubbed data should be returned. Once a sensitive file is closed, it is removed from this list.

5.3 System Calls

System call entry and exit barriers are crucial for detecting and preventing divergence. For example, correctly implementing the exit system call requires that peers synchronize in the kernel to atomically remove any mutual dependencies between them. We have inserted barriers in almost all implemented system calls. In the future, we may be able to relax these constraints and eliminate some unnecessary barriers.

We began implementing TightLip by modifying read system calls for files and network sockets. Next, we modified the write system call to compare the outputs of the original and the doppelganger. The prototype allows invocation of a custom policy module when TightLip determines that a process is attempting to write sensitive data. Supported policies include allowing the sensitive data to be written, killing the process, closing the file/socket, writing the output of the doppelganger, and swapping the doppelganger for the original process.

After the read and write calls, we added support for reads and modifications of kernel state, including all of the socket system calls. We have instrumented most, but not all, relevant system calls. Linux currently offers more than 290 system calls, of which we have modified 28.

5.4 Process Swapping

TightLip implements process swapping in several stages. First, it synchronizes the processes using a barrier. Then the original process notifies the doppelganger that swapping should take place. The doppelganger receives the message and exchanges its process identifier with the original's. Doing so requires unregistering both processes from the global process table and then re-registering them under the exchanged identifiers. The doppelganger must then purge any pointers to the original process's task structure.

Once the doppelganger has finished cleaning up, it acknowledges the original's notification. After receiving this acknowledgment, the original removes any of its state that depends on the doppelganger and sets its parent to the init process. This prevents a child-death signal from being delivered to its actual parent. The original also re-parents all of its children to the swapped-in doppelganger. Once these updates are in place, the original safely exits.

5.5 Future Implementation Work

There are still several features of our design that remain unimplemented. The major goal of the current prototype has been to evaluate our design by running several key applications, such as a web server, an NFS server, and sshd. We are currently working on support for multi-threaded applications. Our focus on single-threaded applications, pipes, UNIX domain sockets, files, and network sockets has given us valuable experience with many of the core mechanisms of TightLip, and we look forward to a complete environment in the near future.

6 Evaluation

In this section we describe an evaluation of our TightLip prototype using a set of data transfer micro-benchmarks and SpecWeb99. Our goal was to examine how TightLip affects data transfer time, resource requirements, and application saturation throughput.

We used several unmodified server applications: Apache 1.3.34, NFS server 2.2beta47-20, and sshd 3.8. Each of these applications is structured differently, leading to unique interactions with the kernel. Apache runs as a collective of worker processes that are created on demand and destroyed when idle for a given period. The NFS server is a single-threaded, event-driven process that uses signals to handle concurrent requests.

sshd forks a shell process to represent the user requesting a connection. The shell process serves data transfer requests by forking a worker process to fetch files from the file system. The worker sends the data to the shell process over a UNIX domain socket, and the shell process encrypts the data and sends it over the network to the client. All sshd-forked processes belonging to the same session are destroyed when the client closes the connection.

All experiments ran on a Dell Precision 8300 workstation with a single 3.0 GHz Pentium IV processor and 1 GB of RAM. We ran all client applications on an identical machine connected to the TightLip host via a closed 100 Mb/s LAN. All graphs report averages together with a 95% confidence interval obtained from 10 runs of each experiment. It should be noted that we did not detect any divergence prior to the network write for any application during our experiments.

6.1 Application Micro-benchmarks

In this set of experiments we examined TightLip's impact on several data transfer applications. We chose these applications because they are typical of those likely to inadvertently leak data, as exemplified by the motivating Kazaa, web server, and distributed file system misconfigurations [12, 16, 22, 31]. Our methodology was simple: each experiment consisted of a single client making 100 consecutive requests for 100 different files, all of the same size. As soon as one request finished, the client immediately made another.

For each trial, we examined four TightLip configurations. To capture baseline performance, each server initially ran with no sensitive files. The server simply read from the file system, encoded the files' contents, and returned the results over the network.

Next, we ran the servers with all files marked sensitive and applied three more policies. The continuous policy created a doppelganger for each process that read sensitive data and ran the doppelganger alongside the original until the original exited. Subsequent requests to the original process were also processed by the doppelganger.

The swap policy followed the continuous policy, but swapped in the doppelganger for the original after each network write. If the swapped-in process accessed sensitive data again, a new doppelganger was created and swapped in after the next write.

The optimized swap policy remembered whether a process had been swapped in. This allowed TightLip to avoid creating doppelgangers when the swapped process attempted further reads from the same sensitive source; the system could return scrubbed data without creating a new doppelganger.

Figures 3, 4, and 5 show the relative transfer times for the above applications when clients fetched sensitive files of varying sizes.

Note that the cost of the additional context switches TightLip requires to synchronize the original and doppelganger may be high relative to the baseline transfer time for smaller files. This phenomenon is most noticeable for the NFS server in Figure 4, where fetching files of size 1 KB and 4 KB was 30% and 25% more expensive, respectively, than fetching non-sensitive files. As file size increases, data transfer begins to dominate the context switch overhead induced by TightLip; the NFS server running under all policies transferred 256 KB within 10% of the baseline time.


Figure 3: Apache relative transfer time.

Figure 4: NFS relative transfer time.

Figure 3 shows that our Apache web server was the least affected by TightLip. The overhead under all three policies was within 5% of the baseline, with continuous execution being slightly more expensive than the other two. This result can be explained by the fact that fetching static files from the web server was I/O bound and required little CPU time. Continuous execution was slightly more expensive, since the original and the doppelganger both parsed every client request.

Figure 5 shows that the overhead of using doppelgangers with sshd was within 10% of the baseline in most cases. This was initially surprising, since the original and doppelganger performed encryption on the output concurrently. However, the overhead of performing extra symmetric encryption was low and masked by the more dominant cost of I/O.

The swap policy performed better than the continuous execution policy for Apache and NFS. This result was expected, since process swapping reduces the overhead of running a doppelganger. The benefit from process swapping was application-dependent, though, as the time spent swapping the doppelganger for the original sometimes outweighed the overhead incurred by running the doppelganger; while swapping took place, the process was blocked and could not make any progress. Transferring 4 KB files from sshd illustrated this point: sshd was almost done transferring all of its data after the first write to the network. Swapping the doppelganger for the original only delayed completion of the request.

Figure 5: SSH relative transfer time.

Server    Continuous    Swap      Optimized
apache    852           76634     5389
sshd      76277         166055    38085
nfsd      58            233017    42395

Table 2: Average total number of additional pages created during a run of the data transfer micro-benchmarks. Each run transfers 600 files for a total of 133 MB.

To our surprise, the swap policy applied to sshd actually reduced transfer times for 16 KB and 64 KB files. The reason for this behavior was that during swapping, the sshd shell process blocked and could not consume data from the UNIX domain socket. However, the worker process continued to feed data to the socket, which increased the amount of data the shell process found on its next read.

Since the shell process had a larger read buffer than the worker process, swapping caused the shell process to perform larger reads and, as a result, fewer network writes relative to not swapping. Performing fewer system calls improved the transfer time observed by the client. The impact of swapping decreased as file size increased, since the fixed-size buffer of the UNIX domain socket forced worker processes to block when the socket was full.

The optimized swap policy had the best overall performance among the three policies. Since all servers perform repeated reads from the same sensitive source, creating doppelgangers after every read was unnecessary. Even though this policy often improved performance, it did not apply in all cases. The policy assumed that sensitive writes depended on all sensitive sources that a process had opened. Thus, future reads from these sensitive sources always produced scrubbed data.

Doppelgangers affected memory usage as well as response time. Table 2 shows the average total number of extra memory pages allocated while running the entire benchmark. Each cell represents the additional number of pages created during the transfer of all 600 files (133 MB).


Figure 6: SpecWeb99 throughput.

We observed that the server applications behaved differently under our policies. The best memory policy for Apache and nfsd was continuous execution, since both servers' processes execute until the end of the benchmark. For these two servers, any other policy increased the number of doppelgangers created and required more page allocations. Since an sshd process only executes for the duration of a single file transfer, continuous execution was not as good as optimized swap execution. For all three servers, the swap policy produced the most page allocations, since it created more doppelgangers.

Overall, our micro-benchmark results suggest that TightLip has a low impact on data transfer applications. The overhead depends on the policy used to deal with sensitive writes. In most cases the overhead was within 5%, and it never exceeded 30%. Even with doppelgangers running continuously, TightLip outperformed prior taint-checking approaches by more than an order of magnitude. For example, Apache running under TaintCheck and serving 10 KB files is nearly 15 times slower than an unmodified server; for 1 KB files, it is 25 times slower [20]. Thus, even in the worst case, using doppelgangers provides a significant performance improvement for data transfer applications.

6.2 Web Server Performance

Our final set of experiments used the SpecWeb99 benchmark on an Apache web server running on a TightLip machine. We used two configurations for these experiments: no sensitive files, and continuous execution with all files marked sensitive. Since the benchmark verifies the integrity of every file, we configured TightLip to return the data supplied by the original instead of the scrubbed data supplied by the doppelganger. This modification was only for test purposes, so that we could run the benchmark over our kernel. Even with this modification it was impossible to use SpecWeb99 on Apache with process swapping, since we could not completely eliminate the effect of data scrubbing; the swapped-in doppelgangers still had some scrubbed data in their buffers.

Figure 7: SpecWeb99 response time.

We configured SpecWeb99 to request static content of varying sizes. Figure 6 shows the server throughput as a function of the number of clients, and Figure 7 shows the response time. Our results show that the overhead of handling sensitive files was within 5%. The graphs also show that the saturation point for both configurations was in the range of 110-130 clients. These results further demonstrate that doppelgangers can provide privacy protection at negligible performance cost.

7 Related Work

Several recent system designs have observed the trouble that users and organizations have managing their sensitive data [3, 25, 29]. RIFLE [29] and InfoShield [25] both propose new hardware support for information-flow analysis and enforcement; SANE [3] enforces capabilities in-network. All of these approaches are orthogonal to TightLip. An interesting direction for our future work will be to design interfaces for exporting sensitivity between these layers and TightLip.

A simple way to prevent leaks of sensitive data is to revoke the network write permissions of any process that reads a sensitive file. The problem is that this policy can needlessly punish processes that use the network legitimately after reading sensitive data. For example, virus scanners often read sensitive files and later contact a server for new anti-virus definitions, while Google Desktop and other file indexing tools may aggregate local and remote search results.

A number of systems perform information-flow analysis to transitively label memory objects by restricting or modifying application source code. Static solutions compute information flow at compile time and force programmers to use new programming languages or annotation schemes [19, 26]. Dynamic solutions rely on programming tools or new operating system abstractions [10, 14, 32]. Unlike TightLip, both approaches require modifying or completely rewriting applications.

It is also possible to track sensitivity without access to source code by moving information-flow functionality into hardware [6, 8, 28, 29]. The main drawback of this work is the lack of such support in commodity machines. An alternative to hardware-level tracking is software emulation through binary rewriting [5, 7, 20]. The main drawback of this approach is poor performance. Because these systems must interpose on each memory access, applications can run orders of magnitude more slowly. In comparison, TightLip's use of doppelgangers runs on today's commodity hardware and introduces modest overhead.

A recent taint checker built into the Xen hypervisor [13] can avoid emulation overhead as long as there are no tainted resident memory pages. The hypervisor tracks taint at hardware byte granularity and can dynamically switch a virtual machine from virtualized mode to emulation mode once it requires tainted memory to execute. This allows untainted systems to run at normal virtual machine speeds.

While promising, tracking taint at a hardware byte granularity has its own drawbacks. In particular, it forces guest kernels to run in emulation mode whenever they handle tainted kernel memory. The system designers have modified a Linux guest OS to prevent taint from inadvertently infecting the kernel stack, but this does not address taint spread through system calls. For example, if email files were marked sensitive, the system would remain in emulation mode as long as a user's email remained in the kernel's buffer cache. This would impose a significant global performance penalty, harming tainted and untainted processes alike. Furthermore, the tainted data could remain in the buffer cache long after the tainted process that placed it there had exited.
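The persistence problem can be sketched with a small simulation. This is a toy model of the demand-emulation design discussed above, not its actual implementation, and the names are invented: taint attaches to cached pages rather than to processes, so it outlives the process that introduced it and keeps the whole guest in emulation mode.

```python
# Toy model of taint in a shared buffer cache under demand emulation.
# All names and structures here are illustrative assumptions.

buffer_cache = {}   # page id -> tainted flag
processes = set()   # currently running process ids

def read_file(pid, page, tainted):
    """Reading a file brings its page into the shared cache."""
    processes.add(pid)
    buffer_cache[page] = buffer_cache.get(page, False) or tainted

def exit_process(pid):
    """Process exit does not evict its cached pages."""
    processes.discard(pid)

def must_emulate():
    # The guest runs in slow emulation mode while ANY resident page
    # is tainted, regardless of which processes are still running.
    return any(buffer_cache.values())

read_file("mailer", page="inbox-page", tainted=True)
exit_process("mailer")

# The tainted email page lingers in the cache, so even untainted
# processes now pay the emulation penalty.
assert must_emulate() is True
```

This is the global penalty the paragraph above describes: the cost is keyed to resident tainted state, not to the processes that actually touch sensitive data.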

TightLip's need to limit the sources of divergence after scrubbed data has been delivered to the doppelganger is similar to the state synchronization problems of primary/backup fault tolerance [1]. In the seminal primary/backup paper, Alsberg describes a distributed system in which multiple processes run in parallel and must be kept consistent. The primary process answers client requests, but any of the backup processes can be swapped in if the primary fails or to balance load across replicas. Later, Bressoud and Schneider applied this model to a hypervisor running multiple virtual machines [2]. The main difference between doppelgangers and primary/backup fault tolerance is that TightLip deliberately induces a different state and then tries to eliminate any future sources of divergence. In primary/backup fault tolerance, the goal is to eliminate all sources of divergence.
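The deliberately induced divergence can be illustrated at user level. The sketch below is a minimal, hedged analogue of the idea (a toy function-level comparison, not the kernel-level doppelganger mechanism): run the same handler on real and scrubbed inputs, and treat differing outputs as evidence that the output depends on sensitive bytes.

```python
# Minimal sketch of divergence-based leak detection: compare what a
# handler would output given real versus scrubbed input. The scrub
# function and handler signatures are illustrative assumptions.

def scrub(data: bytes) -> bytes:
    # Same length, no content: preserves size-dependent behavior
    # while removing sensitive bytes.
    return b"x" * len(data)

def would_leak(handler, sensitive: bytes) -> bool:
    original_out = handler(sensitive)
    doppel_out = handler(scrub(sensitive))
    # Divergent outputs mean the output depends on sensitive bytes.
    return original_out != doppel_out

# A handler that echoes file contents diverges (a potential leak);
# one that reports only metadata, such as the length, does not.
assert would_leak(lambda d: d, b"secret") is True
assert would_leak(lambda d: str(len(d)).encode(), b"secret") is False
```

As in the primary/backup comparison above, the two executions must otherwise be kept in lockstep; any divergence not caused by the scrubbed input would register as a false positive.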

Doppelgangers also share some characteristics with speculative execution [4, 21]. Both involve "best-effort" processes that can be thrown away if they stray. The key difference is that speculative processes run while the original is blocked, whereas doppelgangers run in parallel with the original.

8 Conclusions

Access control configuration is tedious and error-prone. TightLip helps users define what data is sensitive and who is trusted to see it rather than forcing them to understand or predict how the interactions of their software packages can leak data. TightLip introduces new operating system objects called doppelganger processes to track sensitivity through a system. Doppelgangers are spawned from and run in parallel with an original process that has handled sensitive data. Careful monitoring of doppelganger inputs and outputs allows TightLip to alert users of potential privacy breaches.

Evaluation of the TightLip prototype shows that the overhead of doppelganger processes is modest. Data transfer micro-benchmarks show an order of magnitude better performance than similar taint-flow analysis techniques. SpecWeb99 results show that Apache running on TightLip exhibits a negligible 5% slowdown in request rate and response time compared to an unmodified server environment.

Acknowledgements

We would like to thank the anonymous reviewers and our shepherd, Alex Snoeren, for their valuable insight. We would also like to thank Jason Flinn, Sam King, Brian Noble, and Niraj Tolia for their early input on this work.

References

[1] P. A. Alsberg and J. D. Day. A Principle for Resilient Sharing of Distributed Resources. In Proceedings of the Second International Conference on Software Engineering (ICSE), October 1976.

[2] T. C. Bressoud and F. B. Schneider. Hypervisor-Based Fault-Tolerance. ACM Transactions on Computer Systems (TOCS), February 1996.

[3] M. Casado, T. Garfinkel, A. Akella, M. J. Freedman, D. Boneh, N. McKeown, and S. Shenker. SANE: A Protection Architecture for Enterprise Networks. In Proceedings of the 15th USENIX Security Symposium, July 2006.

[4] F. Chang and G. A. Gibson. Automatic I/O Hint Generation Through Speculative Execution. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI), February 1999.

[5] W. Cheng, Q. Zhao, B. Yu, and S. Hiroshige. TaintTrace: Efficient Flow Tracing with Dynamic Binary Rewriting. In Proceedings of the 11th IEEE International Symposium on Computers and Communications (ISCC), June 2006.

[6] J. Chow, B. Pfaff, T. Garfinkel, K. Christopher, and M. Rosenblum. Understanding Data Lifetime via Whole System Simulation. In Proceedings of the 13th USENIX Security Symposium, August 2004.

[7] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: End-to-End Containment of Internet Worms. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP), October 2005.

[8] J. R. Crandall and F. T. Chong. Minos: Control Data Attack Prevention Orthogonal to Memory Model. In Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (Micro), December 2004.

[9] T. Dierks. The TLS Protocol. Internet RFC 2246, January 1999.

[10] P. Efstathopoulos, M. Krohn, S. VanDeBogart, C. Frey, D. Ziegler, E. Kohler, D. Mazieres, F. Kaashoek, and R. Morris. Labels and Event Processes in the Asbestos Operating System. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP), October 2005.

[11] J. T. Giffin, S. Jha, and B. P. Miller. Efficient Context-sensitive Intrusion Detection. In Proceedings of the Network and Distributed System Security Symposium (NDSS), February 2004.

[12] N. S. Good and A. Krekelberg. Usability and Privacy: A Study of Kazaa P2P File-sharing. In Proceedings of the Conference on Human Factors in Computing Systems (CHI), April 2003.

[13] A. Ho, M. Fetterman, C. Clark, A. Warfield, and S. Hand. Practical Taint-based Protection using Demand Emulation. In Proceedings of the First EuroSys Conference, April 2006.

[14] L. C. Lam and T. Chiueh. A General Dynamic Information Flow Tracking Framework for Security Applications. In Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC), December 2006.

[15] J. Leyden. ChoicePoint Fined $15m Over Data Security Breach. The Register, January 27, 2006.

[16] J. Leyden. HK Police Complaints Data Leak Puts City on Edge. The Register, March 28, 2006.

[17] A. McCue. CIO Jury: IT Bosses Ban Google Desktop Over Security Fears. silicon.com, March 2, 2006.

[18] B. P. Miller, L. Fredriksen, and B. So. An Empirical Study of the Reliability of UNIX Utilities. Communications of the ACM, 33(12), 1990.

[19] A. C. Myers. JFlow: Practical Mostly-static Information Flow Control. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 1999.

[20] J. Newsome and D. Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. In Proceedings of the Network and Distributed System Security Symposium (NDSS), February 2005.

[21] E. B. Nightingale, P. M. Chen, and J. Flinn. Speculative Execution in a Distributed File System. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP), October 2005.

[22] Associated Press. Miami University Warns Students of Privacy Breach. Akron Beacon Journal, September 16, 2005.

[23] M. Rinard, C. Cadar, D. Dumitran, D. M. Roy, T. Leu, and W. S. Beebee. Enhancing Server Availability and Security Through Failure-Oblivious Computing. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), December 2004.

[24] A. Sabelfeld and A. C. Myers. Language-based Information-flow Security. IEEE Journal on Selected Areas in Communications, 21(1), January 2003.

[25] W. Shi, J. B. Fryman, G. Gu, H. H. S. Lee, Y. Zhang, and J. Yang. InfoShield: A Security Architecture for Protecting Information Usage in Memory. In Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA), February 2006.

[26] V. Simonet. Flow Caml in a Nutshell. In Proceedings of the First APPSEM-II Workshop, March 2003.

[27] B. Sprunt. Pentium 4 Performance Monitoring Features. IEEE Micro, July-August 2002.

[28] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas. Secure Program Execution via Dynamic Information Flow Tracking. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2004.

[29] N. Vachharajani, M. J. Bridges, J. Chang, R. Rangan, G. Ottoni, J. A. Blome, G. A. Reis, M. Vachharajani, and D. I. August. RIFLE: An Architectural Framework for User-Centric Information-Flow Security. In Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (Micro), December 2004.

[30] C. A. Waldspurger. Memory Resource Management in VMware ESX Server. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002.

[31] A. Yumerefendi, B. Mickle, and L. P. Cox. TightLip: Keeping Applications from Spilling the Beans. Technical Report CS-2006-7, Computer Science Department, Duke University, April 2006.

[32] S. Zdancewic, L. Zheng, N. Nystrom, and A. C. Myers. Untrusted Hosts and Confidentiality: Secure Program Partitioning. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), October 2001.