A Virtual Honeypot Framework

Niels Provos∗

Google, Inc.

Abstract

A honeypot is a closely monitored network decoy serving several purposes: it can distract adversaries from more valuable machines on a network, provide early warning about new attack and exploitation trends, or allow in-depth examination of adversaries during and after exploitation of a honeypot. Deploying a physical honeypot is often time intensive and expensive as different operating systems require specialized hardware and every honeypot requires its own physical system. This paper presents Honeyd, a framework for virtual honeypots that simulates virtual computer systems at the network level. The simulated computer systems appear to run on unallocated network addresses. To deceive network fingerprinting tools, Honeyd simulates the networking stack of different operating systems and can provide arbitrary routing topologies and services for an arbitrary number of virtual systems. This paper discusses Honeyd’s design and shows how the Honeyd framework helps in many areas of system security, e.g. detecting and disabling worms, distracting adversaries, or preventing the spread of spam email.

1 Introduction

Internet security is increasing in importance as more and more business is conducted there. Yet, despite decades of research and experience, we are still unable to make secure computer systems or even measure their security.

As a result, exploitation of newly discovered vulnerabilities often catches us by surprise. Exploit automation and massive global scanning for vulnerabilities enable adversaries to compromise computer systems shortly after vulnerabilities become known [25].

∗This research was conducted by the author while at the Center for Information Technology Integration of the University of Michigan.

One way to get early warnings of new vulnerabilities is to install and monitor computer systems on a network that we expect to be broken into. Every attempt to contact these systems via the network is suspect. We call such a system a honeypot. If a honeypot is compromised, we study the vulnerability that was used to compromise it. A honeypot may run any operating system and any number of services. The configured services determine the vectors an adversary may choose to compromise the system.

A physical honeypot is a real machine with its own IP address. A virtual honeypot is a simulated machine with modeled behaviors, one of which is the ability to respond to network traffic. Multiple virtual honeypots can be simulated on a single system.

Virtual honeypots are attractive because they require fewer computer systems, which reduces maintenance costs. Using virtual honeypots, it is possible to populate a network with hosts running numerous operating systems. To convince adversaries that a virtual honeypot is running a given operating system, we need to simulate the TCP/IP stack of the target operating system carefully, in order to deceive TCP/IP stack fingerprinting tools like Xprobe [1] or Nmap [9].

This paper describes the design and implementation of Honeyd, a framework for virtual honeypots that simulates computer systems at the network level. Honeyd supports the IP protocol suite [26] and responds to network requests for its virtual honeypots according to the services that are configured for each virtual honeypot. When sending a response packet, Honeyd’s personality engine makes it match the network behavior of the configured operating system personality.

To simulate real networks, Honeyd creates virtual networks that consist of arbitrary routing topologies with configurable link characteristics such as latency and packet loss. When network mapping tools like traceroute are used to probe the virtual network, they discover only the topologies simulated by Honeyd.

Our performance evaluation of Honeyd shows that a 1.1 GHz Pentium III can support 30 MBit/s aggregate bandwidth and that it can sustain over two thousand TCP transactions per second. The experimental evaluation of Honeyd verifies that fingerprinting tools are deceived by the simulated systems and shows that our virtual network topologies seem realistic to network mapping tools.

To demonstrate the power of the Honeyd framework, we show how it can be used in many areas of system security. For example, Honeyd can help with detecting and disabling worms, distracting adversaries, or preventing the spread of spam email.

The rest of this paper is organized as follows. Section 2 presents background information on honeypots. In Section 3, we discuss the design and implementation of Honeyd. Section 4 presents an evaluation of the Honeyd framework in which we analyze the performance of Honeyd and verify that fingerprinting and network mapping tools are deceived into reporting the specified system configurations. We describe how Honeyd can help to improve system security in Section 5 and present related work in Section 6. We summarize and conclude in Section 7.

2 Honeypots

This section presents background information on honeypots and our terminology. We provide motivation for their use by comparing honeypots to network intrusion detection systems (NIDS) [19]. The amount of useful information provided by NIDS is decreasing in the face of ever more sophisticated evasion techniques [21, 28] and an increasing number of protocols that employ encryption to protect network traffic from eavesdroppers. NIDS also suffer from high false positive rates that decrease their usefulness even further. Honeypots can help with some of these problems.

A honeypot is a closely monitored computing resource that we intend to be probed, attacked, or compromised. The value of a honeypot is determined by the information that we can obtain from it. Monitoring the data that enters and leaves a honeypot lets us gather information that is not available to NIDS. For example, we can log the key strokes of an interactive session even if encryption is used to protect the network traffic. To detect malicious behavior, NIDS require signatures of known attacks and often fail to detect compromises that were unknown at the time they were deployed. On the other hand, honeypots can detect vulnerabilities that are not yet understood. For example, we can detect compromise by observing network traffic leaving the honeypot even if the means of the exploit has never been seen before.

Because a honeypot has no production value, any attempt to contact it is suspicious. Consequently, forensic analysis of data collected from honeypots is less likely to lead to false positives than data collected by NIDS.

Honeypots can run any operating system and any number of services. The configured services determine the vectors available to an adversary for compromising or probing the system. A high-interaction honeypot simulates all aspects of an operating system. A low-interaction honeypot simulates only some parts, for example the network stack [24]. A high-interaction honeypot can be compromised completely, allowing an adversary to gain full access to the system and use it to launch further network attacks. In contrast, low-interaction honeypots simulate only services that cannot be exploited to get complete access to the honeypot. Low-interaction honeypots are more limited, but they are useful to gather information at a higher level, e.g., learn about network probes or worm activity. They can also be used to analyze spammers or for active countermeasures against worms; see Section 5.

We also differentiate between physical and virtual honeypots. A physical honeypot is a real machine on the network with its own IP address. A virtual honeypot is simulated by another machine that responds to network traffic sent to the virtual honeypot.

When gathering information about network attacks or probes, the number of deployed honeypots influences the amount and accuracy of the collected data. A good example is measuring the activity of HTTP-based worms [23]. We can identify these worms only after they complete a TCP handshake and send their payload. However, most of their connection requests will go unanswered because they contact randomly chosen IP addresses. A honeypot can capture the worm payload if we configure it to function as a web server. The more honeypots we deploy, the more likely one of them is contacted by a worm.

Physical honeypots are often high-interaction, allowing the system to be compromised completely, so they are expensive to install and maintain. For large address spaces, it is impractical or impossible to deploy a physical honeypot for each IP address. In that case, we need to deploy virtual honeypots.

Figure 1: Honeyd receives traffic for its virtual honeypots via a router or Proxy ARP. For each honeypot, Honeyd can simulate the network stack behavior of a different operating system.

3 Design and Implementation

In this section, we present Honeyd, a lightweight framework for creating virtual honeypots. The framework allows us to instrument thousands of IP addresses with virtual machines and corresponding network services. We start by discussing our design considerations, then describe Honeyd’s architecture and implementation.

We limit adversaries to interacting with our honeypots only at the network level. Instead of simulating every aspect of an operating system, we choose to simulate only its network stack. Honeyd is therefore a low-interaction virtual honeypot that simulates TCP and UDP services. It also understands and responds correctly to ICMP messages. The main drawback of this approach is that an adversary never gains access to a complete system even if he compromises a simulated service. On the other hand, we are still able to capture connection and compromise attempts, and we can mitigate the drawback by combining Honeyd with a virtual machine like VMware [27]. This is discussed in the related work section.

Honeyd must be able to handle virtual honeypots on multiple IP addresses simultaneously, in order to populate the network with numerous virtual honeypots simulating different operating systems and services. To increase the realism of our simulation, the framework must be able to simulate arbitrary network topologies. To simulate address spaces that are topologically dispersed and for load sharing, the framework also needs to support network tunneling.

Figure 1 shows a conceptual overview of the framework’s operation. A central machine intercepts network traffic sent to the IP addresses of configured honeypots and simulates their responses. Before we describe Honeyd’s architecture, we explain how network packets for virtual honeypots reach the Honeyd host.

3.1 Receiving Network Data

Honeyd is designed to reply to network packets whose destination IP address belongs to one of the simulated honeypots. For Honeyd to receive the correct packets, the network needs to be configured appropriately. There are several ways to do this, e.g., we can create special routes for the virtual IP addresses that point to the Honeyd host, or we can use Proxy ARP [3], or we can use network tunnels.

Let A be the IP address of our router and B the IP address of the Honeyd host. In the simplest case, the IP addresses of virtual honeypots lie within our local network. We denote them V1, ..., Vn. When an adversary sends a packet from the Internet to honeypot Vi, router A receives and attempts to forward the packet. The router queries its routing table to find the forwarding address for Vi. There are three possible outcomes: the router drops the packet because there is no route to Vi, router A forwards the packet to another router, or Vi lies in the local network range of the router and thus is directly reachable by A.

To direct traffic for Vi to B, we can use the following two methods. The easiest way is to configure routing entries for Vi with 1 ≤ i ≤ n that point to B. In that case, the router forwards packets for our virtual honeypots directly to the Honeyd host. On the other hand, if no special route has been configured, the router ARPs to determine the MAC address of the virtual honeypot. As there is no corresponding physical machine, the ARP requests go unanswered and the router drops the packet after a few retries. To prevent this, we configure the Honeyd host to reply to ARP requests for Vi with its own MAC address. This is called Proxy ARP and allows the router to send packets for Vi to B’s MAC address.
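The two methods can be illustrated with a small model of the router's forwarding decision. This is a simplified sketch, not router code: the function name `router_decision`, its parameters, and the example addresses are assumptions chosen for illustration, and forwarding to another router is reduced to a table of explicit routes.

```python
import ipaddress

def router_decision(dst, host_routes, local_net, arp_replies):
    """Model what router A does with a packet addressed to dst.

    host_routes: explicit routes {virtual IP: next-hop IP}
    local_net:   the router's directly attached network (CIDR string)
    arp_replies: {IP: MAC} for addresses that answer ARP requests
    """
    # Method 1: a configured route for Vi points at the Honeyd host B.
    if dst in host_routes:
        return ("forward", host_routes[dst])
    # No route and not on the local network: the packet is dropped.
    if ipaddress.ip_address(dst) not in ipaddress.ip_network(local_net):
        return ("drop", "no route")
    # Method 2: the router ARPs for Vi; with Proxy ARP, B answers for it.
    mac = arp_replies.get(dst)
    if mac is None:
        return ("drop", "ARP unanswered")
    return ("deliver", mac)

B_IP, B_MAC = "192.0.2.10", "00:11:22:33:44:55"   # hypothetical Honeyd host
print(router_decision("192.0.2.50", {"192.0.2.50": B_IP}, "192.0.2.0/24", {}))
print(router_decision("192.0.2.51", {}, "192.0.2.0/24", {"192.0.2.51": B_MAC}))
print(router_decision("192.0.2.52", {}, "192.0.2.0/24", {}))
```

The third call shows the case that Proxy ARP is meant to avoid: without a route or an ARP reply, the packet for the virtual honeypot never reaches the Honeyd host.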

In more complex environments, it is possible to tunnel network address space to a Honeyd host. We use the generic routing encapsulation (GRE) [11, 12] tunneling protocol described in detail in Section 3.4.

Figure 2: This diagram gives an overview of Honeyd’s architecture. Incoming packets are dispatched to the correct protocol handler. For TCP and UDP, the configured services receive new data and send responses if necessary. All outgoing packets are modified by the personality engine to mimic the behavior of the configured network stack. The routing component is optional and used only when Honeyd simulates network topologies.

3.2 Architecture

Honeyd’s architecture consists of several components: a configuration database, a central packet dispatcher, protocol handlers, a personality engine, and an optional routing component; see Figure 2.

Incoming packets are processed by the central packet dispatcher. It first checks the length of an IP packet and verifies the packet’s checksum. The framework is aware of the three major Internet protocols: ICMP, TCP and UDP. Packets for other protocols are logged and silently discarded.

Before it can process a packet, the dispatcher must query the configuration database to find a honeypot configuration that corresponds to the destination IP address. If no specific configuration exists, a default template is used. Given a configuration, the packet and the corresponding configuration are handed to the protocol-specific handler.
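The dispatch logic can be sketched in a few lines. The code below is an illustration under stated assumptions, not Honeyd's implementation: packets are plain dictionaries, the configuration database is a dictionary from IP address to template name, and `dispatch` and its parameters are hypothetical names. Only the IP protocol numbers (1, 6, 17) are standard values.

```python
# IP protocol numbers of the three protocols the framework understands.
KNOWN_PROTOCOLS = {1: "icmp", 6: "tcp", 17: "udp"}

def dispatch(packet, bindings, default_template, handlers, log):
    """Hand a packet and its honeypot configuration to a protocol handler."""
    protocol = KNOWN_PROTOCOLS.get(packet["proto"])
    if protocol is None:
        # Packets for other protocols are logged and discarded.
        log.append(f"discarded protocol {packet['proto']} for {packet['dst']}")
        return None
    # Fall back to the default template if no binding exists for the address.
    template = bindings.get(packet["dst"], default_template)
    return handlers[protocol](packet, template)

log = []
bindings = {"10.1.0.2": "netbsd"}
# Stand-in handlers that just report which protocol and template were chosen.
handlers = {name: (lambda pkt, tmpl, name=name: (name, tmpl))
            for name in KNOWN_PROTOCOLS.values()}

print(dispatch({"proto": 6, "dst": "10.1.0.2"}, bindings, "default", handlers, log))
print(dispatch({"proto": 17, "dst": "10.9.9.9"}, bindings, "default", handlers, log))
print(dispatch({"proto": 47, "dst": "10.1.0.2"}, bindings, "default", handlers, log))
```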

The ICMP protocol handler supports most ICMP requests. By default, all honeypot configurations respond to echo requests and process destination unreachable messages. The handling of other requests depends on the configured personalities as described in Section 3.3.

For TCP and UDP, the framework can establish connections to arbitrary services. Services are external applications that receive data on stdin and send their output to stdout. The behavior of a service depends entirely on the external application. When a packet is received, the framework checks if the packet is part of an established connection. In that case, any new data is sent to the already started service application. If the packet contains a connection request, a new process is created to run the appropriate service. Instead of creating a new process for each connection, the framework also supports subsystems and internal services. A subsystem is an application that runs in the name space of the virtual honeypot. The subsystem-specific application is started when the corresponding virtual honeypot is instantiated. A subsystem can bind to ports, accept connections, and initiate network traffic. While a subsystem runs as an external process, an internal service is a Python script that executes within Honeyd. Internal services require even fewer resources than subsystems but can only accept connections and not initiate them.
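Because a service only reads from stdin and writes to stdout, it can be very small. The following sketch shows what such an external service might look like; the function `serve`, the canned reply, and the log message format are assumptions made for illustration and are not taken from the scripts shipped with Honeyd.

```python
import io

def serve(inp, out, log):
    """Answer one HTTP request read from inp; write the reply to out."""
    request_line = inp.readline().strip()
    # Anything written to the log stream is meant for the framework's logging.
    log.write(f"request: {request_line}\n")
    body = "<html><body>It works.</body></html>\n"
    out.write("HTTP/1.0 200 OK\r\n")
    out.write("Content-Type: text/html\r\n")
    out.write(f"Content-Length: {len(body)}\r\n\r\n")
    out.write(body)

# Demonstration with in-memory streams standing in for stdin, stdout, stderr.
demo_out, demo_log = io.StringIO(), io.StringIO()
serve(io.StringIO("GET / HTTP/1.0\r\n\r\n"), demo_out, demo_log)
print(demo_out.getvalue())
```

Run as a standalone script, the same function would be called as `serve(sys.stdin, sys.stdout, sys.stderr)` after importing `sys`.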

Honeyd contains a simplified TCP state machine. The three-way handshake for connection establishment and connection teardown via FIN or RST are fully supported, but receiver and congestion window management is not fully implemented.

UDP datagrams are passed directly to the application. When the framework receives a UDP packet for a closed port, it sends an ICMP port unreachable message unless this is forbidden by the configured personality. In sending ICMP port unreachable messages, the framework allows network mapping tools like traceroute to discover the simulated network topology.

In addition to establishing a connection to a local service, the framework also supports redirection of connections. The redirection may be static or it can depend on the connection quadruple (source address, source port, destination address and destination port). Redirection lets us forward a connection request for a service on a virtual honeypot to a service running on a real server. For example, we can redirect DNS requests to a proper name server. Or we can reflect connections back to an adversary, e.g. just for fun we might redirect an SSH connection back to the originating host and cause the adversary to attack her own SSH server. Evil laugh.

Before a packet is sent to the network, it is processed by the personality engine. The personality engine adjusts the packet’s content so that it appears to originate from the network stack of the configured operating system.

Fingerprint IRIX 6.5.15m on SGI O2

TSeq(Class=TD%gcd=<104%SI=<1AE%IPID=I%TS=2HZ)

T1(DF=N%W=EF2A%ACK=S++%Flags=AS%Ops=MNWNNTNNM)

T2(Resp=Y%DF=N%W=0%ACK=S%Flags=AR%Ops=)

T3(Resp=Y%DF=N%W=EF2A%ACK=O%Flags=A%Ops=NNT)

T4(DF=N%W=0%ACK=O%Flags=R%Ops=)

T5(DF=N%W=0%ACK=S++%Flags=AR%Ops=)

T6(DF=N%W=0%ACK=O%Flags=R%Ops=)

T7(DF=N%W=0%ACK=S%Flags=AR%Ops=)

PU(Resp=N)

Figure 3: An example of an Nmap fingerprint that specifies the network stack behavior of a system running IRIX.

3.3 Personality Engine

Adversaries commonly run fingerprinting tools like Xprobe [1] or Nmap [9] to gather information about a target system. It is important that honeypots do not stand out when fingerprinted. To make them appear real to a probe, Honeyd simulates the network stack behavior of a given operating system. We call this the personality of a virtual honeypot. Different personalities can be assigned to different virtual honeypots. The personality engine makes a honeypot’s network stack behave as specified by the personality by introducing changes into the protocol headers of every outgoing packet so that they match the characteristics of the configured operating system.

The framework uses Nmap’s fingerprint database as its reference for a personality’s TCP and UDP behavior; Xprobe’s fingerprint database is used as reference for a personality’s ICMP behavior.

Next, we explain how we use the information provided by Nmap’s fingerprints to change the characteristics of a honeypot’s network stack.

Each Nmap fingerprint has a format similar to the example shown in Figure 3. We use the string after the Fingerprint token as the personality name. The lines after the name describe the results for nine different tests that Nmap performs to determine the operating system of a remote host. The first test is the most comprehensive. It determines how the network stack of the remote operating system creates the initial sequence number (ISN) for TCP SYN segments. Nmap indicates the difficulty of predicting ISNs in the Class field. Predictable ISNs pose a security problem because they allow an adversary to spoof connections [2]. The gcd and SI fields provide more detailed information about the ISN distribution. The first test also determines how IP identification numbers and TCP timestamps are generated.

Figure 4: The diagram shows the structure of the TCP header. Honeyd changes options and other parameters to match the behavior of network stacks.

The next seven tests determine the stack’s behavior for packets that arrive on open and closed TCP ports. The last test analyzes the ICMP response packet to a closed UDP port.

The framework keeps state for each honeypot. The state includes information about ISN generation, the boot time of the honeypot and the current IP packet identification number. Keeping state is necessary so that we can generate subsequent ISNs that follow the distribution specified by the fingerprint.

Nmap’s fingerprinting is mostly concerned with an operating system’s TCP implementation. TCP is a stateful, connection-oriented protocol that provides error recovery and congestion control [20]. TCP also supports additional options, not all of which are implemented by all systems. The size of the advertised receiver window varies between implementations and is used by Nmap as part of the fingerprint.

When the framework sends a packet for a newly established TCP connection, it uses the Nmap fingerprint to determine the initial window size. After a connection has been established, the framework adjusts the window size according to the amount of buffered data.

If TCP options present in the fingerprint have been negotiated during connection establishment, then Honeyd inserts them into the response packet. The framework uses the fingerprint to determine the frequency with which TCP timestamps are updated. For most operating systems, the update frequency is 2 Hz.

Figure 5: The diagram shows the structure of an ICMP port unreachable message. Honeyd introduces errors into the quoted IP header to match the behavior of network stacks.

Generating the correct distribution of initial sequence numbers is tricky. Nmap obtains six ISN samples and analyzes their consecutive differences. Nmap recognizes several ISN generation types: constant differences, differences that are multiples of a constant, completely random differences, time-dependent increments, and random increments. To differentiate between the latter two cases, Nmap calculates the greatest common divisor (gcd) and standard deviation for the collected differences.
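The two statistics are easy to compute from a list of samples. The sketch below only illustrates the idea and does not claim to reproduce Nmap's exact calculation (for instance, any scaling Nmap applies before computing its SI value is not modeled); the function name `isn_statistics` and the sample values are assumptions.

```python
from functools import reduce
from math import gcd
from statistics import pstdev

def isn_statistics(samples):
    """Return (differences, gcd, standard deviation) for consecutive ISN samples."""
    # Sequence numbers are 32-bit values, so differences wrap modulo 2**32.
    diffs = [(b - a) % 2**32 for a, b in zip(samples, samples[1:])]
    return diffs, reduce(gcd, diffs), pstdev(diffs)

# Six samples with a constant increment: one distinct difference, no spread.
constant = [1000 + 64000 * i for i in range(6)]
# Six samples whose increments are all multiples of three.
stepped = [0, 3, 9, 18, 21, 33]

print(isn_statistics(constant))
print(isn_statistics(stepped))
```

Six samples yield five differences. A large gcd with zero deviation points to constant increments, while a gcd greater than one with a nonzero deviation points to increments that are multiples of a constant.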

The framework keeps track of the last ISN that was generated by each honeypot and its generation time. For new TCP connection requests, Honeyd uses a formula that approximates the distribution described by the fingerprint’s gcd and standard deviation. In this way, the generated ISNs match the generation class that Nmap expects for the particular operating system.

For the IP header, Honeyd adjusts the generation of the identification number. The identification number can be zero, increment by one, or be random.
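These three behaviors can be captured by a small per-honeypot generator. The class below is a sketch under our own naming (`IpIdGenerator`, `next_id`, and the mode strings are hypothetical); it only assumes that the IP identification field is 16 bits wide, which follows from the IP header format.

```python
import random

class IpIdGenerator:
    """Generate IP identification numbers in one of three modes."""

    MODES = ("zero", "increment", "random")

    def __init__(self, mode, start=0, seed=None):
        if mode not in self.MODES:
            raise ValueError(f"unknown IP ID mode: {mode}")
        self.mode = mode
        self.current = start & 0xFFFF      # the field is 16 bits wide
        self.rng = random.Random(seed)

    def next_id(self):
        if self.mode == "zero":
            return 0
        if self.mode == "increment":
            # Wrap around after 65535.
            self.current = (self.current + 1) & 0xFFFF
            return self.current
        return self.rng.randrange(0x10000)

counter = IpIdGenerator("increment", start=65534)
print(counter.next_id(), counter.next_id())
```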

For ICMP packets, the personality engine uses the PU test entry to determine how the quoted IP header should be modified, using the associated Xprobe fingerprint for further information. Some operating systems modify the incoming packet by changing fields from network to host byte order and as a result quote the IP and UDP header incorrectly. Honeyd introduces these errors if necessary. Figure 5 shows an example for an ICMP destination unreachable message. The framework also supports the generation of other ICMP messages, not described here due to space considerations.
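A byte-order error of this kind can be reproduced by swapping the two bytes of a 16-bit header field before quoting it. The sketch below applies the swap to the total length field of a hand-built IPv4 header; which fields a particular stack mangles is determined by the fingerprints, so the choice of field, the helper name `swap16_at`, and the header values are assumptions for illustration only.

```python
import struct

def swap16_at(header, offset):
    """Return a copy of header with the 16-bit field at offset byte-swapped."""
    (value,) = struct.unpack_from("!H", header, offset)   # read in network order
    swapped = struct.pack("<H", value)                    # write little-endian
    return header[:offset] + swapped + header[offset + 2:]

# A minimal 20-byte IPv4 header: version/IHL, TOS, total length, ID,
# flags/fragment offset, TTL, protocol (17 = UDP), checksum, source, destination.
TOTAL_LENGTH_OFFSET = 2
original = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 328, 0x1234, 0, 64, 17, 0,
                       bytes([10, 0, 0, 5]), bytes([10, 1, 0, 2]))

quoted = swap16_at(original, TOTAL_LENGTH_OFFSET)
print(struct.unpack_from("!H", original, TOTAL_LENGTH_OFFSET)[0])  # 328
print(struct.unpack_from("!H", quoted, TOTAL_LENGTH_OFFSET)[0])    # 18433
```

A prober that compares the quoted header with the packet it sent sees a total length of 0x4801 instead of 0x0148, which is the kind of discrepancy a fingerprinting tool can key on.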

3.4 Routing Topology

Honeyd simulates arbitrary virtual routing topologies to deceive adversaries and network mapping tools. This goal is different from NS-based simulators [8] which try to faithfully reproduce network behavior in order to understand it. We simulate just enough to deceive adversaries. When simulating routing topologies, it is not possible to employ Proxy ARP to direct the packets to the Honeyd host. Instead, we need to configure routers to delegate network address space to our host.

Normally, the virtual routing topology is a tree rooted where packets enter the virtual routing topology. Each interior node of the tree represents a router and each edge a link that contains latency and packet loss characteristics. Terminal nodes of the tree correspond to networks. The framework supports multiple entry points that can exist in parallel. An entry router is chosen by the network space for which it is responsible.

To simulate an asymmetric network topology, we consult the routing tables when a packet enters the framework and again when it leaves the framework; see Figure 2. In this case, the network topology resembles a directed acyclic graph.¹

When the framework receives a packet, it finds the correct entry routing tree and traverses it, starting at the root until it finds a node that contains the destination IP address of the packet. Packet loss and latency of all edges on the path are accumulated to determine if the packet is dropped and how long its delivery should be delayed.

The framework also decrements the time to live (TTL) field of the packet for each traversed router. If the TTL reaches zero, the framework sends an ICMP time exceeded message with the source IP address of the router that causes the TTL to reach zero.
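The traversal can be sketched as a walk down the routing tree that accumulates link characteristics and decrements the TTL at every router. This is a simplified model, not Honeyd's code: the `Router` class and `route` function are hypothetical, loss is modeled as a per-link drop probability (an assumption about the unit), and the result reports the delivery probability instead of making a random drop decision.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass
class Router:
    address: str
    networks: list                              # directly attached networks (CIDR)
    links: list = field(default_factory=list)   # (network, child Router, latency_ms, loss)

def route(entry, dst, ttl):
    """Walk the tree from the entry router toward dst.

    Returns (outcome, router address, accumulated latency in ms,
    probability that the packet survives all traversed links).
    """
    addr = ipaddress.ip_address(dst)
    node, latency, survive = entry, 0.0, 1.0
    while True:
        ttl -= 1                                # every traversed router decrements the TTL
        if ttl <= 0:
            return ("time-exceeded", node.address, latency, survive)
        if any(addr in ipaddress.ip_network(n) for n in node.networks):
            return ("deliver", node.address, latency, survive)
        for net, child, lat_ms, loss in node.links:
            if addr in ipaddress.ip_network(net):
                latency += lat_ms
                survive *= 1.0 - loss
                node = child
                break
        else:
            return ("unreachable", node.address, latency, survive)

# Topology modeled on the addresses and latencies used in Figure 6.
r1 = Router("10.1.0.1", ["10.1.0.0/24"])
r2 = Router("10.2.0.1", ["10.2.0.0/24"])
entry = Router("10.0.0.1", ["10.0.0.0/24"],
               [("10.1.0.0/16", r1, 55, 0.1), ("10.2.0.0/16", r2, 20, 0.1)])

for ttl in (1, 2, 64):
    print(ttl, route(entry, "10.1.0.2", ttl))
```

Probing with TTL values 1, 2, 3, and so on is what traceroute does, so the sequence of routers that report time exceeded is exactly the simulated path.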

For network simulations, it is possible to integrate real systems into the virtual routing topology. When the framework receives a packet for a real system, it traverses the topology until it finds a virtual router that is directly responsible for the network space that the real machine belongs to. The framework sends an ARP request if necessary to discover the hardware address of the system, then encapsulates the packet in an Ethernet frame. Similarly, the framework responds with ARP replies from the corresponding virtual router when the real system sends ARP requests.

¹ Although it is possible to configure routing loops, this is normally undesirable and should be avoided.

route entry 10.0.0.1

route 10.0.0.1 link 10.0.0.0/24

route 10.0.0.1 add net 10.1.0.0/16 10.1.0.1 latency 55ms loss 0.1

route 10.0.0.1 add net 10.2.0.0/16 10.2.0.1 latency 20ms loss 0.1

route 10.1.0.1 link 10.1.0.0/24

route 10.2.0.1 link 10.2.0.0/24

create routerone

set routerone personality "Cisco 7206 running IOS 11.1(24)"

set routerone default tcp action reset

add routerone tcp port 23 "scripts/router-telnet.pl"

create netbsd

set netbsd personality "NetBSD 1.5.2 running on a Commodore Amiga (68040 processor)"

set netbsd default tcp action reset

add netbsd tcp port 22 proxy $ipsrc:22

add netbsd tcp port 80 "scripts/web.sh"

bind 10.0.0.1 routerone

bind 10.1.0.2 netbsd

bind 10.1.0.3 to fxp0

Figure 6: An example configuration for Honeyd. The configuration language is a context-free grammar. This example creates a virtual routing topology and defines two templates: a router that can be accessed via telnet and a host that is running a web server. A real system is integrated into the virtual routing topology at IP address 10.1.0.3.

We can split the routing topology using GRE to tunnel networks. This allows us to load balance across several Honeyd installations by delegating parts of the address space to different Honeyd hosts. Using GRE tunnels, it is also possible to delegate networks that belong to separate parts of the address space to a single Honeyd host. For the reverse route, an outgoing tunnel is selected based both on the source and the destination IP address. An example of such a configuration is described in Section 5.

3.5 Configuration

A virtual honeypot is configured with a template, a reference for a completely configured computer system. New templates are created with the create command.

The set and add commands change the configuration of a template. The set command assigns a personality from the Nmap fingerprint file to a template. The personality determines the behavior of the network stack, as discussed in Section 3.3. The set command also defines the default behavior for the supported network protocols. The default behavior is one of the following values: block, reset, or open. Block means that all packets for the specified protocol are dropped by default. Reset indicates that all ports are closed by default. Open means that they are all open by default. The latter settings make a difference only for UDP and TCP.

We specify the services that are remotely accessible with the add command. In addition to the template name, we need to specify the protocol, port and the command to execute for each service. Instead of specifying a service, Honeyd also recognizes the keyword proxy that allows us to forward network connections to a different host. The framework expands the following four variables for both the service and the proxy statement: $ipsrc, $ipdst, $sport, and $dport. Variable expansion allows a service to adapt its behavior depending on the particular network connection it is handling. It is also possible to redirect network probes back to the host that is doing the probing.
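The effect of variable expansion can be imitated with Python's `string.Template`, which happens to use the same `$name` syntax. This is only an illustration of the substitution, not how Honeyd implements it; the function `expand`, the connection values, and the service command with arguments are hypothetical.

```python
from string import Template

def expand(statement, ipsrc, ipdst, sport, dport):
    """Substitute the four connection variables into a service or proxy statement."""
    # safe_substitute leaves any other $name in the statement untouched.
    return Template(statement).safe_substitute(
        ipsrc=ipsrc, ipdst=ipdst, sport=sport, dport=dport)

# Hypothetical connection: 192.0.2.7:40000 connects to port 22 on 10.1.0.2.
connection = dict(ipsrc="192.0.2.7", ipdst="10.1.0.2", sport=40000, dport=22)

# The proxy target from Figure 6 reflects the connection back to its source.
print(expand("$ipsrc:22", **connection))
# A hypothetical service command that receives connection details as arguments.
print(expand("scripts/web.sh $ipsrc $sport", **connection))
```

With the proxy target `$ipsrc:22`, every adversary is forwarded to port 22 of their own host, which is the redirection described above.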

The bind command assigns a template to an IP address. If no template is assigned to an IP address, we use the default template. Figure 6 shows an example configuration that specifies a routing topology and two templates. The router template mimics the network stack of a Cisco 7206 router and is accessible only via telnet. The web server template runs two services: a simple web server and a forwarder for SSH connections. In this case, the forwarder redirects SSH connections back to the connection initiator. A real machine is integrated into the virtual routing topology at IP address 10.1.0.3.

$ traceroute -n 10.3.0.10

traceroute to 10.3.0.10 (10.3.0.10), 64 hops max

1 10.0.0.1 0.456 ms 0.193 ms 0.93 ms

2 10.2.0.1 46.799 ms 45.541 ms 51.401 ms

3 10.3.0.1 68.293 ms 69.848 ms 69.878 ms

4 10.3.0.10 79.876 ms 79.798 ms 79.926 ms

Figure 7: Using traceroute, we measure a routing path in the virtual routing topology. The measured latencies match the configured ones.

3.6 Logging

The Honeyd framework supports several ways of logging network activity. It can create connection logs that report attempted and completed connections for all protocols. More usefully, information can be gathered from the services themselves. Service applications can report data to be logged to Honeyd via stderr. The framework uses syslog to store the information on the system. In most situations, we expect that Honeyd runs in conjunction with a NIDS.

4 Evaluation

This section presents an evaluation of Honeyd’s ability to create virtual network topologies and to mimic different network stacks as well as its performance.

4.1 Fingerprinting

We start Honeyd with a configuration similar to the one shown in Figure 6 and use traceroute to find the routing path to a virtual host. We notice that the measured latency is double the latency that we configured. This is correct because packets have to traverse each link twice.

Running Nmap 3.00 against IP addresses 10.0.0.1 and 10.1.0.2 results in the correct identification of the configured personalities. Nmap reports that 10.0.0.1 seems to be a Cisco router and that 10.1.0.2 seems to run NetBSD. Xprobe identifies 10.0.0.1 as a Cisco router and lists a number of possible operating systems, including NetBSD, for 10.1.0.2.

[Figure 8 plots, two panels: fraction of returned traffic (0 to 1) against bandwidth in MBit (0 to 80). Upper panel: 400 byte packets to interface, to Honeyd, to Honeyd (1 hop), and to Honeyd (2 hop). Lower panel: the same four series for 800 byte packets.]

Figure 8: The graphs show the aggregate bandwidth supported by Honeyd for different packet sizes and different destination IP addresses.

To fully test if the framework deceives Nmap, we set up a class B network populated with virtual honeypots for every fingerprint in Nmap’s fingerprint file. After removing duplicates, we found 600 distinct fingerprints. The honeypots were configured so that all but one port was closed; the open port ran a web server. We then launched Nmap 3.00 against all configured IP addresses and checked which operating systems Nmap identified. For 555 fingerprints, Nmap uniquely identified the operating system simulated by Honeyd. For 37 fingerprints, Nmap presented a list of possible choices that included the simulated personality. Nmap failed to identify the correct operating system for only eight fingerprints. This might be a problem of Honeyd, or it could be due to a badly formed fingerprint database. For example, the fingerprint for a SMC Wireless Broadband Router is almost identical to the fingerprint for a Linksys Wireless Broadband Router. When evaluating fingerprints, Nmap always prefers the latter over the former.
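As a quick consistency check on the counts reported above, the three outcomes can be tallied and expressed as shares of the 600 fingerprints. The category labels in the sketch are our own shorthand; only the counts come from the text.

```python
# Outcome counts for the 600 distinct fingerprints, as reported in the text.
results = {"uniquely identified": 555, "listed among choices": 37, "not identified": 8}

total = sum(results.values())
shares = {outcome: round(100 * count / total, 1) for outcome, count in results.items()}

print(total)
for outcome, percent in shares.items():
    print(f"{outcome}: {percent}%")
```

The counts add up to 600, and the unique identifications account for 92.5% of all fingerprints.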

Currently available fingerprinting tools are usually stateless because they neither open TCP connections nor explore the behavior of the TCP state machine for states other than LISTEN or CLOSE. There are several areas like congestion control and fast recovery that are likely to be different between operating systems and are not checked by fingerprinting tools. An adversary who measures the differences in TCP behavior for different states across operating systems would notice that they do not differ in Honeyd and thus be able to detect virtual honeypots.

[Figure 9: plot of processing time per packet in ms (y-axis, 0 to 0.04) against the number of configured honeypots (x-axis, 0 to 2.5×10^5).]

Figure 9: The graph shows the per-packet processing time depending on the number of virtual honeypots. For one thousand randomly chosen destination addresses, the processing time is about 0.022 ms per packet. For 250,000 destination addresses, it increases to about 0.032 ms.

Another method to detect virtual honeypots is to analyze their performance in relation to other hosts. Sending network traffic to one virtual honeypot might affect the performance of other virtual honeypots but would not affect the performance of a real host. In the following section, we present a performance analysis of Honeyd.

4.2 Performance

We analyze Honeyd’s performance on a 1.1 GHz Pentium III over an idle 100 MBit/s network. To determine the aggregate bandwidth supported by Honeyd, we configure it to route the 10/8 network and measure its response rate to ICMP echo requests sent to IP addresses at different depths within a virtual routing topology. To get a base of comparison, we first send ICMP echo requests to the IP address of the Honeyd host because the operating system responds to these requests directly. We then send ICMP echo requests to virtual IP addresses at different depths of the virtual routing topology.

Figure 8 shows the fraction of returned ICMP echo replies for different request rates. The upper graph shows the results for sending 400 byte ICMP echo request packets. We see that Honeyd starts dropping reply packets at a bandwidth of 30 MBit/s. For packets sent to Honeyd’s entry router, we measure a 10% reply packet loss. For packets sent to IP addresses deeper in the routing topology, the loss of reply packets increases to up to 30%. The lower graph shows the results for sending 800 byte ICMP echo request packets. Due to the larger packet size, the rate of packets is reduced by half and we see that for any destination IP address, the packet loss is only up to 10%.
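To relate the bandwidths in Figure 8 to packet rates, the following sketch converts a bandwidth and a packet size into packets per second. It assumes that 1 MBit is 10^6 bits and that the stated packet size is the full size of each packet; the text specifies neither, so the resulting rates are approximate.

```python
def packets_per_second(bandwidth_mbit: float, packet_bytes: int) -> float:
    """Packets per second needed to fill bandwidth_mbit with packets of packet_bytes.

    Assumption: 1 MBit = 1,000,000 bits, and packet_bytes is the whole packet.
    """
    return bandwidth_mbit * 1_000_000 / (packet_bytes * 8)


# Packet rates at the 30 MBit/s point where replies start to be dropped.
rate_400 = packets_per_second(30, 400)
rate_800 = packets_per_second(30, 800)

if __name__ == "__main__":
    print(f"400 byte packets: {rate_400:.0f} packets/s")
    print(f"800 byte packets: {rate_800:.0f} packets/s")
```

At equal bandwidth, doubling the packet size halves the packet rate, which is the effect the lower graph illustrates.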

To understand how Honeyd’s performance depends on the number of configured honeypots, we use a micro-benchmark that measures how the processing time per packet changes with an increasing number of configured templates. The benchmark chooses a random destination address from the configured templates and sends a TCP SYN segment to a closed port. We measure how long it takes for Honeyd to process the packet and generate a TCP RST segment. The measurement is repeated 80,000 times. Figure 9 shows that for one thousand templates the processing time is about 0.022 ms per packet, which is equivalent to about 45,000 packets per second. For 250,000 templates, the processing time increases to 0.032 ms or about 31,000 packets per second.
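The packet rates quoted above follow directly from the per-packet processing times. A minimal sketch of the conversion, with a function name chosen here for illustration:

```python
def rate_from_processing_time(ms_per_packet: float) -> float:
    """Convert a per-packet processing time in milliseconds to packets per second."""
    return 1000.0 / ms_per_packet


rate_1k_templates = rate_from_processing_time(0.022)    # one thousand templates
rate_250k_templates = rate_from_processing_time(0.032)  # 250,000 templates

if __name__ == "__main__":
    print(f"1,000 templates:   {rate_1k_templates:.0f} packets/s")
    print(f"250,000 templates: {rate_250k_templates:.0f} packets/s")
```

This treats packet processing as strictly sequential, so it gives an upper bound implied by the micro-benchmark rather than a measured throughput.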

To evaluate Honeyd’s TCP end-to-end performance, we create a simple internal echo service. When a TCP connection has been established, the service outputs a single line of status information and then echoes all the input it receives. We measure how many TCP requests Honeyd can support per second by creating TCP connections from 65536 random source IP addresses in 10.1/16 to 65536 random destination addresses in 10.1/16. To decrease the client load, we developed a tool that creates TCP connections without requiring state on the client. A request is successful when the client sees its own data packet echoed by the echo service running under Honeyd. A successful transaction between a random client address Cr and a random virtual honeypot Hr requires the following exchange:

1. Cr → Hr: TCP SYN segment

2. Hr → Cr: TCP SYN|ACK segment

3. Cr → Hr: TCP ACK segment

4. Hr → Cr: banner payload

5. Cr → Hr: data payload

6. Cr → Hr: TCP ACK segment (banner)

7. Hr → Cr: TCP ACK segment (data)

8. Hr → Cr: echoed data payload

9. Cr → Hr: TCP RST segment

The client does not close the TCP connection via a FIN segment as this would require state. Depending on the load of the Honeyd machine, it is possible that the banner and echoed data payload may arrive in the same segment.
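The text does not describe how the measurement tool avoids client state. One way such a client could work is sketched below: the initial sequence number is recomputed from the address tuple with a keyed hash (an idea borrowed from SYN cookies), so every reply is derived from the incoming segment alone. The `Segment` type, `SECRET`, `REQUEST`, and all function names are hypothetical and are not part of Honeyd or of the tool used in the paper.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, List

SECRET = b"example-secret"     # hypothetical key, not from the paper
REQUEST = b"hello honeyd\n"    # hypothetical data payload sent by the client
MOD = 2 ** 32                  # TCP sequence numbers wrap at 2^32


@dataclass(frozen=True)
class Segment:
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    ack: int
    flags: FrozenSet[str]
    payload: bytes = b""


def isn(src: str, dst: str, sport: int, dport: int) -> int:
    """Derive the client's initial sequence number from the address tuple."""
    key = f"{src}:{sport}>{dst}:{dport}".encode()
    return int.from_bytes(hashlib.sha256(SECRET + key).digest()[:4], "big")


def syn(src: str, dst: str, sport: int, dport: int) -> Segment:
    """Step 1: the opening SYN segment."""
    return Segment(src, dst, sport, dport, isn(src, dst, sport, dport), 0,
                   frozenset({"SYN"}))


def respond(seg: Segment) -> List[Segment]:
    """Return the client's replies to one incoming segment, using no stored state."""
    # The incoming segment's destination is the client, so swap the tuple.
    client_isn = isn(seg.dst, seg.src, seg.dport, seg.sport)

    def reply(seq: int, ack: int, flags: set, payload: bytes = b"") -> Segment:
        return Segment(seg.dst, seg.src, seg.dport, seg.sport,
                       seq % MOD, ack % MOD, frozenset(flags), payload)

    if seg.flags == {"SYN", "ACK"}:
        if seg.ack != (client_isn + 1) % MOD:
            return []  # not an answer to a SYN this client could have sent
        # Steps 3 and 5: complete the handshake and send the data payload.
        return [reply(client_isn + 1, seg.seq + 1, {"ACK"}),
                reply(client_isn + 1, seg.seq + 1, {"ACK"}, REQUEST)]
    if "ACK" in seg.flags and seg.payload:
        # The request is always sent after the handshake, so the client's
        # next sequence number is known without remembering anything.
        next_seq = client_isn + 1 + len(REQUEST)
        next_ack = seg.seq + len(seg.payload)
        if REQUEST in seg.payload:
            # Step 9: the echo was seen (possibly together with the banner).
            return [reply(next_seq, next_ack, {"RST"})]
        # Step 6: acknowledge the banner.
        return [reply(next_seq, next_ack, {"ACK"})]
    return []  # pure ACKs (step 7) and anything else need no answer
```

Because `respond` checks for the request anywhere in the payload, it also handles the case where the banner and the echoed data arrive in the same segment.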

[Figure 10: two plots of the fraction of successful requests (y-axis, 0 to 1.1) against TCP requests per second (x-axis, 0 to 3000). Upper panel: default template for 65k hosts and individual templates for 65k hosts. Lower panel: default template for 256 hosts at 1 hop, 2 hops, and 3 hops.]

Figure 10: The two graphs show the number of TCP transactions per second that Honeyd can support for different configurations. The upper graph shows the performance when using the default template for all honeypots and when using an individual template for each honeypot. Performance decreases slightly when each of the 65K honeypots is configured individually. The lower graph shows the performance for contacting honeypots at different levels of the routing topology. Performance decreases for honeypots with higher latency.

Figure 10 shows the results from our TCP performance measurement. We repeated our measurements at least five times and show the average result including standard deviation. The upper graph shows the performance when using the default template for all honeypots compared to the performance when using an individual template for each honeypot. Performance decreases slightly when each of the 65K honeypots is configured individually. In both cases, Honeyd is able to sustain over two thousand TCP transactions per second. The lower graph shows the performance for contacting honeypots at different levels of the routing topology. The performance decreases most noticeably for honeypots that are three hops away from the sender. We do not have a convincing explanation for the drop in performance around six hundred requests per second.

Our measurements show that a 1.1 GHz Pentium III can simulate thousands of virtual honeypots. However, the performance depends on the complexity and number of simulated services available for each honeypot. The setup for studying spammers described in Section 5.3 simulates two C-class networks on a 666 MHz Pentium III.

5 Applications

In this section, we describe how the Honeyd framework can be used in different areas of system security.

5.1 Network Decoys

The traditional role of a honeypot is that of a network decoy. Our framework can be used to instrument the unallocated addresses of a production network with virtual honeypots. Adversaries that scan the production network can potentially be confused and deterred by the virtual honeypots. In conjunction with a NIDS, the resulting network traffic may help in getting early warning of attacks.

5.2 Detecting and Countering Worms

Honeypots are ideally suited to intercept traffic from adversaries that randomly scan the network. This is especially true for Internet worms that use some form of random scanning for new targets [25], e.g. Blaster [5], Code Red [15], Nimda [4], Slammer [16], etc. In this section, we show how a virtual honeypot deployment can be used to detect new worms and how to launch active countermeasures against infected machines once a worm has been identified.

To intercept probes from worms, we instrument virtual honeypots on unallocated network addresses. The probability of receiving a probe depends on the number of infected machines i, the worm propagation chance and the number of deployed honeypots h. The worm propagation chance depends on the worm propagation algorithm, the number of vulnerable hosts and the size of the address space. In general, the larger our honeypot deployment, the earlier one of the honeypots receives a worm probe.
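As a rough illustration of how the deployment size affects detection, the sketch below computes the chance that at least one probe reaches a honeypot. It substitutes a deliberately simple model for the general propagation chance discussed above: every probe targets an address chosen uniformly at random from a 2^32 address space. Real worms may scan differently, and the function names are chosen here for illustration.

```python
ADDRESS_SPACE = 2 ** 32  # assumption: probes are uniform over all IPv4 addresses


def probe_probability(honeypots: int, probes: int,
                      address_space: int = ADDRESS_SPACE) -> float:
    """Probability that at least one of `probes` random probes hits a honeypot."""
    miss = 1.0 - honeypots / address_space  # chance a single probe misses all honeypots
    return 1.0 - miss ** probes


def probes_sent(infected: int, probes_per_second: float, seconds: float) -> int:
    """Total probes launched by `infected` machines over `seconds`."""
    return int(infected * probes_per_second * seconds)


if __name__ == "__main__":
    # Illustrative values only: 150 infected machines, 50 probes/s each, one minute.
    n = probes_sent(infected=150, probes_per_second=50, seconds=60)
    for h in (4_000, 65_536, 262_000):
        print(f"{h:>7} honeypots: P(probe received) = {probe_probability(h, n):.3f}")
```

Under this model the probability grows with both the number of honeypots and the number of probes, which matches the qualitative statement that larger deployments see a worm earlier.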

[Figure 11: two columns of four stacked plots each, showing the population fraction (y-axis, 0 to 1) of infected, susceptible, and immunized hosts over time (x-axis, 00:00:00 to 02:24:00).]

Figure 11: The graphs show the simulated worm propagation when immunizing infected hosts that connect to a virtual honeypot. The left graph shows the propagation if the virtual honeypots are activated one hour after the worm starts spreading. The right graph shows the propagation if the honeypots are activated after twenty minutes. The first row in each graph shows the result when no honeypots have been deployed, the second row shows the results for four thousand honeypots, the third for sixty-five thousand honeypots and the fourth for 262,000 honeypots.

To detect new worms, we can use the Honeyd framework in two different ways. We may deploy a large number of virtual honeypots as gateways in front of a smaller number of high-interaction honeypots. Honeyd instruments the virtual honeypots. It forwards only TCP connections that have been established and only UDP packets whose payload fails to match a known fingerprint. In such a setting, Honeyd shields the high-interaction honeypots from uninteresting scanning or backscatter activity. A high-interaction honeypot like ReVirt [7] is used to detect compromises or unusual network activity. Using the automated NIDS signature generation proposed by Kreibich et al. [14], we can then block the detected worm or exploit at the network border. The effectiveness of this approach has been analyzed by Moore et al. [17]. To improve it, we can configure Honeyd to replay packets to several high-interaction honeypots that run different operating systems and software versions.
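The gateway forwarding rule described above can be written as a small predicate. This is a sketch under stated assumptions, not Honeyd's implementation: the `Packet` type and `should_forward` function are hypothetical, and "matching a known fingerprint" is modeled here as a simple substring test on the payload.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Packet:
    protocol: str              # "tcp" or "udp"
    payload: bytes = b""
    established: bool = False  # for TCP: the three-way handshake has completed


def should_forward(packet: Packet, known_fingerprints: Iterable[bytes]) -> bool:
    """Decide whether a gateway honeypot passes a packet to a high-interaction host."""
    if packet.protocol == "tcp":
        # Only established connections are forwarded; bare scans are absorbed.
        return packet.established
    if packet.protocol == "udp":
        if not packet.payload:
            return False
        # Assumption: a fingerprint matches if it occurs anywhere in the payload.
        return not any(fp in packet.payload for fp in known_fingerprints)
    return False


if __name__ == "__main__":
    known = [b"known-probe"]  # illustrative fingerprint, not a real signature
    print(should_forward(Packet("tcp", established=False), known))   # scan is dropped
    print(should_forward(Packet("udp", b"novel payload"), known))    # unknown is forwarded
```

Keeping the decision in one pure function makes it easy to see which traffic never reaches the more expensive high-interaction honeypots.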

On the other hand, we can use Honeyd’s subsystem support to expose regular UNIX applications like OpenSSH to worms. This solution is limiting as we are restricted to detecting worms only for the operating system that is running the framework, and most worms target Microsoft Windows, not UNIX.

Moore et al. show that containing worms is not practical on an Internet scale unless a large fraction of the Internet cooperates in the containment effort [17]. However, with the Honeyd framework, it is possible to actively counter worm propagation by immunizing infected hosts that contact our virtual honeypots. Analogous to Moore et al. [17], we can model the effect of immunization on worm propagation by using the classic SIR epidemic model [13]. The model states that the number of newly infected hosts increases linearly with the product of infected hosts, fraction of susceptible hosts and contact rate. The immunization is represented by a decrease in new infections that is linear in the number of infected hosts:

\begin{align*}
\frac{ds}{dt} &= -\beta\, i(t)\, s(t) \\
\frac{di}{dt} &= \beta\, i(t)\, s(t) - \gamma\, i(t) \\
\frac{dr}{dt} &= \gamma\, i(t),
\end{align*}

where at time t, i(t) is the fraction of infected hosts, s(t) the fraction of susceptible hosts and r(t) the fraction of immunized hosts. The propagation speed of the worm is characterized by the contact rate β and the immunization rate is represented by γ.
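The three equations can be integrated numerically with a forward Euler step. The sketch below is one simple way to do so; the step size, parameter values, and function name are illustrative assumptions and are not the simulation code used for the results in this paper.

```python
from typing import List, Tuple


def simulate_sir(beta: float, gamma: float, s0: float, i0: float,
                 r0: float = 0.0, dt: float = 1.0,
                 steps: int = 1000) -> List[Tuple[float, float, float]]:
    """Integrate ds/dt, di/dt, dr/dt with forward Euler; return (s, i, r) per step."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * i * s * dt   # beta * i(t) * s(t)
        new_immunized = gamma * i * dt       # gamma * i(t)
        s -= new_infections
        i += new_infections - new_immunized
        r += new_immunized
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for label, gamma in (("no immunization", 0.0), ("with immunization", 0.2)):
        s, i, r = simulate_sir(beta=0.5, gamma=gamma, s0=0.99, i0=0.01,
                               dt=0.1, steps=500)[-1]
        print(f"{label}: s={s:.3f} i={i:.3f} r={r:.3f}")
```

Each step moves the same quantity out of one compartment and into another, so s + i + r stays constant, as the equations require.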

We simulate worm propagation based on the parameters for a Code-Red like worm [15, 17]. We use 360,000 susceptible machines in a 2^32 address space and set the initial worm seed to 150 infected machines. Each worm launches 50 probes per second and we assume that the immunization of an infected machine takes one second after it has contacted a honeypot. The simulation measures the effectiveness of using active immunization by virtual honeypots. The honeypots start working after a time delay. The time delay represents the time that is required to detect the worm and install the immunization code. We expect that immunization code can be prepared before a vulnerability is actively exploited. Figure 11 shows the worm propagation resulting from a varying number of instrumented honeypots. The graph on the left shows the results if the honeypots are brought online an hour after the worm started spreading. The graph on the right shows the results if the honeypots can be activated within twenty minutes. If we wait for an hour, all vulnerable machines on the Internet will be infected. Our chances are better if we start the honeypots after twenty minutes. In that case, a deployment of about 262,000 honeypots is capable of stopping the worm from spreading to all susceptible hosts. Ideally, we detect new worms automatically and immunize infected machines when a new worm has been detected.
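The text does not state how the simulation parameters map onto the contact rate β and the immunization rate γ. One plausible mapping, assuming probes are uniformly random over the 2^32 address space and ignoring the one-second immunization delay, is sketched below. It is meant only to give a feel for the relative size of the two rates and should not be read as the derivation used for Figure 11; the function names are chosen here for illustration.

```python
ADDRESS_SPACE = 2 ** 32      # size of the address space stated in the text
SUSCEPTIBLE_HOSTS = 360_000  # susceptible machines stated in the text
PROBES_PER_SECOND = 50       # probes per second per worm stated in the text


def contact_rate(probes_per_second: float, susceptible_hosts: int,
                 address_space: int = ADDRESS_SPACE) -> float:
    """Expected susceptible hosts probed per infected host per second (beta).

    Assumption: probes are uniformly random over the address space.
    """
    return probes_per_second * susceptible_hosts / address_space


def immunization_rate(probes_per_second: float, honeypots: int,
                      address_space: int = ADDRESS_SPACE) -> float:
    """Expected honeypot contacts per infected host per second (gamma).

    Assumption: every honeypot contact immunizes the probing host at once.
    """
    return probes_per_second * honeypots / address_space


beta = contact_rate(PROBES_PER_SECOND, SUSCEPTIBLE_HOSTS)

if __name__ == "__main__":
    print(f"beta = {beta:.6f} per second")
    for h in (4_000, 65_000, 262_000):
        gamma = immunization_rate(PROBES_PER_SECOND, h)
        print(f"{h:>7} honeypots: gamma = {gamma:.6f}, gamma/beta = {gamma / beta:.3f}")
```

Under this mapping the ratio γ/β is simply the number of honeypots divided by the number of susceptible hosts, which suggests why only the largest deployment noticeably limits the spread.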

Alternatively, it would be possible to scan the Internet for vulnerable systems and remotely patch them. For ethical reasons, this is probably unfeasible. However, if we can reliably detect an infected machine with our virtual honeypot framework, then active immunization might be an appropriate response. For the Blaster worm, this idea has been realized by Oudot et al. [18].

5.3 Spam Prevention

The Honeyd framework can be used to understand how spammers operate and to automate the identification of new spam, which can then be submitted to collaborative spam filters.

In general, spammers abuse two Internet services: proxy servers [10] and open mail relays. Open proxies are often used to connect to other proxies or to submit spam email to open mail relays. Spammers can use open proxies to hide their identity and prevent the spam from being traced back to its origin. An open mail relay accepts email from any sender address to any recipient address. By sending spam email to open mail relays, a spammer causes the mail relay to deliver the spam in his stead.

To understand how spammers operate, we use the Honeyd framework to instrument networks with open proxy servers and open mail relays. We make use of Honeyd’s GRE tunneling capabilities and tunnel several C-class networks to a central Honeyd host.

We populate our network space with randomly chosen IP addresses and a random selection of services. Some virtual hosts may run an open proxy and others may just run an open mail relay or a combination of both.

When a spammer attempts to send spam email via an open proxy or an open mail relay, the email is automatically redirected to a spam trap. The spam trap then submits the collected spam to a collaborative spam filter.

At this writing, Honeyd has received and processed more than six million spam emails from over 1,500 different IP addresses. A detailed evaluation is the subject of future work.

Figure 12: Using the Honeyd framework, it is possible to instrument networks to automatically capture spam and submit it to collaborative filtering systems.

6 Related Work

Cohen’s Deception Toolkit provides a framework to write services that seem to contain remotely exploitable vulnerabilities [6]. Honeyd operates one level above that by providing a framework to create virtual honeypots that can run any number of services. The Deception Toolkit could be one of the services running on a virtual honeypot.

There are several areas of research in TCP/IP stack fingerprinting, among them: effective methods to classify the remote operating system either by active probing or by passive analysis of network traffic, and defeating TCP/IP stack fingerprinting by normalizing network traffic.

Fyodor’s Nmap uses TCP and UDP probes to determine the operating system of a host [9]. Nmap collects the responses of a network stack to different queries and matches them to a signature database to determine the operating system of the queried host. Nmap’s fingerprint database is extensive and we use it as the reference for operating system personalities in Honeyd.

Instead of actively probing a remote host to determine its operating system, it is possible to identify the remote operating system by passively analyzing its network packets. P0f [29] is one such tool. The TCP/IP flags inspected by P0f are similar to the data collected in Nmap’s fingerprint database.

On the other hand, Smart et al. show how to defeat fingerprinting tools by scrubbing network packets so that artifacts identifying the remote operating system are removed [22]. This approach is similar to Honeyd’s personality engine as both systems change network packets to influence fingerprinting tools. In contrast to the fingerprint scrubber that removes identifiable information, Honeyd changes network packets to contain artifacts of the configured operating system.

High-interaction virtual honeypots can be constructed using User Mode Linux (UML) or VMware [27]. One example is ReVirt, which can reconstruct the state of the virtual machine for any point in time [7]. This is helpful for forensic analysis after the virtual machine has been compromised. Although high-interaction virtual honeypots can be fully compromised, it is not easy to instrument thousands of high-interaction virtual machines due to their overhead. However, the Honeyd framework allows us to instrument unallocated network space with thousands of virtual honeypots. Furthermore, we may use a combination of Honeyd and virtual machines to get the benefit of both approaches. In this case, Honeyd provides network facades and selectively proxies service connections to backends provided by high-interaction virtual machines.

7 Conclusion

Honeyd is a framework for creating virtual honeypots. Honeyd mimics the network stack behavior of operating systems to deceive fingerprinting tools like Nmap and Xprobe.

We gave an overview of Honeyd’s design and architecture and showed how Honeyd’s personality engine can modify packets to match the fingerprints of other operating systems and how it is possible to create arbitrary virtual routing topologies.

Our performance measurements showed that a single 1.1 GHz Pentium III can simulate thousands of virtual honeypots with an aggregate bandwidth of over 30 MBit/s and that it can sustain over two thousand TCP transactions per second. Our experimental evaluation showed that Honeyd is effective in creating virtual routing topologies and successfully fools fingerprinting tools.

We showed how the Honeyd framework can be deployed to help in different areas of system security, e.g., worm detection, worm countermeasures, or spam prevention.

Honeyd is freely available as source code and can be downloaded from http://www.citi.umich.edu/u/provos/honeyd/.

8 Acknowledgments

I thank Marius Eriksen, Peter Honeyman, Patrick McDaniel and Bennet Yee for careful reviews and suggestions. Jamie Van Randwyk, Dug Song and Eric Thomas also provided helpful suggestions and contributions.

References

[1] Ofir Arkin and Fyodor Yarochkin. Xprobe v2.0: A “Fuzzy” Approach to Remote Active Operating System Fingerprinting. www.xprobe2.org, August 2002.

[2] Steven M. Bellovin. Security Problems in the TCP/IP Protocol Suite. Computer Communications Review, 19(2):32–48, 1989.

[3] Smoot Carl-Mitchell and John S. Quarterman. Using ARP to Implement Transparent Subnet Gateways. RFC 1027, October 1987.

[4] CERT. CERT Advisory CA-2001-26 Nimda Worm. www.cert.org/advisories/CA-2001-26.html, September 2001.

[5] CERT. CERT Advisory CA-2003-20 W32/Blaster Worm. www.cert.org/advisories/CA-2003-20.html, August 2003.

[6] Fred Cohen. The Deception Toolkit. http://all.net/dtk.html, March 1998. Viewed on May 12th, 2004.

[7] George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza Basrai, and Peter M. Chen. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation, December 2002.

[8] Kevin Fall. Network Emulation in the VINT/NS Simulator. In Proceedings of the Fourth IEEE Symposium on Computers and Communications, July 1999.

[9] Fyodor. Remote OS Detection via TCP/IP Stack Fingerprinting. www.nmap.org/nmap/nmap-fingerprinting-article.html, October 1998.

[10] S. Glassman. A Caching Relay for the World Wide Web. In Proceedings of the First International World Wide Web Conference, pages 69–76, May 1994.

[11] S. Hanks, T. Li, D. Farinacci, and P. Traina. Generic Routing Encapsulation (GRE). RFC 1701, October 1994.

[12] S. Hanks, T. Li, D. Farinacci, and P. Traina. Generic Routing Encapsulation over IPv4 networks. RFC 1702, October 1994.

[13] Herbert W. Hethcote. The Mathematics of Infectious Diseases. SIAM Review, 42(4):599–653, 2000.

[14] C. Kreibich and J. Crowcroft. Automated NIDS Signature Generation using Honeypots. Poster paper, ACM SIGCOMM 2003, August 2003.

[15] D. Moore, C. Shannon, and J. Brown. Code-Red: A Case Study on The Spread and Victims of an Internet Worm. In Proceedings of the 2nd ACM Internet Measurement Workshop, pages 273–284. ACM Press, November 2002.

[16] David Moore, Vern Paxson, Stefan Savage, Colleen Shannon, Stuart Staniford, and Nicholas Weaver. Inside the Slammer Worm. IEEE Security and Privacy, 1(4):33–39, July 2003.

[17] David Moore, Colleen Shannon, Geoffrey Voelker, and Stefan Savage. Internet Quarantine: Requirements for Containing Self-Propagating Code. In Proceedings of the 2003 IEEE Infocom Conference, April 2003.

[18] Laurent Oudot. Fighting worms with honeypots: honeyd vs msblast.exe. lists.insecure.org/lists/honeypots/2003/Jul-Sep/0071.html, August 2003. Honeypots mailing list.

[19] Vern Paxson. Bro: A System for Detecting Network Intruders in Real-Time. In Proceedings of the 7th USENIX Security Symposium, January 1998.

[20] Jon Postel. Transmission Control Protocol. RFC 793, September 1981.

[21] Thomas Ptacek and Timothy Newsham. Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection. Secure Networks Whitepaper, August 1998.

[22] Matthew Smart, G. Robert Malan, and Farnam Jahanian. Defeating TCP/IP Stack Fingerprinting. In Proceedings of the 9th USENIX Security Symposium, August 2000.

[23] Dug Song, Robert Malan, and Robert Stone. A Snapshot of Global Worm Activity. Technical report, Arbor Networks, November 2001.

[24] Lance Spitzner. Honeypots: Tracking Hackers. Addison Wesley Professional, September 2002.

[25] Stuart Staniford, Vern Paxson, and Nicholas Weaver. How to 0wn the Internet in your Spare Time. In Proceedings of the 11th USENIX Security Symposium, August 2002.

[26] W. R. Stevens. TCP/IP Illustrated, volume 1. Addison-Wesley, 1994.

[27] Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim. Virtualizing I/O Devices on VMware Workstation’s Hosted Virtual Machine Monitor. In Proceedings of the Annual USENIX Technical Conference, pages 25–30, June 2001.

[28] David Wagner and Paolo Soto. Mimicry Attacks on Host-Based Intrusion Detection Systems. In Proceedings of the 9th ACM Conference on Computer and Communications Security, November 2002.

[29] Michal Zalewski and William Stearns. Passive OS Fingerprinting Tool. www.stearns.org/p0f/README. Viewed on January 12th, 2003.
