Propagation and Containment

Presented by Jing Yang, Leonid Bolotnyy, and Anthony Wood

Analogy between Biological and Computational Mechanisms

• The spread of self-replicating programs within computer systems is much like the transmission of smallpox several centuries ago [1]

• Mathematical models popular in biological epidemiology can also be used in the study of computer viruses [1]

• Kephart & White’s work – the first to explicitly develop and analyze quantitative models that capture the spreading characteristics of computer viruses

Kephart & White’s Work

• Based on the assumption that viruses are spread by program sharing

• Benefits of using mathematical models, as mentioned in their paper
– Evaluation and development of general policies and heuristics for inhibiting the spread of viruses
– Application to a particular epidemic, such as predicting its course

Modeling Viral Epidemics on Directed Graphs

• Directed Graph [1]
– Represents each individual system as a node in a graph
– Directed edges from a given node j to other nodes represent the set of individuals that can be infected by j
– A rate of infection is associated with each edge
– A rate at which infection can be detected and “cured” is associated with each node

SIS Model on A Random Graph

• Random Graph – a directed graph constructed by making random, independent decisions about whether to include each of the N(N-1) possible directional edges [1]

• Techniques used by Kephart & White
– Deterministic approximation
– Approximate probabilistic analysis
– Simulation

Deterministic Approximation

• β – infection rate along each edge

• δ – cure rate for each node

• β’ = β p (N - 1) – average total rate at which a node attempts to infect its neighbors

• ρ’ = δ / β’ – if ρ’ > 1, the fraction of infected individuals decays exponentially from its initial value to 0, i.e. there is no epidemic; if ρ’ < 1, the fraction of infected individuals grows from its initial value at a rate that is initially exponential and eventually saturates at 1 - ρ’
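This threshold behavior can be checked numerically. A minimal sketch (not from the paper; parameter values are illustrative) that Euler-integrates the deterministic SIS rate equation di/dt = β’·i·(1 − i) − δ·i:

```python
def sis_fraction(beta_prime, delta, i0=0.01, t_end=200.0, dt=0.01):
    """Euler-integrate di/dt = beta' * i * (1 - i) - delta * i and
    return the infected fraction at t_end."""
    i = i0
    for _ in range(int(t_end / dt)):
        i += (beta_prime * i * (1 - i) - delta * i) * dt
    return i
```

With β’ = 1, δ = 0.3 (ρ’ = 0.3 < 1) the fraction settles near 1 - ρ’ = 0.7; with β’ = 0.5, δ = 1 (ρ’ = 2 > 1) it decays toward zero, matching the stated threshold condition.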

Probabilistic Analysis

• Includes new information such as:
– Size of fluctuations in the number of infected individuals
– Possibility that fluctuations will result in extinction of the infection

• Conclusions:
– A lot of variance in the number of infected individuals from one simulation to another
– In equilibrium, the size of the infected population is completely insensitive to the moment at which the exponential rise occurred

• Extinction probability and metastable distribution can be calculated

Simulations

• Results
– Higher extinction probability
– Lower average number of infected individuals

• Suspected reasons
– No details of which nodes are infected
– No variation in the number of nodes that a given node could infect

Simulations (cont.)

• Scenario
– A random graph in which most nodes are isolated and a few are joined in small clusters

• What contributes to containment?
– Building isolated cells – worms can spread unimpeded within a cell, but the containment system will limit further infection between cells

Improvements of SIS Model on A Random Graph

• Kephart & White’s work
– Weak links – give a node a small but finite chance of infecting any node which is not explicitly connected to it
– Hierarchical model – extends the two-type model of strong and weak links to a hierarchy

• Wang’s work
– Effects of infection delay
– Effects of user vigilance

Do the SIS & SIR Models Take Containment into Consideration?

• No to SIS and maybe Yes to SIR

• In SIS, the cure process may be more appropriately called treatment

• In SIR, deployment of containment is limited to the individual node, which means that every cell contains only one node – this, too, is more appropriately called treatment

• Neither is automatic

• No cooperation is applied

Model without Containment

• Remember the assumption in Kephart & White’s work? Viruses spread by program sharing

• Modern worms spread so quickly that manual containment is impossible

• A model without containment should be built first (SI model) and then different containment methods are added to test the results

IPv6 vs. Worms

• It will be very challenging to build Internet containment systems that prevent widespread infection from worm epidemics in the IPv4 environment [2]

• It seems that the only effective defense is to increase the worm’s scanning space – by upgrading IPv4 to IPv6

How Large is the IPv6 Address Space?

• IPv6 has 2^128 IP addresses [3]

• The smallest subnet has 2^64 addresses [3]
– 4.4 billion IPv4 Internets

• Consider a sub-network [3]
– 1,000,000 vulnerable hosts
– 100,000 scans per second (Slammer: 4,000)
– 1,000 initially infected hosts
– It would take 40 years to infect 50% of the vulnerable population with random scanning
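The 40-year figure can be reproduced from the logistic (SI) model with the numbers above; a back-of-envelope sketch:

```python
import math

N = 2**64            # addresses in the smallest IPv6 subnet
V = 1_000_000        # vulnerable hosts
scans = 100_000      # random scans per second per infected host
i0 = 1_000           # initially infected hosts

# early exponential growth rate of the SI model under random scanning
r = scans * V / N                       # per second, per infected host

# logistic-model time to go from infected fraction f0 to fraction f
f0, f = i0 / V, 0.5
t = math.log((f / (1 - f)) / (f0 / (1 - f0))) / r
years = t / (365 * 24 * 3600)           # roughly 40 years
```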

Worm Classification

• Spreading Media [3]
– Scan-based & self-propagating
– Email
– Windows File Sharing
– Hybrid

• Target Acquisition [3]
– Random Scanning
– Subnet Scanning
– Routing Worm
– Pre-generated Hit List
– Topological
– Stealth / Passive

Can IPv6 Really Defeat Worms?

• Traditional scan-based worms seem to be ineffective, but there may be ways to improve the scan methods [4]

• More sophisticated hybrid worms may appear, which use a variety of ways to collect addresses for quick propagation

• Polymorphic worms may significantly increase the time to extract the signature of a worm

Improvement in Scan Methods

• Subnet Scanning
– The first goal may be a /64 enterprise network instead of the whole Internet

• Routing Worm
– Some IP addresses are not allocated

• Pre-generated Hit List Scanning
– Speeds up propagation; the whole address space can be equally divided among the zombies

Improvement in Scan Methods (cont.)

• Permutation Scanning
– Avoids the waste of scanning one host many times

• Topological Scanning
– Uses information stored on compromised hosts to find new targets

• Stealth / Passive Worm
– Waiting for vulnerable hosts to contact you may be more efficient than scanning such a large address space

What Can IPv6 Itself Contribute?

• Public services need to be reachable by DNS
– At least we have some known addresses in advance [4]

• A DNS name for every host, because of the long IPv6 addresses
– A DNS server under attack can yield large caches of host addresses [4]

• Standard method of deriving the EUI field
– The lower 64 bits of an IPv6 address are derived from the 48-bit MAC address [5]

• IPv6 neighbor-discovery cache data
– One compromised host can reveal the addresses of other hosts [4]

• Easy-to-remember host addresses in the transition from IPv4 to IPv6
– Scanning the IPv6 address space becomes no different from scanning the IPv4 address space [4]
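The EUI-64 derivation mentioned above is mechanical: split the 48-bit MAC, insert 0xFFFE in the middle, and flip the universal/local bit. A sketch:

```python
def mac_to_eui64(mac: str) -> str:
    """Derive the lower 64 bits (interface ID) of an IPv6 address
    from a 48-bit MAC address, per the modified EUI-64 scheme."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                       # flip the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]
    return ":".join(f"{(eui[i] << 8) | eui[i + 1]:04x}" for i in range(0, 8, 2))
```

For example, 00:11:22:33:44:55 yields the interface ID 0211:22ff:fe33:4455, so a scanner that knows a site’s NIC vendor prefixes has far fewer than 2^64 candidates to try.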

More Sophisticated Hybrid Worms

• Humans are involved

• Different methods are used to compromise a host, which effectively increases the vulnerability density

• Nimda’s success…

Polymorphic Worms

• An effective containment system extracts a worm’s signature to do content filtering

• Even though some methods exist to detect polymorphic worms, the success rate may not be 100%

What Should We Do Then?

• To find out whether the following methods, which merely speed up worm propagation in IPv4, can make quick worm propagation in IPv6 possible
– Improvements in scan methods + IPv6’s inherent features
– More sophisticated hybrid worms
– Polymorphic worms

Future Work

• Use traditional models to see whether each method or a combination of them can make a quick propagation of worm in IPv6 possible

• Add new features of worm spread in IPv6 to build new models, which can represent the reality more precisely

• If quick propagation proves possible in IPv6, corresponding containment methods should be worked out – this should be much more feasible than in IPv4

References

[1] Jeffrey O. Kephart, Steve R. White. Directed-Graph Epidemiological Models of Computer Viruses

[2] David Moore et al. Internet Quarantine: Requirements for Containing Self-Propagating Code

[3] Mark Shaneck. Worms: Taxonomy and Detection.

[4] Sean Convery et al. IPv6 and IPv4 Threat Comparison and Best Practice Evaluation.

[5] Michael H. Warfield et al. Security Implications of IPv6.

Propagation and Containment of Worms

General modeling of worm propagation and containment strategies

Ways to mitigate the threat of worms

• Prevention
– Prevent the worm from spreading by reducing the number of vulnerable hosts

• Treatment
– Neutralize the worm by removing the vulnerability it is trying to exploit

• Containment
– Prevent the worm from spreading from infected systems to uninfected, but vulnerable, hosts

Containment Approaches

• La Brea
– Intercepts probes to unallocated addresses

• Connection-history based anomaly detection
– Analyzes connection traffic, trying to detect anomalies

• Per-host throttling
– Restricts the connection rate to “new” hosts

• Blocking access to affected ports
– Prevents affected hosts from accessing vulnerable ports on other machines

• NBAR
– Filters packets based on content, using worm signatures
– It was very effective in preventing the spread of Code-Red

General model for worm infection rate

Modeling Containment Systems

• Reaction Time
– Time required to detect the infection, spread the information to all hosts participating in the system, and activate containment mechanisms

• Containment Strategy
– Strategy that isolates the worm from uninfected susceptible systems (e.g. address blacklisting and content filtering)

• Deployment Scenario
– “Who”, “Where” and “How” of the containment strategy implementation

Simulation parameters

• Population = 2^32 (assuming IPv4)
• Number of vulnerable hosts = 360,000 (same as for Code-Red v2)
• Any probe to a susceptible host results in an infection
• A probe to an infected or non-vulnerable host has no effect
• The first host is infected at time 0
• If a host is infected at time t, then all susceptible hosts are notified at time t + R, where R is the reaction time of the system
• The simulation is run 100 times
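A deterministic sketch of this setup (the paper runs a stochastic simulation 100 times; this single-run approximation, with an assumed Code-Red-like rate of 10 probes/second, only illustrates the reaction-time threshold):

```python
def simulate(N=2**32, V=360_000, probe_rate=10.0, R=1200.0,
             horizon=24 * 3600.0, dt=1.0):
    """Fraction of vulnerable hosts infected after `horizon` seconds,
    when every host's address is blacklisted everywhere R seconds
    after it becomes infected."""
    history = [1.0]            # cumulative infections at step k (t = k*dt)
    infected, t = 1.0, 0.0
    while t < horizon and infected < V - 0.5:
        # hosts infected more than R seconds ago are already blocked
        blocked = history[int((t - R) / dt)] if t >= R else 0.0
        active = infected - blocked
        # each random probe hits a susceptible host with probability (V - I)/N
        infected += active * probe_rate * dt * (V - infected) / N
        history.append(infected)
        t += dt
    return infected / V
```

With these assumed numbers the epidemic dies out when R is under roughly 20 minutes and saturates for longer reaction times, consistent with the address-blacklisting conclusion on the following slides.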

Simulation Goals

• Determine the reaction time needed to limit the worm propagation for address-blacklisting and content filtering

• Compare the two containment strategies

• Understand the relationship between reaction time and worm probe rate

Idealized Deployment Simulation for Code-Red

Idealized Deployment Simulation for Code-Red – Conclusions

• The strategy is considered effective if under 1% of susceptible hosts are infected within the 24-hour period with 95% certainty

• Address-blacklisting is effective if the reaction time is less than 20 minutes
– Note: if the reaction time > 20 minutes, all susceptible hosts will eventually become infected

• Content filtering is effective if the reaction time is less than two hours
– How many susceptible hosts will become infected after time R (the reaction time)?

Idealized Deployment Simulation for General Worm (1)

• The authors generalize the definition of effectiveness as the reaction time required to contain the worm to a given degree of global infection

• Worm aggressiveness – the rate at which an infected host probes others to propagate
– Note: the probe rate does not take into account the possibility of preferential status that some addresses may have

Idealized Deployment Simulation for General Worm (2)

Idealized Deployment Simulation for General Worm – Conclusions

• Worms that are more aggressive than Code-Red – with probe rates of 100 probes/second or higher – require a reaction time under three minutes using address-blacklisting, and under 18 minutes using content filtering, to contain the worm to 10% of the total susceptible population

Practical Deployment

• Analyzing practical deployment, the authors concentrate on content filtering because of its seemingly much lower requirements on reaction time compared to address-blacklisting. This may be premature because the latter technique is still useful; it would be very beneficial to see a hybrid containment strategy that uses both content filtering and address-blacklisting.

Practical Deployment simulation parameters

• The topology of the Internet is taken at the time of the spread of Code-Red v2
• The number of vulnerable hosts is 338,652 (hosts that map to multiple autonomous systems have been removed; only hosts infected in the first 24 hours are included)
• The number of autonomous systems is 6,378
• A packet is assumed to travel along the shortest path through autonomous systems

Practical Deployment for Code-Red

• Reaction time is two hours (less than 1% infected in the idealized simulation)

Practical Deployment for Code-Red conclusions

• ISP deployment is more effective by itself than Customer deployment

• The top 40 ISPs can limit the infection to under 5%, whereas the top 75% of Customer autonomous systems can only limit the infection to 25%

• The results could have been anticipated based on the role of ISPs (their topology)

Practical Deployment for Generalized Worm

• The authors investigate reaction-time requirements

Practical Deployment for Generalized Worm conclusions

• For probe rates of 100 probes/second or higher, neither deployment can effectively contain the worm

• In the best case, effective containment is possible for probe rates of 30/second or lower with the top 100 ISPs, and only 2/second or lower with 50% of Customers

• Note: the top 100 ISPs cannot keep a worm with a probe rate of 100 probes/second from infecting at least 18% of the hosts (not on the graph)

Conclusions of the modeling scheme (1)

• Automated means are needed to detect the worm and contain it.

• Content filtering is more effective than address-blacklisting, but a combination of several strategies may need to be employed.

• The reaction time has to be very small, on the order of minutes to be able to combat aggressive worms.

• It is important to deploy the containment filtering strategy at most of the top ISPs.

Conclusions of the modeling scheme (2)

• The parameters of the model have changed
• Other containment strategies need to be considered
• What will happen to the population parameter?
• What may happen to beta soon?

• A combination of prevention, treatment and containment strategies is needed to combat aggressive worms

LaBrea (1)

• LaBrea is a Linux-based application which works at the network application layer creating virtual machines for nonexistent IP addresses when a packet to such an address reaches the network.

• Once the connection is established, LaBrea tries to hold it as long as possible (by moving connections from the established state to the persistent state, it can hold them almost indefinitely).

LaBrea (2)

• Any connection to LaBrea is suspect because the IP address to which the packet is sent does not exist, not even in DNS.

• It can also analyze the range of IP addresses that are requested giving it a broader view of a potential attack (all ports on virtual machines appear open).

• It requires 8 bps to hold 3 threads of Code-Red. If there were 300,000 infected machines, each with 100 threads, 1,000 sites would each require 5.2% of a full T1 line’s bandwidth to hold them.
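The 5.2% figure follows from simple arithmetic (a sketch using the slide’s numbers and a 1.544 Mbps T1):

```python
T1_BPS = 1_544_000                 # full T1 line rate, bits per second
machines = 300_000                 # infected Code-Red hosts
threads = 100                      # scanning threads per host
sites = 1_000                      # LaBrea sites sharing the load
bps_per_thread = 8 / 3             # 8 bps holds 3 Code-Red threads

total_bps = machines * threads * bps_per_thread   # 80 Mbps overall
per_site = total_bps / sites                      # 80 kbps per site
t1_fraction = per_site / T1_BPS                   # about 5.2% of a T1
```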

Connection-history based anomaly detection

• The idea is to use the GrIDS-based intrusion detection approach and make some modifications to it to allow for worm containment

• Goals:
– Automatic worm propagation determination
– Worm detection with a low false positive rate
– Effective countermeasures
• Automatic responses in real time
• Prevent infected hosts from infecting other hosts
• Prevent non-infected hosts from being infected

Connection-history based anomaly detection model

• A monitoring station collects data on all recent connection attempts and tries to find anomalies in it

• Patterns of a worm
– Similarity of connection patterns
• The worm tries to exploit the same vulnerability
– Causality of connection patterns
• When one event follows another
– Obsolete connections
• Compromised hosts try to access services at random IPs

Very Fast Containment of Scanning Worms

Weaver, Staniford, Paxson

Outline

• Scanning

• Suppression Algorithm

• Cooperation

• Attacks

• Conclusion

What is Scanning?

• Probes from adjacent remote addresses?

• Dist. probes that cover local addresses?

• Horizontal vs. Vertical

• Factor in connection rates?

• Temporal and spatial interdependence

• How to infer intent?

Scanning Worms

• Blaster, Code Red, CR II, Nimda, Slammer

• Does not apply to:
– Hit lists (flash worms)
– Meta-servers (online list)
– Topology detectors
– Contagion worms

Scanning Detection

• Key properties of scans:
– Most scanning fails
– Infected machines attempt many connections

• Containment is based on worm behavior, not signatures (content)

• Containment by address blocking (blacklisting)

• Blocking can lead to DoS if false positive rate is high

Scan Suppression

• Goal 1: protect the enterprise; forget the Internet

• Goal 2: keep the worm below the epidemic threshold, or slow it down so humans notice

• Divide the enterprise network into cells
• Each cell is guarded by a filter employing the scan detection algorithm

Inside, Outside, Upside Down

• Preventing scans from the Internet is too hard

• If an inside node is infected, the filter sees all its traffic

• The cell (LAN) is “outside”; the enterprise network is “inside”

• Can also treat the entire enterprise as a cell, and the Internet as outside


Scan Suppression

• Assumption: benign traffic has a higher probability of success than attack traffic

• Strategy:
– Count connection establishment messages in each direction
– Block when misses – hits > threshold
– Allow messages for existing connections, to reduce the impact of false positives
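Stripped of the caching and decay machinery described next, the core counting rule can be sketched as follows (the threshold value and data structures are illustrative, not the paper’s implementation):

```python
from collections import defaultdict

class ScanSuppressor:
    """Block an address once its failed connection attempts outnumber
    its successes by more than `threshold` (misses - hits > T)."""
    def __init__(self, threshold=5):
        self.threshold = threshold
        self.count = defaultdict(int)   # per-address miss-hit difference
        self.blocked = set()

    def observe(self, addr, success):
        """Record one connection attempt; return True if it is forwarded."""
        if addr in self.blocked:
            return False
        self.count[addr] += -1 if success else 1
        if self.count[addr] > self.threshold:
            self.blocked.add(addr)
            return False
        return True
```

Because benign hosts mostly succeed, their counters hover near zero, while a scanner’s counter climbs past the threshold within a few tens of probes.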

Constraints

• For line-speed hardware operation, must be efficient:
– Memory access speed
• On duplex gigabit Ethernet, can only access DRAM 4 times per packet
– Memory size
• Attempt to keep the footprint under 16 MB
– Algorithm complexity
• Want to implement entirely in hardware

Mechanisms

• Approximate caches
– Fixed memory available
– Allow collisions to cause aliasing
– Err on the side of false negatives

• Cryptographic hashes
– Prevent attackers from controlling collisions
– Encrypt the hash input to give a tag
– For an associative cache, split and save only part as the tag in the table
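A software sketch of an approximate, set-associative cache with keyed-hash tags (sizes, key, and eviction policy are illustrative; the hardware design would use a block cipher rather than a software hash):

```python
import hashlib

class ApproxCache:
    """Fixed-size set-associative cache. Collisions alias silently
    (erring toward false negatives), and a secret key prevents an
    attacker from choosing which addresses collide."""
    def __init__(self, sets=1024, ways=4, key=b"per-boot secret"):
        self.sets, self.key = sets, key
        self.table = [[None] * ways for _ in range(sets)]

    def _slot(self, addr):
        h = hashlib.blake2b(addr.encode(), key=self.key).digest()
        idx = int.from_bytes(h[:4], "big") % self.sets
        return idx, h[4:8]          # store only part of the hash as the tag

    def insert(self, addr):
        idx, tag = self._slot(addr)
        row = self.table[idx]
        if tag not in row:
            row[row.index(None) if None in row else 0] = tag  # naive eviction

    def contains(self, addr):
        idx, tag = self._slot(addr)
        return tag in self.table[idx]
```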

Connection Cache

• Remember if we’ve seen a packet in each direction
• Aliasing turns a failed attempt into a success (biases toward false negatives)
• Age is reset on each forwarded packet
• Every minute, a background process purges entries older than Dconn

Address Cache

• Track “outside” addresses

• Counter keeps difference between successes and failures

• Counts are decremented every Dmiss seconds

Algorithm Pseudo-code

Worked example (host A is on the Internet; B, X, Y, Z are inside; the slides animate the connection cache and address cache as follows):

• UDP probes:
– A → X [forwarded]: connection cache gains A,X: Out ✓ / In –; address cache counts A: 1
– A → Y [forwarded]: A,Y: Out ✓ / In –; A: 2

• Normal traffic:
– A → B [forwarded]: A,B: Out ✓ / In –; A: 3
– B → A [forwarded, bidirectional]: A,B: Out ✓ / In ✓; A’s count drops back to 1

• Scanning again:
– A → … [forwarded until A’s count reaches T]
– A → Z [blocked]: A,Z: Out ✓ / In –; A’s count is capped at Cmax
– A → B? [new SYN/UDP packets are blocked, but TCP packets on the existing connection are still forwarded]

Performance

• For a 6000-host enterprise trace:
– 1 MB connection cache + 4 MB 4-way address cache = 5 MB total
– At most 4 memory accesses per packet
– Operated at gigabit line-speed
– Detects scanning at rates over 1 per minute
– Low false positive rate
– About 20% false negative rate
– Detects scanning after 10-30 attempts

Scan Suppression – Tuning

• Parameters:
– T: miss-hit difference that causes a block
– Cmin: minimum allowed count
– Cmax: maximum allowed count
– Dmiss: decay rate for misses
– Dconn: decay rate for idle connections
– Cache size and associativity

Cooperation

• Divide the enterprise into small cells
• Connect all cells via a low-latency channel
• A cell’s detector notifies the others when it blocks an address (a “kill message”)
• The blocking threshold dynamically adapts to the number of blocks in the enterprise:
– T’ = T – θX, for very small θ
– Changing θ does not change the epidemic threshold, but reduces infection density
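The adaptive rule can produce a cascade: each kill message lowers the effective threshold, which may push other borderline counters over the line. A small sketch (all values are illustrative, and θ here is deliberately large to show the effect):

```python
def cooperative_blocks(counts, T=10.0, theta=1.5):
    """Given per-address miss-hit counts, repeatedly apply T' = T - theta * X
    (X = number of blocks so far) and return the addresses blocked."""
    blocked = []
    changed = True
    while changed:
        changed = False
        T_eff = T - theta * len(blocked)       # kill messages tighten T'
        for addr, count in counts.items():
            if addr not in blocked and count > T_eff:
                blocked.append(addr)           # broadcast a kill message
                changed = True
                break                          # re-derive T' before continuing
    return blocked
```

With counts {a: 11, b: 9, c: 8}, only a exceeds the initial T = 10, but its kill message lowers T’ enough to block b, and then c as well; with θ = 0, only a would be blocked.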

Cooperation – Effect of θ

Cooperation Issues

• Poor choice of θ could cause collapse

• Lower thresholds increase false positives

• Should a complete shutdown be possible?

• How to connect cells (practically)?

Attacking Containment

• False positives
– Unidirectional control flows
– Spoofing outside addresses (though this does not prevent inside systems from initiating connections)

• False negatives
– Use a non-scanning technique
– Scan under the detection threshold
– Use a whitelisted port to test for liveness before scanning

Attacking Containment

• Detecting containment
– Try to contact already-infected hosts
– Go stealthy if containment is detected

• Circumventing containment
– Embed the scan in a storm of spoofed packets
– Two-sided evasion:
• Inside and outside hosts initiate normal connections to counter the penalty of scanning
• The algorithm can be modified to prevent this, but at the cost of vertical scan detection

Attacking Cooperation

• Attempt to outrace containment if the threshold is permissive

• Flood the cooperation channels

• Cooperative collapse:
– False positives cause lowered thresholds
– Lowered thresholds cause more false positives
– Feedback causes collapse of the network

Conclusion

Additional References

• Weaver, Paxson, Staniford, Cunningham, A Taxonomy of Computer Worms, ACM Workshop on Rapid Malcode, 2003.

• Williamson, Throttling Viruses: Restricting Propagation to Defeat Mobile Malicious Code, ACSAC, 2002.

• Jung, Paxson, Berger, Balakrishnan, Fast Portscan Detection Using Sequential Hypothesis Testing, IEEE Symposium on Security and Privacy, 2004.