Códigos y Criptografía Francisco Rodríguez Henríquez
Outline
• Introduction
– Definition
– Problem Statement
• Code Obfuscation Process
• Transformations
– Metrics for Obfuscation Transformations
– Classification of Transforms
• De-Obfuscation
– Commonly Employed Techniques
• The Power of Obfuscation
Why Code Obfuscation?
• Intellectual Protection
– Legal Protection
– Technical Protection
• Obfuscation
• Encryption
• Server-side Execution
• Trusted Native Code
Justification
If Bob is able to recover Alice’s original source, he can extract proprietary information such as data structures, algorithms, etc.
[Figure: Alice (server) compiles her Source into Object Code and obfuscates it into Obfuscated Object Code. Bob (client) receives the Obfuscated Object Code, which the Executer runs; to get back to the Source he must de-obfuscate it into Object Code and then de-compile it.]
Code Obfuscation Process
• Determining Potency vs. Cost:
– Potency: the level of obfuscation applied to the code.
– Cost: the maximum execution time/space overhead that obfuscation adds to the application.
– To decide which level of obfuscation we want, we must first decide how much program efficiency we are willing to give up; hence the trade-off: Potency vs. Cost.
• Source Pre-Processing
– Much like a compiler, this step gathers information about the application in order to determine which transformations will lead to the desired level of obfuscation.
– Types of Information Gathered:
• Symbol Table
• Data-Flow
• Data-Dependence
• Language Constructs
• Programming Idioms
Transformations
– The source goes through a number of pre-defined obfuscating transformations until the desired balance of potency vs. cost is reached.
– Definition of an Obfuscating Transformation:
• Let P → P’ be a transformation of a source program P into a target program P’. P → P’ is an obfuscating transformation if P and P’ have the same observable behavior. More precisely, for P → P’ to be a legal obfuscating transformation the following must hold:
– If P fails to terminate or terminates with an error, then P’ may or may not terminate.
– Otherwise, P’ must terminate and produce the same output as P.
– We classify an obfuscating transformation according to the type of information it targets and its level of potency.
Evaluation of Obfuscating Transforms (3 Metrics)
• Measure of Potency
• Measure of Resilience
• Measure of Execution Cost
Formal definition of the quality of an obfuscating transform:
– Tqual(P) = [Tpot(P), Tres(P), Tcost(P)]
Measure of Potency
• Let T be a behavior-conserving transformation, T: P → P’, that transforms a source program P into a target program P’. Let E(P) be the complexity of P, as defined by known software complexity metrics.
– Tpot(P) is defined as E(P’)/E(P) - 1.
– T is a potent obfuscating transformation if Tpot(P) > 0. From here on, we rate the potency of a transform on the scale <low, medium, high>.
– To be sufficiently potent, a transform should:
• Increase overall program size and introduce new classes/methods
• Introduce new predicates and increase the nesting level of conditional/looping constructs
• Increase the number of method arguments and inter-class instance variable dependencies
• Increase the height of the inheritance tree
• Increase long-range variable dependencies
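The potency formula can be tried out with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and the complexity values fed in (a count of predicates before and after obfuscation) are assumed numbers, since the slides do not fix a particular metric E.

```java
// Sketch only: E(P) is passed in as a number; how it is measured is left open.
final class PotencySketch {
    /** Tpot(P) = E(P') / E(P) - 1. */
    static double potency(double complexityBefore, double complexityAfter) {
        if (complexityBefore <= 0) {
            throw new IllegalArgumentException("E(P) must be positive");
        }
        return complexityAfter / complexityBefore - 1.0;
    }

    /** T is potent when Tpot(P) > 0, i.e. when P' is more complex than P. */
    static boolean isPotent(double complexityBefore, double complexityAfter) {
        return potency(complexityBefore, complexityAfter) > 0.0;
    }

    public static void main(String[] args) {
        // Assumed metric: number of predicates, 4 before and 6 after.
        System.out.println(potency(4, 6));   // prints 0.5
        System.out.println(isPotent(4, 4));  // prints false
    }
}
```

A transformation that leaves the chosen metric unchanged has potency 0 and therefore does not count as potent under this definition.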
Measure of Resilience
• Resilience (according to Merriam-Webster):
– 1: the capability of a strained body to recover its size and shape after deformation caused especially by compressive stress
– 2: an ability to recover from or adjust easily to misfortune or change
– A transform is potent if it manages to confuse a human reader, but it is resilient if it confuses an automatic de-obfuscator.
– We base resilience primarily on the scope of a transform’s effect. That is, if a transform affects an entire program, it is more likely to give us a more resilient program.
– Resilience is measured on a scale from trivial to one-way, where a one-way transformation yields code P’ from which it is impossible to recover P.
Measure of Execution Cost
– The third component of a transformation’s quality is its cost: the execution time/space penalty that the transformation imposes on the obfuscated application.
– Cost is measured on a four-point scale:
• Dear: if executing P’ requires exponentially more resources than P
• Costly: if executing P’ requires O(n^p), p > 1, more resources than P
• Cheap: if executing P’ requires O(n) more resources than P
• Free: if executing P’ requires O(1) more resources than P
Classification of Transformations: Layout Transformations
• Trivial but irreversible transformations
• Examples:
– Formatting Removal:
• Tqual(P) = [low, one-way, free]
• Removes source code formatting such as indentation and line breaks. This is a free yet irreversible transformation.
• Code:
Before:
voltage = current * resistance;
power = (voltage * voltage) * resistance;
After:
voltage=current*resistance;power=(voltage*voltage)*resistance;
Classification of Transformations: Layout Transformations
– Scrambling Identifier Names:
• Tqual(P) = [medium, one-way, free]
• Removes the pragmatic information carried by identifier names, thus providing a higher level of potency; once the original names are discarded, they cannot be recovered.
• Code:
Before: voltage=current*resistance;power=(voltage*voltage)*resistance;
After: v4=i12*r15;p6=(v4*v4)*r15;
Classification of Transformations: Control Transformations
• The purpose is to obscure the control flow of the source application.
– Control Aggregation Transformations break up computations that logically belong together or merge computations that do not.
– Control Ordering Transformations randomize the order in which computations are carried out.
– Control Computation Transformations insert new redundant or dead code, or make algorithmic changes.
• Transformations which alter the flow of control have the largest computational overhead.
Opaque Predicates
• The real challenge in designing control-altering transformations is to make them both cheap and resistant to de-obfuscation. To accomplish this, many transformations rely on opaque variables and opaque predicates.
• A variable V is opaque if it has some property q which is known a priori to the obfuscator but is difficult for a de-obfuscator to deduce. Likewise, a predicate P (a boolean expression) is opaque if its outcome is known to the obfuscator but can be deduced by a de-obfuscator only with great difficulty.
• Creating opaque variables and predicates that are difficult for a de-obfuscator to crack yet use few resources is a major area of research within code obfuscation, and is the key to highly resilient control transformations.
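As a small illustration of an always-true opaque predicate, the Java sketch below relies on the fact that the product of two consecutive integers is even. The class and method names are hypothetical, and this particular predicate is an assumption chosen for brevity: it is far too simple to resist a real de-obfuscator and only shows the shape of the idea.

```java
// Sketch only: a number-theoretic predicate whose outcome the obfuscator knows in advance.
final class OpaquePredicateSketch {
    /**
     * Always true: x and x + 1 are consecutive, so their product is even.
     * int overflow does not change this, because Java wraps modulo 2^32,
     * which is itself even, so the parity of the product is preserved.
     */
    static boolean alwaysTrue(int x) {
        return (x * (x + 1)) % 2 == 0;
    }

    static int compute(int input) {
        if (alwaysTrue(input)) {
            return input * 3 + 1;   // the real computation
        } else {
            return input - 42;      // bogus branch, never executed
        }
    }

    public static void main(String[] args) {
        System.out.println(compute(5));  // prints 16
    }
}
```

To a reader who has not worked out the arithmetic, both branches of `compute` look reachable, which is the effect an opaque predicate is meant to achieve.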
Aggregation Transformations
• Examples of applied Control Aggregation Transformations:
– Cloned Methods Example:
A reverse engineer trying to understand the purpose of a subroutine will often examine its signature and body as well as the different contexts in which it is called. To frustrate this, we apply a transform that obscures a method’s call sites, making it appear that different routines are being called.
We create several different versions of a method by applying different transformations to the original code. At runtime we use predicates to select which version to run.
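The cloned-methods idea can be sketched in Java as follows. Everything here is a hypothetical illustration: the original method `sumTo`, the two clones with scrambled names, and the selector argument are all assumptions made for the example, not code from the slides.

```java
// Sketch only: two differently written clones of one method, chosen at runtime.
final class ClonedMethodSketch {
    /** Original method: the sum 1 + 2 + ... + n (0 for n <= 0). */
    static int sumTo(int n) {
        int s = 0;
        for (int i = 1; i <= n; i++) s += i;
        return s;
    }

    /** Clone A: same result, counting down. */
    static int f17(int n) {
        int s = 0;
        for (int i = n; i > 0; i--) s += i;
        return s;
    }

    /** Clone B: same result, adding from both ends at once. */
    static int g42(int n) {
        int s = 0;
        int lo = 1, hi = n;
        while (lo < hi) { s += lo + hi; lo++; hi--; }
        if (lo == hi) s += lo;   // middle element when n is odd
        return s;
    }

    /** Call site: a runtime value decides which clone runs; both give the same answer. */
    static int call(int n, int selector) {
        return (selector & 1) == 0 ? f17(n) : g42(n);
    }

    public static void main(String[] args) {
        System.out.println(call(10, 0));  // prints 55
        System.out.println(call(10, 1));  // prints 55
    }
}
```

Because the clones share no obvious structure or name, a reader inspecting the call sites sees what looks like two unrelated routines.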
Aggregation Transformations
• In object-oriented languages such as Java, control is organized around data structures rather than the reverse. Therefore, the most important part of reverse engineering programs in such languages is recovering their data structures. Aggregation transforms change how data is aggregated in arrays and objects.
• Example: Restructuring Arrays
• Next we see a number of transformations performed to obscure an array. First, we split an array into several sub-arrays [statements (1-2)]. We then merge two arrays into one array [statements (3-5)]. Folding an array increases its number of dimensions [statements (6-7)]. Finally, we flatten an array, thus reducing its number of dimensions [statements (8-9)].
• Splitting and folding increase the complexity of our array structures, while merging and flattening decrease it. The purpose is to introduce structure into a program where little existed before, and to remove structure where it once existed, so that the program becomes considerably more obscure.
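Three of the array restructurings (splitting, folding, flattening) can be sketched in Java. The class, the even/odd splitting rule, and the row-major layout are assumptions made for this illustration; the slide's own numbered statements are not reproduced here.

```java
// Sketch only: index arithmetic behind split, fold and flatten.
final class ArrayRestructuringSketch {
    /** Split: even-indexed elements go to parts[0], odd-indexed ones to parts[1]. */
    static int[][] split(int[] a) {
        int[][] parts = { new int[(a.length + 1) / 2], new int[a.length / 2] };
        for (int i = 0; i < a.length; i++) parts[i % 2][i / 2] = a[i];
        return parts;
    }

    /** Reads logical element i of a split array. */
    static int splitGet(int[][] parts, int i) {
        return parts[i % 2][i / 2];
    }

    /** Fold: a 1-D array becomes a 2-D array with the given number of columns. */
    static int[][] fold(int[] a, int cols) {
        int rows = (a.length + cols - 1) / cols;
        int[][] folded = new int[rows][cols];
        for (int i = 0; i < a.length; i++) folded[i / cols][i % cols] = a[i];
        return folded;
    }

    /** Flatten: a rectangular 2-D array becomes a 1-D array in row-major order. */
    static int[] flatten(int[][] m) {
        int cols = m[0].length;
        int[] flat = new int[m.length * cols];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < cols; c++) flat[r * cols + c] = m[r][c];
        }
        return flat;
    }

    public static void main(String[] args) {
        int[] a = {10, 11, 12, 13, 14, 15};
        System.out.println(splitGet(split(a), 3));  // prints 13
        System.out.println(fold(a, 3)[1][2]);       // prints 15
    }
}
```

After such a rewrite every access `a[i]` in the program turns into index arithmetic, which is what hides the original shape of the data.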
Loop Transformations
(a) Loop Blocking is applied to the given loop. Loop blocking aims to improve the cache behavior of a loop by breaking up the iteration space so that the inner loop fits in the cache.
(b) Loop Unrolling replicates the body of the loop one or more times. If the loop bounds are known at compile time, the loop can be unrolled in its entirety.
(c) Loop Fission turns a loop with a compound body into several loops over the same iteration space.
All three loop transformations increase the source application’s total size and number of conditions, while the first also introduces extra nesting. Applied in isolation, they provide little resilience. However, when they are applied one after another, the resilience of the combined transformation increases dramatically, requiring significant analysis by a de-obfuscator.
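The three loop transformations can be compared side by side in a short Java sketch. The loop body, the block size of 4, and the unroll factor of 2 are all assumptions chosen for the example; the class and method names are hypothetical.

```java
// Sketch only: one loop and three behavior-preserving rewrites of it.
final class LoopTransformSketch {
    /** Original loop with a compound body. */
    static int[][] original(int[] a) {
        int[] b = new int[a.length], c = new int[a.length];
        for (int i = 0; i < a.length; i++) { b[i] = a[i] * 2; c[i] = a[i] + 1; }
        return new int[][] { b, c };
    }

    /** (a) Blocking: the iteration space is cut into blocks of (assumed) size 4. */
    static int[][] blocked(int[] a) {
        int[] b = new int[a.length], c = new int[a.length];
        for (int start = 0; start < a.length; start += 4) {
            int end = Math.min(start + 4, a.length);
            for (int i = start; i < end; i++) { b[i] = a[i] * 2; c[i] = a[i] + 1; }
        }
        return new int[][] { b, c };
    }

    /** (b) Unrolling by a factor of 2, with a clean-up step for odd lengths. */
    static int[][] unrolled(int[] a) {
        int[] b = new int[a.length], c = new int[a.length];
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            b[i] = a[i] * 2;         c[i] = a[i] + 1;
            b[i + 1] = a[i + 1] * 2; c[i + 1] = a[i + 1] + 1;
        }
        if (i < a.length) { b[i] = a[i] * 2; c[i] = a[i] + 1; }
        return new int[][] { b, c };
    }

    /** (c) Fission: the compound body is split into two loops over the same range. */
    static int[][] fissioned(int[] a) {
        int[] b = new int[a.length], c = new int[a.length];
        for (int i = 0; i < a.length; i++) b[i] = a[i] * 2;
        for (int i = 0; i < a.length; i++) c[i] = a[i] + 1;
        return new int[][] { b, c };
    }
}
```

Each variant returns the same pair of arrays as `original`, so any of them (or a composition of them) is a legal obfuscating transformation in the sense defined earlier.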
Computation Transformations
• Example of an applied Control Computation Transformation (Inserting Dead or Irrelevant Code):
– A) We insert an opaque predicate Pt into S (= S1…Sn), essentially splitting it in two. The predicate is irrelevant because it always evaluates to True. A deliberately simple stand-in for such a predicate is the if-statement below; a real obfuscator would use a condition whose value is hard to deduce statically:
if (1 < 5) <evaluate left>;
else <evaluate right>;
– B) We again break S into two halves and create two different obfuscated versions, Sa and Sb, of the second half by applying different computational transforms to it. It is therefore not directly obvious to a reverse engineer that Sa and Sb perform the same function. We use a predicate P? to select between the two at runtime.
Computation Transformations
– C) Finally, we proceed as in (B), but we introduce a bug into Sb and make sure that the predicate Pt always selects Sa. A de-obfuscator that keeps Sb instead of Sa therefore ends up with incorrect, non-functioning code.
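Variants (A) and (C) can be sketched in Java under a few assumptions: the statement sequence S is taken to be "add one, then double", and Pt is implemented with the fact that a square is congruent to 0 or 1 modulo 4. The class, methods, and this choice of predicate are hypothetical and only illustrate the pattern.

```java
// Sketch only: dead-code insertion guarded by an always-true predicate Pt.
final class DeadCodeSketch {
    /**
     * Pt: always true. A square is 0 or 1 modulo 4, and int wrap-around
     * (modulo 2^32) preserves the value modulo 4, so the low two bits are 0 or 1.
     */
    static boolean pT(int x) {
        return ((x * x) & 3) < 2;
    }

    /** The unobfuscated statement sequence S: S1 adds one, S2 doubles. */
    static int original(int x) {
        int y = x + 1;
        y = y * 2;
        return y;
    }

    /** (A) Pt is inserted between S1 and S2; S2 still always runs. */
    static int variantA(int x) {
        int y = x + 1;              // S1
        if (pT(x)) { y = y * 2; }   // S2, guarded by Pt
        return y;
    }

    /** (C) Sa is a correct rewrite of S2; Sb contains a deliberate bug and never runs. */
    static int variantC(int x) {
        int y = x + 1;              // S1
        if (pT(y)) {
            y = y << 1;             // Sa: same effect as y * 2
        } else {
            y = y * 2 + 1;          // Sb: buggy, unreachable because Pt is true
        }
        return y;
    }
}
```

If a de-obfuscator guesses wrong and keeps the `else` branch of `variantC`, the recovered program returns results that are off by one.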
Data Transformations
• Aim to obscure the data structures used in the source application.
• Most important for keeping proprietary structures hidden from a reverse engineer.
• Storage Transformations:
– Attempt to choose an unnatural storage class for dynamic as well as static data, thus making it difficult for a de-obfuscator to determine the type of data stored.
• Encoding Transformations:
– Attempt to choose an unnatural encoding for common data types.
Loop Transformations
• Example: Change Encoding
Here we encode a simple variable i by transforming it into:
i’ = c1 * i + c2
where c1 and c2 are constants. In this example we choose c1 to be a power of 2 for efficiency, and let c1 = 8, c2 = 3.
By making this transformation, we add a small amount of execution time, while obscuring the original purpose of i.
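With c1 = 8 and c2 = 3 as above, the encoding can be applied to a loop counter as in the following Java sketch. The array-summing loop and all names are assumptions made for illustration; the original slide's code figure is not available.

```java
// Sketch only: a loop counter stored as i' = 8 * i + 3 instead of i.
final class ChangeEncodingSketch {
    static final int C1 = 8;  // power of two, as chosen in the text
    static final int C2 = 3;

    static int encode(int i) { return C1 * i + C2; }

    static int decode(int encoded) { return (encoded - C2) / C1; }

    /** Original loop: i runs over 0 .. a.length - 1. */
    static int sumOriginal(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    /** Obfuscated loop: the counter holds i' and is decoded only at the array access. */
    static int sumEncoded(int[] a) {
        int sum = 0;
        for (int e = encode(0); e < encode(a.length); e += C1) {
            sum += a[decode(e)];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {5, 7, 9};
        System.out.println(sumOriginal(a));  // prints 21
        System.out.println(sumEncoded(a));   // prints 21
    }
}
```

An obfuscator would normally inline the constants, leaving a loop that starts at 3, steps by 8, and indexes with `(e - 3) / 8`. The sketch assumes arrays small enough that `8 * a.length + 3` does not overflow an int.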
Ordering Transformations
• Randomize the order in which data structures are declared in a source application. In particular, we aim to randomize the order of methods and instance variables within classes, and of formal parameters within methods.
• Example: Opaque Encoding Function
De-obfuscation Techniques
• Identifying Opaque Constructs
– Identifying and evaluating opaque constructs is the most difficult part of de-obfuscation. Opaque constructs fall into three main categories:
• Local
• Global
• Inter-procedural
De-obfuscation Techniques
• Identification by Pattern Matching
– Uses knowledge of the strategies employed by known obfuscators to identify opaque predicates. This knowledge can be gathered by de-compiling and analyzing popular obfuscation programs. To prevent this attack, avoid using canned opaque constructs, and choose constructs that are syntactically similar to those used in the real application.
De-obfuscation Techniques
• Identification by Program Slicing
– Used by a reverse engineer to counter the fact that logically related pieces of code have been broken up and dispersed over the program. Also used to separate “live” code from “dead” code.
– Countering this de-obfuscation technique requires adding parameter aliases and variable dependencies to increase the slice size, thus making de-obfuscation a more computationally intensive process.
Statistical Analysis
• Used to analyze the outcome of all predicates in an obfuscated program. Any predicate that evaluates to true on every one of many test runs is flagged, as it may turn out to be an opaque predicate. A powerful way of preventing this attack is to design opaque predicates so that several of them have to be cracked at the same time in order to retrieve any information.
Statistical Analysis
• Example: Protecting Against Statistical Analysis
• Here we aim to thwart statistical analysis by giving our opaque predicates side effects. In this example, the obfuscator has determined that S1 and S2 must always execute the same number of times. The statements are guarded by opaque predicates that call the functions Q1 and Q2, which respectively increment and decrement a global variable k. If a de-obfuscator replaces one of the predicates with True, the increments and decrements no longer balance and k overflows, so the de-obfuscated program terminates with an error.
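A rough Java rendering of this defence is given below. Several details are assumptions: Java integers wrap silently, so the sketch uses `Math.addExact` and `Math.subtractExact` to turn overflow into an `ArithmeticException`; the step size of 2^30, the loop, and the bodies standing in for S1 and S2 are invented for the example.

```java
// Sketch only: opaque predicates Q1 and Q2 with balancing side effects on k.
final class SideEffectPredicateSketch {
    static int k = 0;
    static final int STEP = 1 << 30;  // assumed step size

    /** Q1: always true, and increments k as a side effect. */
    static boolean q1(int x) {
        k = Math.addExact(k, STEP);
        return ((x * x) & 3) < 2;        // a square is 0 or 1 modulo 4
    }

    /** Q2: always true, and decrements k as a side effect. */
    static boolean q2(int x) {
        k = Math.subtractExact(k, STEP);
        return (x * (x + 1)) % 2 == 0;   // consecutive integers have an even product
    }

    /** Obfuscated program: S1 and S2 run equally often, so k returns to 0 each round. */
    static int run(int rounds) {
        k = 0;
        int work = 0;
        for (int i = 0; i < rounds; i++) {
            if (q1(i)) work += 1;   // S1
            if (q2(i)) work += 2;   // S2
        }
        return work;
    }

    /** What a naive de-obfuscator produces after replacing Q1 with true. */
    static int runWithQ1Removed(int rounds) {
        k = 0;
        int work = 0;
        for (int i = 0; i < rounds; i++) {
            if (true) work += 1;    // S1, predicate "simplified" away
            if (q2(i)) work += 2;   // S2: k now only ever decreases
        }
        return work;
    }
}
```

With these assumptions, `run(10)` returns 30, while `runWithQ1Removed(10)` throws an `ArithmeticException` on its third round, because k can no longer be decremented without overflowing.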
The Power of Obfuscation
• In reality, an obfuscated program consists of two programs merged into one: a real program which performs a useful task and a bogus program which computes useless information. The sole purpose of the bogus program is to confuse reverse engineers by hiding the real program behind irrelevant code.
• Encryption vs. Obfuscation:
– Both are attempts at hiding information from “prying” eyes.
– Both have a shelf life that lasts only until someone manages to “crack” the protection.
• Future Areas of Research:
– New obfuscating transformations
– Interaction and ordering between different transformations (optimization)
– Relationship between potency and cost (which transformations give the most “bang for the buck”)
• Other Uses of Obfuscation:
– Tracing of Software Piracy
• A differently obfuscated version of the same code would be sold to each customer, making it easy to identify which customer redistributed the application.
– Mobile Agent Security
• Enforcing “black-box” security on untrusted hosts.