Códigos y Criptografía Francisco Rodríguez Henríquez Software Security Through Code Obfuscation


Outline

• Introduction
– Definition
– Problem Statement
• Code Obfuscation Process
• Transformations
– Metrics for Obfuscation Transformations
– Classification of Transforms
• De-Obfuscation
– Commonly Employed Techniques
• The Power of Obfuscation


Why Code Obfuscation?

Intellectual Property Protection
• Legal Protection
• Technical Protection
– Obfuscation
– Encryption
– Server-side Execution
– Trusted Native Code


Justification

If Bob is able to recover Alice’s original source, he can extract proprietary information such as data structures, algorithms, etc.

[Figure: Alice (server) compiles Source into Object Code and obfuscates it into Obfuscated Object Code. Bob (client) receives the Obfuscated Object Code, which he can run (Executer) or attack: De-obfuscate to Object Code, then De-compile to Source.]


Code Obfuscation Process

• Determining Potency vs. Cost:
– Potency:
• The level of obfuscation applied to the code.
– Cost:
• The maximum execution time/space overhead that the obfuscated code adds to the application.
– To decide which level of obfuscation we want, we must first determine how much program efficiency we are willing to give up; hence the relation Potency vs. Cost.


• Source Pre-Processing
– Much like a compiler, this step gathers information about the application in order to determine which transformations will lead to the desired level of obfuscation.
– Types of Information Gathered:
• Symbol Table
• Data-Flow
• Data-Dependence
• Language Constructs
• Programming Idioms


Transformations

– The source goes through a number of pre-defined obfuscating transformations until the desired balance of potency vs. cost is reached.
– Definition of an Obfuscating Transformation:
• Let P → P’ be a transformation of a source program P into a target program P’. P → P’ is an obfuscating transformation if P and P’ have the same observable behavior. More precisely, for P → P’ to be a legal obfuscating transformation, the following must hold:
– If P fails to terminate or terminates with an error, then P’ may or may not terminate.
– Otherwise, P’ must terminate and produce the same output as P.
– We classify an obfuscating transformation according to the type of information it targets and its level of potency.


Evaluation of Obfuscating Transforms (3 Metrics)

• Measure of Potency
• Measure of Resilience
• Measure of Execution Cost

Formal definition of the quality of an obfuscating transform:
– Tqual(P) = [Tpot(P), Tres(P), Tcost(P)]


Measure of Potency

• Let T be a behavior-conserving transformation, T: P → P’, that transforms a source program P into a target program P’. Let E(P) be the complexity of P, as defined by known software complexity metrics.
– Tpot(P) is defined as E(P’)/E(P) – 1.
– T is a potent obfuscating transformation if Tpot(P) > 0. From here on, we rate the potency of a transform on the scale <low, medium, high>.
– In order for a transform to be sufficiently potent, it should:
• Increase overall program size and introduce new classes/methods
• Introduce new predicates and increase the nesting level of conditional/looping constructs
• Increase the number of method arguments and inter-class instance variable dependencies
• Increase the height of the inheritance tree
• Increase long-range variable dependencies
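The potency formula can be tried out on a toy case. The sketch below is only an illustration: it assumes a deliberately crude complexity metric E (the number of syntax-tree nodes), and the names `complexity`, `potency`, `P` and `P_PRIME` are hypothetical. Real tools would use established software complexity metrics instead.

```python
import ast


def complexity(source: str) -> int:
    """E(P): assumed metric for this sketch, the number of AST nodes."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def potency(original: str, obfuscated: str) -> float:
    """Tpot(P) = E(P') / E(P) - 1."""
    return complexity(obfuscated) / complexity(original) - 1


# P and a behavior-preserving P' that adds a predicate that is always true.
P = "y = x + 1\n"
P_PRIME = "y = x + 1 if (x * (x + 1)) % 2 == 0 else x - 1\n"

print(potency(P, P))            # 0.0: the identity transform is not potent
print(potency(P, P_PRIME) > 0)  # True: the added predicate raises E(P')
```

Under this metric any transform that adds syntax counts as potent, which is why the slides pair potency with resilience and cost.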


Measure of Resilience

• Resilience (according to Merriam-Webster):
– 1: the capability of a strained body to recover its size and shape after deformation caused especially by compressive stress
– 2: an ability to recover from or adjust easily to misfortune or change
– A transform is potent if it manages to confuse a human reader, but it is resilient if it confuses an automatic de-obfuscator.
– We base resilience primarily on the scope of a transform’s effect. That is, a transform that affects an entire program is more likely to yield a resilient program.
– Resilience is measured on a scale from trivial to one-way, where a one-way transformation produces code P’ from which it is impossible to recover P.


Measure of Execution Cost

– The third component in describing the quality of a transformation is cost: the execution time/space penalty that the transformation imposes on the obfuscated application.
– Cost is measured on a four-point scale:
• Dear: if executing P’ requires exponentially more resources than P
• Costly: if executing P’ requires O(n^p), p > 1, more resources than P
• Cheap: if executing P’ requires O(n) more resources than P
• Free: if executing P’ requires O(1) more resources than P


Classification of Transformations: Layout Transformations

• Trivial but irreversible transformations
• Examples:
– Formatting Removal:
• Tqual(P) = [low, one-way, free]
• Removes source code formatting such as indentation and line breaks. This is a free yet irreversible transformation.
• Code:
Before:
voltage = current * resistance;
power = (voltage * voltage) * resistance;
After:
voltage=current*resistance;power=(voltage*voltage)*resistance;


Classification of Transformations: Layout Transformations

– Scrambling Identifier Names:
• Tqual(P) = [medium, one-way, free]
• Removes the pragmatic information inherent in identifier names, thus providing a higher level of potency; once applied, it cannot be undone.
• Code:
Before:
voltage=current*resistance;power=(voltage*voltage)*resistance;
After:
v4=i12*r15; p6=(v4*v4)*r15;
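Identifier scrambling can be sketched with Python's standard `ast` module (`ast.unparse` needs Python 3.9 or later). This is a minimal illustration, not a production obfuscator: the `Scrambler` class and `scramble` function are hypothetical names, and the sketch renames every name it sees, so it only suits snippets that do not reference builtins or imports.

```python
import ast


class Scrambler(ast.NodeTransformer):
    """Replace each distinct identifier with a meaningless name v0, v1, ..."""

    def __init__(self):
        self.names = {}  # original identifier -> scrambled identifier

    def visit_Name(self, node):
        new_id = self.names.setdefault(node.id, f"v{len(self.names)}")
        return ast.copy_location(ast.Name(id=new_id, ctx=node.ctx), node)


def scramble(source: str):
    """Return (scrambled source, name mapping). The mapping is returned only
    so the sketch can be checked; a real obfuscator would discard it."""
    scrambler = Scrambler()
    tree = scrambler.visit(ast.parse(source))
    return ast.unparse(tree), scrambler.names


src = "voltage = current * resistance\npower = (voltage * voltage) * resistance\n"
code, names = scramble(src)
print(code)
```

Once the mapping is thrown away the original names cannot be recovered from the output, which is what makes the transform one-way.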


Classification of Transformations: Control Transformations

• Purpose is to obscure the control flow of the source application.
– Control Aggregation Transformations break up computations that logically belong together or merge computations that do not.
– Control Ordering Transformations randomize the order in which computations are carried out.
– Control Computation Transformations insert new redundant or dead code, or make algorithmic changes.
• Transformations which alter the flow of control have the largest computational overhead.


Opaque Predicates

• The real challenge in designing control-altering transformations is to make them cheap and resistant to attack by de-obfuscators. To accomplish this, many transformations are based upon opaque variables and opaque predicates.
• A variable V is opaque if it has some property q which is known a priori to the obfuscator, but is difficult for a de-obfuscator to deduce. Likewise, a predicate P (a boolean expression) is opaque if a de-obfuscator can only deduce its outcome with great difficulty, while this outcome is known to the obfuscator.
• Creating opaque variables and predicates which are difficult for a de-obfuscator to crack yet use few resources is a major area of research within code obfuscation, and is the key to highly resilient control transformations.
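As a small illustration of the definition, the sketch below builds two predicates whose outcome the obfuscator knows in advance from elementary number theory. The function names are hypothetical, and predicates this simple would fall quickly to a de-obfuscator that knows the same identities; they only show the idea.

```python
def opaquely_true(x: int) -> bool:
    """Always True: x * (x + 1) is a product of two consecutive integers,
    so it is always even."""
    return (x * (x + 1)) % 2 == 0


def opaquely_false(x: int) -> bool:
    """Always False: a square is congruent to 0 or 1 modulo 4, never 2."""
    return (x * x) % 4 == 2


def guarded(x: int) -> str:
    # The else branch can never run, but that is not obvious from syntax alone.
    if opaquely_true(x):
        return "real code"
    else:
        return "dead code"
```

Both predicates cost a multiplication and a remainder, so on the cost scale of the earlier slide they would count as free.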


Aggregation Transformations

• Examples of applied Control Aggregation Transformations:
– Cloned Methods Example:
A reverse engineer, when trying to understand the purpose of a subroutine, will often examine its signature and body as well as the different environments in which it is called. To hinder this, we apply a transform which obscures a method’s call sites, making it appear that different routines are being called.
We create several different versions of a method by applying various transformations to the original code. At runtime we use different predicates to select which version to run.
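The cloned-methods idea can be sketched as follows. Everything here is an assumption made for illustration: the function names, the choice of integer multiplication as the method being cloned, and the use of a plain `selector` value in place of the runtime predicates the slide mentions.

```python
def area_v1(w: int, h: int) -> int:
    """The original method."""
    return w * h


def area_v2(w: int, h: int) -> int:
    """Clone 1: repeated addition (integers only)."""
    total = 0
    for _ in range(abs(h)):
        total += w
    return total if h >= 0 else -total


def area_v3(w: int, h: int) -> int:
    """Clone 2: uses (w + h)^2 - (w - h)^2 == 4 * w * h."""
    return ((w + h) ** 2 - (w - h) ** 2) // 4


CLONES = (area_v1, area_v2, area_v3)


def area(w: int, h: int, selector: int) -> int:
    """Each call site passes a different selector, so it appears to call a
    different routine although every clone computes the same result."""
    return CLONES[selector % len(CLONES)](w, h)
```

A call such as `area(3, 4, 0)` and another such as `area(3, 4, 2)` reach different bodies yet both yield 12.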


Aggregation Transformations

• In object-oriented languages such as Java, control is organized around data structures rather than the reverse. Therefore, the most important part of reverse engineering such languages is to recover their data structures. Aggregation transforms are used to aggregate data in arrays and objects.
• Example: Restructuring Arrays
• Next we see a number of transformations performed to obscure an array. First, we split an array into several sub-arrays [statements (1-2)]. We then merge two arrays into one array [statements (3-5)]. Folding an array increases its number of dimensions [statements (6-7)]. Finally, flattening an array reduces its number of dimensions [statements (8-9)].
• Splitting and folding greatly increase the complexity of our array structures, while merging and flattening decrease it. The purpose is to introduce structure to a program where little existed before, and to remove structure where it once existed. As a result, the obscurity of the program is greatly increased.
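The four array restructurings can be sketched on plain Python lists. The helper names and the particular index mappings (even/odd split, interleaved merge, row-major fold) are assumptions chosen for this illustration; they are not the statements (1-9) of the original figure.

```python
def split(a):
    """Split: even-indexed elements go to one array, odd-indexed to another."""
    return a[0::2], a[1::2]


def split_get(even, odd, i):
    """Read logical element i from the split representation."""
    return even[i // 2] if i % 2 == 0 else odd[i // 2]


def merge(b, c):
    """Merge two equal-length arrays: b[i] -> m[2i], c[i] -> m[2i + 1]."""
    m = [None] * (len(b) + len(c))
    m[0::2] = b
    m[1::2] = c
    return m


def fold(a, cols):
    """Fold a 1-D array into 2-D rows of `cols` elements."""
    return [a[r:r + cols] for r in range(0, len(a), cols)]


def fold_get(f, cols, i):
    """Read logical element i from the folded representation."""
    return f[i // cols][i % cols]


def flatten(m):
    """Flatten a 2-D array back into one dimension (row-major order)."""
    return [x for row in m for x in row]
```

Every access to the original array must be rewritten through the matching accessor, which is where the extra index arithmetic that confuses a reader comes from.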


Loop Transformations

(a) Loop Blocking: we aim to improve the cache behavior of a loop by breaking up the iteration space such that the inner loop fits into the cache.
(b) Loop Unrolling: we replicate the body of the loop one or more times. If we know the loop bounds at compile time, we can unroll the loop in its entirety.
(c) Loop Fission: we turn a loop with a compound body into several loops over the same iteration space.

All three types of loop transformation increase the source application’s total size and number of conditions, while the first also introduces extra nesting. Applied in isolation, they provide little resilience. However, when several are applied in combination, the resilience of the total transformation increases dramatically, requiring significant analysis by a de-obfuscator.
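A minimal sketch of the three loop transformations, applied to a hypothetical loop that computes element-wise sums and products of two lists. The function names and the block size of 4 are assumptions for illustration.

```python
def original(a, b):
    n = len(a)
    sums, prods = [0] * n, [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
        prods[i] = a[i] * b[i]
    return sums, prods


def blocked(a, b, block=4):
    """(a) Loop blocking: the iteration space is cut into chunks of `block`."""
    n = len(a)
    sums, prods = [0] * n, [0] * n
    for start in range(0, n, block):
        for i in range(start, min(start + block, n)):
            sums[i] = a[i] + b[i]
            prods[i] = a[i] * b[i]
    return sums, prods


def unrolled(a, b):
    """(b) Loop unrolling by a factor of two, with a remainder step."""
    n = len(a)
    sums, prods = [0] * n, [0] * n
    i = 0
    while i + 1 < n:
        sums[i] = a[i] + b[i]
        prods[i] = a[i] * b[i]
        sums[i + 1] = a[i + 1] + b[i + 1]
        prods[i + 1] = a[i + 1] * b[i + 1]
        i += 2
    if i < n:  # odd length: one iteration is left over
        sums[i] = a[i] + b[i]
        prods[i] = a[i] * b[i]
    return sums, prods


def fissioned(a, b):
    """(c) Loop fission: one compound body becomes two loops."""
    n = len(a)
    sums, prods = [0] * n, [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
    for i in range(n):
        prods[i] = a[i] * b[i]
    return sums, prods
```

All four functions return the same result for lists of equal length; only the shape of the control flow differs.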


Computation Transformations

• Example of an applied Control Computation Transformation (Inserting Dead or Irrelevant Code):
– A) We insert an opaque predicate Pt into S (= S1…Sn), essentially splitting it in two. This predicate is irrelevant code because it always evaluates to True. One possible predicate would be an if-statement such as:
if (1 < 5) <evaluate left>;
else <evaluate right>;
– B) We again break S into two halves, and then create two different obfuscated versions, Sa and Sb, of the second half by applying various computational transforms to it. It is therefore not directly obvious to a reverse engineer that Sa and Sb perform the same function. We use a predicate P? to select between the two at runtime.
– C) Finally, we proceed as in (B), but we introduce a bug into Sb and make sure that the predicate Pt always selects Sa. Thus, a de-obfuscation that keeps Sb would lead to incorrect, non-functioning source code.
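Variants (A) to (C) can be sketched on a tiny statement sequence S. The `checksum` function, the variant names, and the specific predicates are assumptions made for this illustration; the always-true predicate Pt is the consecutive-integer identity rather than the literal `1 < 5` from the slide.

```python
def pt(x: int) -> bool:
    """Pt: always True, since x * (x + 1) is even."""
    return (x * (x + 1)) % 2 == 0


def checksum(values):
    """The original S: sum the values (first half), then double (second half)."""
    total = sum(values)
    return 2 * total


def variant_a(values):
    """(A) Pt splits S; the else branch is dead code."""
    total = sum(values)
    if pt(len(values)):
        return 2 * total
    else:
        return 0  # never executed


def variant_b(values):
    """(B) P? picks between two equivalent versions Sa and Sb at runtime."""
    total = sum(values)
    if len(values) % 2 == 0:   # P?: may be True or False, both are fine
        return 2 * total       # Sa
    else:
        return total + total   # Sb: same function, different code


def variant_c(values):
    """(C) Pt always selects Sa; Sb carries a deliberate bug."""
    total = sum(values)
    if pt(len(values)):
        return 2 * total       # Sa
    else:
        return 2 * total + 1   # Sb: buggy, never executed
```

A de-obfuscator that wrongly keeps the else branch of `variant_c` ends up with a program that no longer matches `checksum`.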


Data Transformations

• Aim to obscure the data structures used in the source application.
• Most important for keeping proprietary data structures hidden from a reverse engineer.
• Storage Transformations:
– Attempt to choose an unnatural storage class for dynamic as well as static data, thus making it difficult for a de-obfuscator to determine the type of data stored.
• Encoding Transformations:
– Attempt to choose an unnatural encoding for common data types.


Loop Transformations

• Example: Change Encoding
Here we encode a simple variable i by transforming it into:
i’ = c1 * i + c2
where c1 and c2 are constants. We choose c1 to be a power of 2 for efficiency, and let c1 = 8, c2 = 3.
By making this transformation, we add a small amount of execution time while obscuring the original purpose of i.
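With c1 = 8 and c2 = 3, a loop counter i can be carried in encoded form i’ = 8i + 3 throughout a loop. The sketch below is one possible illustration; the summing loop and all function names are assumptions, since the original figure is not part of the text.

```python
C1, C2 = 8, 3  # constants from the slide; C1 is a power of 2


def encode(i: int) -> int:
    """i' = c1 * i + c2"""
    return C1 * i + C2


def decode(e: int) -> int:
    """Inverse of encode for values produced by encode."""
    return (e - C2) // C1


def sum_original(a):
    total, i = 0, 0
    while i < len(a):
        total += a[i]
        i += 1
    return total


def sum_encoded(a):
    """Same loop, but the counter only ever exists in encoded form."""
    total = 0
    e = encode(0)               # i = 0 becomes e = 3
    while e < encode(len(a)):   # i < n  is equivalent to  8i + 3 < 8n + 3
        total += a[decode(e)]   # decode only at the point of use
        e += C1                 # i += 1 becomes e += 8
    return total
```

The loop bounds 3 and 8n + 3 and the stride 8 no longer suggest a plain index variable, at the cost of one subtraction and one division per array access.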


Ordering Transformations

• Randomize the order in which data structures are declared in a source application. Particularly, here we aim to randomize the order of methods and instance variables within classes and formal parameters within methods.

• Example: Opaque Encoding Function


De-obfuscation Techniques

• Identifying Opaque Constructs

– Identifying and evaluating opaque constructs is the most difficult part of de-obfuscation. Opaque constructs fall into three main categories:

• Local:

• Global:

• Inter-procedural:


De-obfuscation Techniques

• Identification by Pattern Matching
– Uses knowledge of the strategies employed by obfuscators to identify opaque predicates. This knowledge can be gathered through de-compilation and analysis of popular obfuscators. To prevent this attack, avoid using canned opaque constructs. Also, choose constructs that are syntactically similar to those used in the real application.


De-obfuscation Techniques

• Identification by Program Slicing
– Used by a reverse engineer to counter the problem that logically related pieces of code have been broken up and dispersed over the program. Also used to filter “live” code from “dead” code.
– Countering this de-obfuscation technique requires adding parameter aliases and variable dependencies to increase the slice size, thus making de-obfuscation a more computationally intensive process.


Statistical Analysis

• Used to analyze the outcome of all predicates in an obfuscated program. Any predicate that evaluates to true on every one of multiple test runs is flagged, as it may turn out to be an opaque predicate. A powerful method of preventing this attack is to design opaque predicates in such a way that several predicates would have to be cracked at the same time in order to retrieve information.

• Example: Protecting Against Statistical Analysis


• Here we aim to thwart statistical analysis by giving our opaque predicates side effects. An obfuscator has determined that S1 and S2 must always execute the same number of times. The statements are then obfuscated using opaque predicates that call functions Q1 and Q2, which increment and decrement a global variable k, respectively. Now, if a de-obfuscator tries to replace one (but not both) of the predicates with True, k will overflow. Thus, the de-obfuscated program will terminate with an error.
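The side-effect trick can be sketched as below. Python integers do not overflow, so this illustration assumes an artificial bound `K_LIMIT` and raises `OverflowError` by hand to stand in for a fixed-width counter; the `State` class, `q1`, `q2`, `run`, and the `patched` flag (which simulates a de-obfuscator replacing the first predicate with True) are all hypothetical.

```python
K_LIMIT = 8  # assumed stand-in for the range of a fixed-width integer


class State:
    def __init__(self):
        self.k = 0  # the shared variable k from the slide


def _check(state):
    if abs(state.k) > K_LIMIT:
        raise OverflowError("k left its valid range")


def q1(state) -> bool:
    """Opaque predicate with a side effect: increments k, always True."""
    state.k += 1
    _check(state)
    return True


def q2(state) -> bool:
    """Opaque predicate with a side effect: decrements k, always True."""
    state.k -= 1
    _check(state)
    return True


def run(rounds: int, patched: bool = False):
    state, log = State(), []
    for _ in range(rounds):
        if patched or q1(state):  # patched: predicate replaced by True
            log.append("S1")
        if q2(state):
            log.append("S2")
    return log
```

In the intact program the increments and decrements cancel and k stays within bounds. With `patched=True` only the decrements remain, so after a few rounds the run fails instead of silently yielding a simplified program.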


The Power of Obfuscation

• In reality, an obfuscated program consists of two programs merged into one: a real program which performs a useful task and a bogus program which computes useless information. The sole purpose of the bogus program is to confuse reverse engineers by hiding the real program behind irrelevant code.
• Encryption vs. Obfuscation:
– Both are attempts at hiding data from “prying” eyes.
– Both have a shelf life lasting until it is possible to “crack” the given protection.
• Future Areas of Research:
– New obfuscating transformations
– Interaction and ordering between different transformations (optimization)
– Relationship between potency and cost (which gives the most “bang-for-the-buck”)
• Other Uses of Obfuscation:
– Tracing of Software Piracy
• A differently obfuscated version of the same code would be sold to each customer, making it easy to identify which customer distributed the application to others.
– Mobile Agent Security
• Enforcing “Blackbox” security techniques on un-trusted hosts.