Lecture 10. Dynamic Analysis - Iowa State...

Preview:

Citation preview

Lecture 10. Dynamic Analysis

Wei LeThank Xiangyu Zhang, Michael Ernst, Tom Ball

for Some of the Slides

Iowa State University

2014.11

Outline

I What is dynamic analysis?

I Instrumentation

I AnalysisI Representing dynamic informationI Dynamic slicing

I Applications: specification generation and Diakon

What is dynamic analysis

Learn from executions:

I Run programsI where are the test inputs?I what are the coverage?I what parts of the code should be run?I how to create runtime environment?

I Collect dataI online or offline (log analysis)?I what is the overhead (memory and performance)I how to capture and replay the data: compress, statistical approach

I Analyze dataI dynamic slicingI model-basedI statistical and machine learning approaches (an interesting discussion

on comparing the two: https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning)

What is the meaning of Run?

I Abstract interpretation and static analyses run a program over anabstract domain:OUT=F(IN,s)

I Dynamic analysis: abstraction used in parallel with, not in place of,concrete values:OUT=F(IN, si, v)

How dynamic analysis is used?

I Profiling-guided code optimization: which paths are frequentlyexecuted?

I Performance analysis

I Mining program invariants, protocols

I Fault localization, debugging (replay concurrency bugs)

I Bug finding

I ...

Running the programs

I The test suite determines the expense (in time and space)

I The test suite determines the accuracy (what executions are neverseen)

I Coverage or frequency: statements, branches, paths, procedure calls,types, method dispatch

I Values computed: parameters, array indices

I Run time, memory usage

InstrumentationA technique that inserts extra code into a program to collect runtimeinformation.

InstrumentationA technique that inserts extra code into a program to collect runtimeinformation.

InstrumentationA technique that inserts extra code into a program to collect runtimeinformation.

Source and Binary Instrumentation

I Source instrumentation: instrument source programs

I Binary instrumentation: instrument executables directly:I No need to recompile or relinkI Discover code at runtimeI Handle dynamically-generated codeI Attach to running processes

Binary Instrumentation Tool Is Dominant

I Libraries are a big pain for source code level instrumentation e.g.,Proprietary libraries: communication (MPI, PVM), linear algebra(NGA), database query (SQL libraries).

I Easily handle multi-lingual programs: source code levelinstrumentation is heavily language dependent.

I More complicated semantics: Turning off compiler optimizations canmaintain a almost perfect mapping from instructions to source codelines

I Worms and viruses are rarely provided with source code

Static Instrumentation

Dynamic Instrumentation

Instrumentation Tools

I Source instrumentation tools: AIMS, Paradyn, Pablo ... [1]

I Binary instrumentation tools:I Static (insert in the code and then run): ATOM, EEL, Etch, MorphI Dynamic (run while insert the instrumentation): Dyninst, Vulcan,

DTrace, Valgrind, Strata, DynamoRIO, PIN

Valgrind

See Valgrind slides

PIN

See PIN slides

Instrumentation: a Summary

I Source-level:I Pattern-matching over parse tree or AST and rewritingI Full access to source informationI A* [Ladd, Ramming], Astlog [Crew],

I Binary-level:I ATOM [Srivastava] , EEL [Larus], Diablo, BlutoI Analyze programs from multiple languagesI Typicaly done at some IR level

I Dynamic instrumentation (run-time): Valgrind, PIN

I Java instrumentation tools:I ASM (http://asm.objectweb.org/), Small and fast, Supports Java

1.5I BCEL (http://jakarta.apache.org/bcel/), Works well, No longer

under active developmentI SERP (http://serp.sourceforge.net/), Not designed for speed,

Somewhat memory intensiveI Soot (http://www.sable.mcgill.ca/soot/), Supports source, Designed

for optimizations

Design Your Instrumentation

I How much to generate?I EverythingI Just the necessary factsI Less than necessary

I On-line vs. off-line analysisI What/When to instrument?I Source code, IR, assembly, machine codeI Preprocessor, compile-time, link-time, executable, run-time

Control Flow Trace

Dynamic Dependence Graph

Whole Execution Trace

Dynamic Slicing

See Xiangyu’s slides

Specification Generation-Diakon

I Daikon infers invariants from a program trace

I Looks for invariants between each combination of variables

I Polynomial in the number of variables

I Easy to implement offline, first pass finds equal variables

I More complex to implement incrementally.

Specification Generation-Diakon

Specification Generation

Specification Generation

Data Analysis Techniques

Data mining (mining database) fields, related to information retrieval,text mining

I Associate rules mining

I Sequential patterns mining

I Beyond:Structure patterns (tree, graphs) mining

I Definition: what is it?

I Which algorithm? there exist off-the-shelf packages

Associate Rules Mining

I Proposed by Agrawal et al in 1993.

I Initially used for Market Basket Analysis to find how items purchasedby customers are related: Bread → Milk [sup = 5%, conf = 100%]

I Applications: Object, propertiesI patient / symptomsI movies / ratingsI web pages / keywordsI basket / productsI what are examples in computing?

The Model: Data

The Model: Rules

Algorithms and Extensions

I Apriori

I Eclat

I FP-growth

Relevant concepts:

I Frequent itemset mining

I Maximal itemset mining [Bayardo, 1998]

I Closed itemset mining [Pasquier, 1999]

Patterns in Sequences

I Substrings

I Regular expressions

I Partial orders

I Structured patterns, directed acyclic graphs (mining for frequentsubgraphs)

I Episodes

I Constraint-based sequential pattern mining (e.g., the item satisfiescertain values)

Sequential Pattern Mining [2]

Sequential pattern mining: given a set of sequences, find the completeset of frequent subsequences

Sequential Mining Algorithms

Software

C++ Implementations of Apriori, Eclat, FP-growth and several otheralgorithms are available onhttp://www.adrem.ua.ac.be/ goethals/software/ and onhttp://fimi.cs.helsinki.fi/

Designing Your Dynamic Analysis

Dynamic analysis is, at its heart, an experimental effort:

I Have insight

I Build tool

I Evaluate efficiency and effectiveness

I Rethink

Jonathan Chen.extensible source-level instrumentation tool for performancemeasurement.Technical report, Rensselaer Polytechnic Institute, 2007.

Carl H. Mooney and John F. Roddick.Sequential pattern mining – approaches and algorithms.ACM Comput. Surv., 45(2):19:1–19:39, March 2013.

Recommended