Clone Detection by Exploiting Assembler

Ian Davis, Mike Godfrey

University of Waterloo, Ontario, Canada

IWSC May 2010 Clone Detection by Exploiting Assembler

The Original Assembler

```asm
.LC107: .string "merge "
…
    pushl $.LC107
    pushl command_buf+8
.LCFI378:
    call prefixcmp
    addl $16,%esp
    testl %eax,%eax
    jne .L485
    subl $8,%esp
    pushl $32
    pushl command_buf+8
    call strchr
    addl $16,%esp
    incl %eax
    movl %eax,-16(%ebp)
    subl $12,%esp
    pushl $24
    call xmalloc
    addl $16,%esp
    movl %eax,-8(%ebp)
    subl $12,%esp
    pushl -16(%ebp)
    call lookup_branch
    …
.L485
```

• Identify function boundaries

• Relate assembler back to source

• Remove comments, white space, etc.

• Normalize instruction set if needed

• Convert to relative addressing

• Inline string constants

• Reconstruct parameter names

• Reconstruct local variable names
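Several of these steps are plain text rewrites of each assembler line. The C sketch below is a minimal illustration under stated assumptions, not the authors' tool: it covers only comment and white-space removal, dropping GCC's `.LCFI` call-frame labels, inlining string constants, and renaming stack slots to local variable names. The names `table` and `normalize_line` are hypothetical, and the table is hard-coded here, whereas a real tool would presumably build it from the `.string` directives and the compiler's debug information.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical table mapping symbolic operands to recovered text.
 * Assumption: a real tool would fill it from the .string directives
 * (inline string constants) and from compiler debug information
 * (parameter and local variable names); here it is hard-coded.
 * Keys are matched as plain substrings. */
struct rewrite {
    const char *from, *to;
};

static const struct rewrite table[] = {
    { "$.LC107",   "$\"merge \"" },    /* inline the string constant */
    { "-16(%ebp)", "from(%ebp)" },     /* name the local variable    */
    { "-8(%ebp)",  "n(%ebp)" },
};
#define NTABLE (sizeof table / sizeof table[0])

/* Normalize one line of assembler into out (capacity cap): skip
 * .LCFI call-frame labels, drop '#' comments (assumes '#' never
 * occurs inside a string literal), trim and collapse white space,
 * and apply the rewrite table. Returns 0 if nothing is left. */
static int normalize_line(const char *in, char *out, size_t cap)
{
    size_t len = 0, i, k = 0, m;
    int pending_space = 0;

    assert(cap > 0);
    while (isspace((unsigned char)*in))
        in++;
    if (strncmp(in, ".LCFI", 5) == 0) {
        out[0] = '\0';
        return 0;
    }
    while (*in != '\0' && *in != '#' && len + 1 < cap) {
        if (isspace((unsigned char)*in)) {
            pending_space = 1;          /* emitted only if more follows */
            in++;
            continue;
        }
        if (pending_space) {
            out[len++] = ' ';
            pending_space = 0;
            if (len + 1 >= cap)
                break;
        }
        for (i = 0; i < NTABLE; i++) {
            k = strlen(table[i].from);
            if (strncmp(in, table[i].from, k) == 0)
                break;
        }
        if (i == NTABLE) {
            out[len++] = *in++;         /* ordinary character */
            continue;
        }
        m = strlen(table[i].to);
        if (m > cap - 1 - len)
            m = cap - 1 - len;          /* truncate if out is full */
        memcpy(out + len, table[i].to, m);
        len += m;
        in += k;
    }
    out[len] = '\0';
    return len > 0;
}
```

With this table, the line `pushl $.LC107` (with tabs and a trailing comment) becomes `pushl $"merge "`, `movl %eax,-16(%ebp)` becomes `movl %eax,from(%ebp)`, and `.LCFI378:` is dropped.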

The Annotated Assembler

```asm
    pushl $"merge "
    pushl command_buf+8
    call prefixcmp
    addl $16,%esp
    testl %eax,%eax
    jne +124
    subl $8,%esp
    pushl $32
    pushl command_buf+8
    call strchr
    addl $16,%esp
    incl %eax
    movl %eax,from(%ebp)
    subl $12,%esp
    pushl $24
    call xmalloc
    addl $16,%esp
    movl %eax,n(%ebp)
    subl $12,%esp
    pushl from(%ebp)
    call lookup_branch
```
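Converting to relative addressing replaces a jump's label (`jne .L485`) with a distance (`jne +124`), presumably so that the same code compiled at different places compares equal even though the compiler numbers its labels differently. The slides do not say what unit the offset is measured in; the sketch below assumes offsets counted in instructions from the jump itself, treats any mnemonic starting with `j` as a jump to a local label, and uses the hypothetical names `to_relative` and `is_label`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLINES 256
#define MAXLEN   128

/* Is this normalized line a label such as ".L485:"? */
static int is_label(const char *s)
{
    size_t len = strlen(s);
    return len > 0 && s[len - 1] == ':';
}

/* Rewrite label-based jumps as signed offsets and drop the labels.
 * Assumptions (not from the slides): an offset counts instructions
 * from the jump itself, and any mnemonic starting with 'j' takes a
 * single local-label operand. Writes the instructions to out and
 * returns how many there are. */
static int to_relative(const char *in[], int n, char out[][MAXLEN])
{
    const char *label[MAXLINES];
    int target[MAXLINES], nlabels = 0, pc = 0, i, j;

    /* Pass 1: give each label the index of the instruction after it. */
    for (i = 0; i < n; i++) {
        if (is_label(in[i])) {
            assert(nlabels < MAXLINES);
            label[nlabels] = in[i];
            target[nlabels++] = pc;
        } else {
            pc++;
        }
    }

    /* Pass 2: copy instructions, replacing label operands of jumps. */
    pc = 0;
    for (i = 0; i < n; i++) {
        const char *op = strchr(in[i], ' ');

        if (is_label(in[i]))
            continue;                   /* labels are no longer needed */
        assert(pc < MAXLINES);
        snprintf(out[pc], MAXLEN, "%s", in[i]);
        for (j = 0; in[i][0] == 'j' && op != NULL && j < nlabels; j++) {
            size_t k = strlen(label[j]) - 1;    /* label without ':' */
            if (strlen(op + 1) == k && strncmp(op + 1, label[j], k) == 0) {
                snprintf(out[pc], MAXLEN, "%.*s %+d",
                         (int)(op - in[i]), in[i], target[j] - pc);
                break;
            }
        }
        pc++;
    }
    return pc;
}
```

For example, in the sequence `testl %eax,%eax`, `jne .L485`, `subl $8,%esp`, `pushl $32`, `.L485:`, `leave`, the jump becomes `jne +3`; a backward jump to an earlier label gets a negative offset.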

The Matching Algorithm

• Scan entire source once

• Use hashing to find first pairing

• Ignore pairings in identified clones

• Don’t cross function boundaries

• Terminate a clone before it reaches its later copy in the same function

• Weight matches (+) and mismatches (-)

• Special logic for matching branches

• Advance greedily while weight ≥ 0

• Then employ hill climbing

• Continue while improvement possible

• Accept if clones satisfy minimum length

• Alternative minimum for matching functions
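The core of these steps can be sketched in C for one pair of functions. This is one possible reading of the bullets, not the authors' implementation: the weights, the skip-based hill-climbing move, the FNV-1a hash, and the names `find_seed`, `greedy`, `grow`, and `clone_accepted` are all assumptions. Because each input array holds one function's normalized instructions, a clone cannot cross a function boundary; the branch-specific logic, the exclusion of already identified clones, and the alternative minimum for matching functions are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning values: the slides give no concrete numbers. */
#define MATCH    2    /* reward for an identical instruction pair   */
#define MISMATCH 3    /* penalty for a differing pair                */
#define GAP      1    /* penalty per instruction skipped on one side */
#define MAX_SKIP 2    /* largest skip tried by one climbing move     */
#define MIN_LEN  4    /* minimum clone length on each side           */
#define MAXN     1024 /* largest function this sketch accepts        */

/* A clone pair: start and length in each function, plus its weight. */
struct clone { int a0, b0, len_a, len_b, weight; };

/* 32-bit FNV-1a hash; the choice of hash is ours, not the slides'. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Find the first pair (i, j) of identical instructions, comparing
 * hashes before strings. Returns 0 if there is none. */
static int find_seed(const char *a[], int na, const char *b[], int nb,
                     int *si, int *sj)
{
    uint32_t hb[MAXN];
    int i, j;

    assert(nb <= MAXN);
    for (j = 0; j < nb; j++)
        hb[j] = fnv1a(b[j]);
    for (i = 0; i < na; i++) {
        uint32_t ha = fnv1a(a[i]);
        for (j = 0; j < nb; j++)
            if (hb[j] == ha && strcmp(a[i], b[j]) == 0) {
                *si = i;
                *sj = j;
                return 1;
            }
    }
    return 0;
}

/* Greedy phase: advance in lockstep from (i, j) while the running
 * weight stays >= 0. Returns the best weight reached; (*ei, *ej) is
 * the position just past that best point. */
static int greedy(const char *a[], int na, const char *b[], int nb,
                  int i, int j, int *ei, int *ej)
{
    int w = 0, best = 0;

    *ei = i;
    *ej = j;
    while (i < na && j < nb) {
        w += strcmp(a[i], b[j]) == 0 ? MATCH : -MISMATCH;
        i++;
        j++;
        if (w < 0)
            break;
        if (w > best) {
            best = w;
            *ei = i;
            *ej = j;
        }
    }
    return best;
}

/* Grow a clone from a seed: one greedy run, then hill climbing. A move
 * skips 1..MAX_SKIP instructions on one side and reruns the greedy
 * phase; the best move is taken while it raises the total weight. */
static struct clone grow(const char *a[], int na, const char *b[], int nb,
                         int i0, int j0)
{
    int ei, ej, total = greedy(a, na, b, nb, i0, j0, &ei, &ej);
    int improved = 1;

    while (improved) {
        int d, ti, tj, g, gain = 0, bi = ei, bj = ej;

        for (d = 1; d <= MAX_SKIP; d++) {
            g = greedy(a, na, b, nb, ei + d, ej, &ti, &tj) - d * GAP;
            if (g > gain) { gain = g; bi = ti; bj = tj; }
            g = greedy(a, na, b, nb, ei, ej + d, &ti, &tj) - d * GAP;
            if (g > gain) { gain = g; bi = ti; bj = tj; }
        }
        improved = gain > 0;
        total += gain;
        ei = bi;
        ej = bj;
    }
    return (struct clone){ i0, j0, ei - i0, ej - j0, total };
}

/* Accept a clone only if both copies meet the minimum length. */
static int clone_accepted(const struct clone *c)
{
    return c->len_a >= MIN_LEN && c->len_b >= MIN_LEN;
}
```

For example, with `a = {A, B, C, D, E, F}` and `b = {Q, A, B, X, C, D, E, F}`, the seed is `(0, 1)`; the greedy run stops after `A B`, and one climbing move that skips `X` in `b` extends the clone to all of `a` (6 instructions against 7, weight 11).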

Source Clone 1

```c
from = strchr(command_buf.buf, ' ') + 1;
n = xmalloc(sizeof(*n));
s = lookup_branch(from);
if (s)
    hashcpy(n->sha1, s->sha1);
else if (*from == ':') {
    uintmax_t idnum = strtoumax(from + 1, NULL, 10);
    struct object_entry *oe = find_mark(idnum);
    if (oe->type != OBJ_COMMIT)
        die("Mark :%" PRIuMAX " not a commit", idnum);
    hashcpy(n->sha1, oe->sha1);
} else if (!get_sha1(from, n->sha1)) {
    unsigned long size;
    char *buf = read_object_with_reference(n->sha1,
        commit_type, &size, n->sha1);
    if (!buf || size < 46)
        die("Not a valid commit: %s", from);
    free(buf);
} else
    die("Invalid ref name or SHA1 expression: %s", from);
```

Source Clone 2

```c
from = strchr(command_buf.buf, ' ') + 1;

s = lookup_branch(from);
if (s)
    hashcpy(sha1, s->sha1);
else if (*from == ':') {
    struct object_entry *oe;
    from_mark = strtoumax(from + 1, NULL, 10);
    oe = find_mark(from_mark);
    if (oe->type != OBJ_COMMIT)
        die("Mark :%" PRIuMAX " not a commit", from_mark);
    hashcpy(sha1, oe->sha1);
} else if (!get_sha1(from, sha1)) {
    unsigned long size;
    char *buf;
    buf = read_object_with_reference(sha1,
        commit_type, &size, sha1);
    if (!buf || size < 46)
        die("Not a valid commit: %s", from);
    free(buf);
} else
    die("Invalid ref name or SHA1 expression: %s", from);
```

Benefits and Conclusions

• Assembler is easy to derive from source, object files, or executables

• Complements other clone detection approaches

• The compiler performs useful normalization of the source for free

• The analysis is semantic, not syntactic

• Analysis is done by function (forbidding overlapping clone pairs)

• Branching can be handled sensibly

• Case statements are easier to handle

• Different assembler instructions can be weighted differently

• Detection can reason about the assembler itself
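One of these benefits is that different assembler instructions can be weighted differently. A minimal sketch of such a weighting follows; the function name `match_weight` and every value in it are hypothetical, chosen only to illustrate the idea that a matched `call` may say more about shared intent than a matched stack adjustment. A matcher could add this weight for each matching pair instead of one constant reward.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reward for matching one normalized instruction. The
 * values are illustrative, not from the slides: calls name their
 * callee, so they count most; stack-pointer adjustments are mostly
 * compiler bookkeeping, so they count least. */
static int match_weight(const char *insn)
{
    size_t n;

    assert(insn != NULL);
    n = strcspn(insn, " ");              /* length of the mnemonic */
    if (n == 4 && strncmp(insn, "call", 4) == 0)
        return 4;                        /* call site */
    if (insn[0] == 'j')
        return 3;                        /* jump or branch */
    if (strstr(insn + n, ",%esp") != NULL)
        return 1;                        /* stack-pointer adjustment */
    return 2;                            /* everything else */
}
```

Under these values, `call prefixcmp` scores 4, `jne +124` scores 3, `addl $16,%esp` scores 1, and `movl %eax,from(%ebp)` scores 2.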

Thank You