Fast and Precise Sanitizer Analysis with Bek


Fast and Precise Sanitizer Analysis with Bek. Pieter Hooimeijer, Ben Livshits, David Molnar, Prateek Saxena, Margus Veanes. USENIX Security, 2011-08-10.

Fast and Precise Sanitizer Analysis with BEK

Pieter Hooimeijer, Ben Livshits, David Molnar, Prateek Saxena, Margus Veanes

2011-08-10 USENIX Security

<img src='some untrusted input'/>

QUESTION:

What could possibly go wrong?

<img src='some untrusted input'/>

Attacker: im.png' onload='javascript:...

Result: <img src='im.png' onload='javascri

FAIL
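The failure above can be reproduced in a few lines. The following is a minimal sketch, not code from the talk: Python's standard `html.escape` stands in for a sanitizer, the `TEMPLATE` string mirrors the slide's `<img>` tag, and the `alert(1)` payload is a hypothetical stand-in for the elided `javascript:...` attack string.

```python
import html

# Template from the slide: untrusted input lands inside a single-quoted attribute.
TEMPLATE = "<img src='{}'/>"


def render_unsafe(untrusted: str) -> str:
    """Insert the input verbatim (no sanitization at all)."""
    return TEMPLATE.format(untrusted)


def render_escaped(untrusted: str) -> str:
    """Insert the input after HTML-escaping it, including both quote characters."""
    return TEMPLATE.format(html.escape(untrusted, quote=True))


# Hypothetical payload; the slide elides the script body after "javascript:".
payload = "im.png' onload='alert(1)"

unsafe = render_unsafe(payload)
escaped = render_escaped(payload)

print(unsafe)   # the quote in the payload closes src and opens a new attribute
print(escaped)  # the payload's quotes are entities, so it stays inside src
```

The only difference between the two renderings is whether the single quote is encoded, which is the property the two `HtmlEncode` libraries in the following slides disagree on.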

A tale of two sanitizers…

' → &#39; (single quote → HTML entity)

Library A
  • Name: HtmlEncode
  • Around for: Years
  • Availability: Readily available to C# developers
  • ' → &#39; ✔

Library B
  • Name: HtmlEncode
  • Around for: Years
  • Availability: Readily available to C# developers
  • ' → ' ✘

public static string HtmlEncode(string s) {
    if (s == null) return null;
    int num = IndexOfHtmlEncodingChars(s, 0);
    if (num == -1) return s;
    StringBuilder builder = new StringBuilder(s.Length + 5);
    int length = s.Length;
    int startIndex = 0;
Label_002A:
    if (num > startIndex) {
        builder.Append(s, startIndex, num - startIndex);
    }
    char ch = s[num];
    if (ch > '>') {
        builder.Append("&#");
        builder.Append(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
        builder.Append(';');
    } else {
        char ch2 = ch;
        if (ch2 != '"') {
            switch (ch2) {
                case '<': builder.Append("&lt;"); goto Label_00D5;
                case '=': goto Label_00D5;
                case '>': builder.Append("&gt;"); goto Label_00D5;
                case '&': builder.Append("&amp;"); goto Label_00D5;
            }
        } else {
            builder.Append("&quot;");
        }
    }
Label_00D5:
    startIndex = num + 1;
    if (startIndex < length) {
        num = IndexOfHtmlEncodingChars(s, startIndex);
        if (num != -1) {
            goto Label_002A;
        }
        builder.Append(s, startIndex, length - startIndex);
    }
    return builder.ToString();
}

.NET WebUtility (above) and MS AntiXSS (below):

private static string HtmlEncode(string input, bool useNamedEntities, MethodSpecificEncoder encoderTweak) {
    if (string.IsNullOrEmpty(input)) {
        return input;
    }
    if (characterValues == null) {
        InitialiseSafeList();
    }
    if (useNamedEntities && namedEntities == null) {
        InitialiseNamedEntityList();
    }

    // Setup a new character array for output.
    char[] inputAsArray = input.ToCharArray();
    int outputLength = 0;
    int inputLength = inputAsArray.Length;
    char[] encodedInput = new char[inputLength * 10];

    SyncLock.EnterReadLock();
    try {
        for (int i = 0; i < inputLength; i++) {
            char currentCharacter = inputAsArray[i];
            int currentCodePoint = inputAsArray[i];
            char[] tweekedValue;

            // Check for invalid values
            if (currentCodePoint == 0xFFFE || currentCodePoint == 0xFFFF) {
                throw new InvalidUnicodeValueException(currentCodePoint);
            } else if (char.IsHighSurrogate(currentCharacter)) {
                if (i + 1 == inputLength) {
                    throw new InvalidSurrogatePairException(currentCharacter, '\0');
                }

                // Now peek ahead and check if the following character is a low surrogate.
                char nextCharacter = inputAsArray[i + 1];
                char nextCodePoint = inputAsArray[i + 1];
                if (!char.IsLowSurrogate(nextCharacter)) {
                    throw new InvalidSurrogatePairException(currentCharacter, nextCharacter);
                }

                // Look-ahead was good, so skip.
                i++;

                // Calculate the combined code point
                long combinedCodePoint = 0x10000 + ((currentCodePoint - 0xD800) * 0x400) + (nextCodePoint - 0xDC00);
                char[] encodedCharacter = SafeList.HashThenValueGenerator(combinedCodePoint);
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else if (char.IsLowSurrogate(currentCharacter)) {
                throw new InvalidSurrogatePairException('\0', currentCharacter);
            } else if (encoderTweak != null && encoderTweak(currentCharacter, out tweekedValue)) {
                for (int j = 0; j < tweekedValue.Length; j++) {
                    encodedInput[outputLength++] = tweekedValue[j];
                }
            } else if (useNamedEntities && namedEntities[currentCodePoint] != null) {
                char[] encodedCharacter = namedEntities[currentCodePoint];
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else if (characterValues[currentCodePoint] != null) {
                // character needs to be encoded
                char[] encodedCharacter = characterValues[currentCodePoint];
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else {
                // character does not need encoding
                encodedInput[outputLength++] = currentCharacter;
            }
        }
    } finally {
        SyncLock.ExitReadLock();
    }

    return new string(encodedInput, 0, outputLength);
}

• Same behavior on all inputs?

• If not, what is a differentiating input?

• Can it generate any known ‘bad’ outputs?
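To make "differentiating input" concrete, here is a small sketch that searches for one by brute force. It is not how BEK works (BEK answers the question symbolically, over all inputs); the bounded enumeration, the two encoders `encode_a` and `encode_b`, and the alphabet are all assumptions chosen for illustration.

```python
import itertools


def encode_a(s: str) -> str:
    """Hypothetical encoder that also encodes the single quote."""
    table = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}
    return "".join(table.get(c, c) for c in s)


def encode_b(s: str) -> str:
    """Hypothetical encoder that leaves the single quote untouched."""
    table = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
    return "".join(table.get(c, c) for c in s)


def differentiating_input(f, g, alphabet: str, max_len: int):
    """Return the first string (shortest first) on which f and g disagree, else None."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            w = "".join(chars)
            if f(w) != g(w):
                return w
    return None


witness = differentiating_input(encode_a, encode_b, "a<'\"&", 3)
print(repr(witness))
```

A `None` result from this search only means the two functions agree on the strings that were tried; a symbolic equivalence check gives an answer for every input.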

A tale of 151 sanitizers…

PHP Trunk Changes to html.c, 1999-2011

  • R7,841 (April 1999): 135 loc

  • R32,564 (September 2000): ENT_QUOTES introduced

  • R242,949 (September 2007): $double_encode=true

  • R309,482 (March 2011): 1693 loc

• Safe to apply twice?

• Safe to combine with other sanitizers?
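"Safe to apply twice?" is a question about idempotence: does f(f(w)) equal f(w) for every w? The sketch below tests that on a handful of samples, using Python's standard `html.escape` as a stand-in sanitizer. The helper `is_idempotent_on` and the sample strings are assumptions for illustration, and sampling can only refute idempotence, never prove it.

```python
import html


def is_idempotent_on(f, samples) -> bool:
    """True if applying f twice equals applying it once on every sample."""
    return all(f(f(w)) == f(w) for w in samples)


samples = ["plain", "a & b", "<b>", "it's"]

once = html.escape("a & b")
twice = html.escape(once)

print(once)   # the ampersand is encoded
print(twice)  # the ampersand of the entity is encoded again
print(is_idempotent_on(html.escape, samples))
```

Double encoding of this kind is usually a display bug and not a security hole, but it shows why the property is worth checking before a sanitizer is applied at more than one layer.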

Motivation

• Writing string sanitizers correctly is difficult

• There is no cheap way to identify problems with sanitizers

• ‘Correctness’ is a moving target

• What if we could say more about sanitizer behavior?

Contributions

BEK
  • Frontend: a small language for string manipulation; similar to how sanitizers are written today
  • Backend: a model based on symbolic finite transducers with algorithms for analysis and code generation

Evaluation
  • Converted sanitizers from a variety of sources
  • Checked properties like reversibility, idempotence, equivalence, and commutativity

BEK: Architecture

[Diagram: Bek Program → Transformation → Symbolic Finite Transducers (Microsoft.Automata, Z3); from the transducers: Analysis ("Does it do the right thing?", counterexample “\' vs. \\'”) and Code Gen (C#, JavaScript, C)]

Bek Program:

s := iter(c in t)[b := false;] {
    case (!b && c in "[\"\\]"):
        b := false;
        yield('\\', c);
    case (c == '\\'):
        b := !b;
        yield(c);
    case (true):
        b := false;
        yield(c);
};

A BEK Program: Escape Quotes

t := iter(c in s)[b := false;] {
    case (!b && c in "['\"]"):
        b := false;
        yield('\\', c);
    case (c == '\\'):
        b := !b;
        yield(c);
    case (true):
        b := false;
        yield(c);
};

  • iterate over the characters in string s

  • while updating one boolean variable b
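For readers who do not know BEK syntax, the program above can be read as the following Python loop. This is a hand transliteration under two assumptions: that `"['\"]"` denotes the character class containing the single and double quote, and that the `case` arms are tried in order with the first match winning. The function name `escape_quotes` is hypothetical.

```python
def escape_quotes(s: str) -> str:
    """Backslash-escape quotes that are not already preceded by an unpaired backslash."""
    out = []
    b = False  # True when the previous character was an unpaired backslash
    for c in s:
        if not b and c in "'\"":
            # Unescaped quote: emit a backslash followed by the quote.
            b = False
            out.append("\\" + c)
        elif c == "\\":
            # Backslash: toggle the flag and copy it through.
            b = not b
            out.append(c)
        else:
            # Anything else (including an already-escaped quote): copy through.
            b = False
            out.append(c)
    return "".join(out)


print(escape_quotes("it's"))     # quote gets a backslash
print(escape_quotes("it\\'s"))   # already escaped, left alone
```

Because the loop keeps only one boolean of state and looks at one character at a time, it maps directly onto a two-state transducer, which is what makes the later analyses possible.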

A Symbolic Finite Transducer

[Figure: a symbolic finite transducer, annotated with its symbolic predicates and output lists]
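The two callouts, symbolic predicates and output lists, can be illustrated with a small data structure. The sketch below is an assumption-laden toy and not the Microsoft.Automata representation: guards are plain Python functions instead of solver formulas, transitions are tried in list order instead of having disjoint guards, and the names `Transition`, `run`, and `ESCAPE_QUOTES` are hypothetical. It encodes the escape-quotes program as two states, one per value of `b`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    source: int
    guard: Callable[[str], bool]          # symbolic predicate over the input character
    outputs: List[Callable[[str], str]]   # output list: each entry maps the input char to one output char
    target: int


def run(transitions: List[Transition], start: int, text: str) -> str:
    """Run the transducer, taking the first enabled transition for each character."""
    state, out = start, []
    for c in text:
        for t in transitions:
            if t.source == state and t.guard(c):
                out.extend(f(c) for f in t.outputs)
                state = t.target
                break
        else:
            raise ValueError(f"no transition from state {state} on {c!r}")
    return "".join(out)


def is_quote(c: str) -> bool:
    return c in "'\""


def is_backslash(c: str) -> bool:
    return c == "\\"


def anything(c: str) -> bool:
    return True


def identity(c: str) -> str:
    return c


def backslash(c: str) -> str:
    return "\\"


# State 0 corresponds to b == false, state 1 to b == true.
ESCAPE_QUOTES = [
    Transition(0, is_quote, [backslash, identity], 0),
    Transition(0, is_backslash, [identity], 1),
    Transition(0, anything, [identity], 0),
    Transition(1, is_backslash, [identity], 0),
    Transition(1, anything, [identity], 0),
]

print(run(ESCAPE_QUOTES, 0, "it's"))
```

The point of the symbolic form is that one transition such as `anything` covers a whole set of characters, so the transducer stays small even over a large alphabet like Unicode.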

Now what?

SFT Algorithms

Equivalence Checking: AntiXSS.HtmlEncode vs. WebUtility.HtmlEncode

Join Composition

[Diagram: SFT A (in → out) and SFT B (in → out) joined into a single composed SFT]

  • JavaScriptEncode(HtmlEncode(w))

  • HtmlEncode(JavaScriptEncode(w))
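The two compositions above need not agree. The following sketch shows one input on which they differ, using Python's standard `html.escape` in place of `HtmlEncode` and a deliberately minimal, hypothetical `js_escape` in place of `JavaScriptEncode`; neither is the implementation analyzed in the talk.

```python
import html


def js_escape(s: str) -> str:
    """Hypothetical minimal JavaScript string escaper: backslash, then both quotes."""
    return s.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')


def html_then_js(w: str) -> str:
    return js_escape(html.escape(w, quote=True))


def js_then_html(w: str) -> str:
    return html.escape(js_escape(w), quote=True)


w = "'"
print(html_then_js(w))  # the quote is already an entity, so js_escape has nothing to do
print(js_then_html(w))  # the quote is backslash-escaped first, then entity-encoded
print(html_then_js(w) == js_then_html(w))
```

With transducers, each order becomes a single composed machine, and commutativity reduces to an equivalence check between the two compositions.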

Pre-Image Computation

[Diagram: regular language of inputs → (in) SFT A → regular language S of outputs; which inputs does SFT A map into S?]
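A bounded, brute-force version of the pre-image question can be sketched as follows: given a sanitizer and a regular expression describing unwanted outputs, list the short inputs whose output matches. This enumeration is only an illustration of the question, not the symbolic algorithm; `weak_encode`, `bounded_preimage`, the `BAD` pattern, and the alphabet are assumptions.

```python
import html
import itertools
import re


def weak_encode(s: str) -> str:
    """Hypothetical encoder that handles & < > and the double quote but not the single quote."""
    return (s.replace("&", "&amp;")
             .replace("<", "&lt;")
             .replace(">", "&gt;")
             .replace('"', "&quot;"))


def bounded_preimage(sanitizer, bad_output, alphabet: str, max_len: int):
    """All inputs up to max_len whose sanitized output matches bad_output."""
    hits = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            w = "".join(chars)
            if bad_output.search(sanitizer(w)):
                hits.append(w)
    return hits


# "Bad" outputs for a single-quoted attribute context: any raw single quote.
BAD = re.compile(r"'")

weak_hits = bounded_preimage(weak_encode, BAD, "a'&", 2)
strict_hits = bounded_preimage(html.escape, BAD, "a'&", 2)

print(weak_hits)    # non-empty: some inputs reach a bad output
print(strict_hits)  # empty within this bound
```

An empty pre-image is the interesting result: it says no input can drive the sanitizer to an output in the bad language, which the symbolic version establishes for all inputs instead of a bounded set.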

Some Questions

• What features are needed to port existing sanitizers?

• Can we check interesting properties on real sanitizers?

• Will HtmlEnc implementations protect against XSS Cheat Sheet samples?

Language Features

What features are needed to port existing sanitizers?

Data (inspected to obtain feature counts):

  • 1x OWASP esapi HTMLencode

  • 13x Google Ctemplate AutoEscape

  • 21x IE 8 XSS Filter

  • 7x Synthetic

• Majority (76%) of sanitizers can be ported without extending the language

• With multi-character lookahead: 90%

Can we check interesting properties on real sanitizers?

Data:

  • 4x MS internal HtmlEncode

  • 3x ‘for hire’ HtmlEncode based on English-language specification (C#)

Properties checked: Commutative? Equivalent?

• Short answer: Yes!

• EQ results take less than a minute to obtain:

      1  2  3  4  5  6  7
  1   ✔  ✔  ✔  ✘  ✘  ✔  ✘
  2      ✔  ✔  ✘  ✘  ✔  ✘
  3         ✔  ✘  ✘  ✔  ✘
  4            ✔  ✘  ✘  ✘
  5               ✔  ✘  ✘
  6                  ✔  ✘
  7                     ✔
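The equivalence results are easier to read as groups of interchangeable implementations. The sketch below derives those groups from the upper-triangular results on the slide; the dictionary `EQ_UPPER` copies the marks row by row, and the function name `equivalence_classes` is hypothetical.

```python
# Row i lists the results for columns i..7, copied from the slide.
EQ_UPPER = {
    1: "✔✔✔✘✘✔✘",
    2: "✔✔✘✘✔✘",
    3: "✔✘✘✔✘",
    4: "✔✘✘✘",
    5: "✔✘✘",
    6: "✔✘",
    7: "✔",
}


def equivalence_classes(rows):
    """Group implementations by comparing each one to the first member of every class."""
    classes = []
    for i in sorted(rows):
        for cls in classes:
            rep = cls[0]  # representative; always smaller than i
            if rows[rep][i - rep] == "✔":
                cls.append(i)
                break
        else:
            classes.append([i])
    return classes


classes = equivalence_classes(EQ_UPPER)
print(classes)
```

Comparing against one representative per class is enough here because equivalence is transitive, and the marks on the slide are consistent with that.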

The Cheat Sheet

Will HtmlEnc protect against known XSS strings?

[Diagram: pre-image computation with SFT A and regular languages, repeated from the earlier slide]

• One out of seven implementations correctly encodes all strings for use in both HTML and attribute contexts

Conclusion

• BEK is a domain-specific language for writing string sanitizers

• We model BEK programs without approximation using symbolic finite transducers, enabling, e.g., equivalence checks

• We evaluate our system using real-world sanitizers from a variety of different sources

Thanks!

http://research.microsoft.com/en-us/projects/bek/

http://www.rise4fun.com/bek/

Demo Time

Scalability: Approach

Randomly-generated BEK programs, parameterized on SFT size

Properties checked: Commutative? Equivalent?

Scalability: Results

[Plots: Commutativity, Self-Equivalence]

Sanitizer use in PHP code: Approach

100 PHP projects → scrape → 9.6 million lines of PHP → static count → usage stats for 111 distinct PHP library functions

Sanitizer use in PHP code: Results