Fast and Precise Sanitizer Analysis with Bek


Pieter Hooimeijer Ben Livshits David Molnar Prateek Saxena Margus Veanes

2011-08-10 USENIX Security

<img src='some untrusted input'/>

QUESTION: What could possibly go wrong?

Attacker: im.png' onload='javascript:...

Result: <img src='im.png' onload='javascri (FAIL)
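
The failure on this slide can be reproduced in a few lines. The following is a minimal Python sketch, not code from the talk: `render_img` and `render_img_escaped` are hypothetical helpers, and `alert(1)` is an assumed stand-in for the attacker's payload, which the slide truncates. It shows the untrusted value closing the single-quoted attribute, and the same value staying inside the attribute once quotes are encoded.

```python
import html


def render_img(src: str) -> str:
    # Naive templating: the untrusted value is pasted directly into a
    # single-quoted attribute (hypothetical helper for illustration).
    return "<img src='" + src + "'/>"


def render_img_escaped(src: str) -> str:
    # Same template, but the value is HTML-encoded first, including quotes.
    return "<img src='" + html.escape(src, quote=True) + "'/>"


# Assumed payload; the slide's own payload is elided after "javascript:".
attack = "im.png' onload='alert(1)"

broken = render_img(attack)          # the quote ends the src attribute early
fixed = render_img_escaped(attack)   # the quote is encoded as &#x27;

print(broken)
print(fixed)
```

Whether the escaped variant is sufficient depends on the context the value lands in; this sketch only covers a single-quoted HTML attribute.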


A tale of two sanitizers…

' → &#39; (single quote → HTML entity)

Input: some untrusted input

Library A
Name: HtmlEncode
Around for: Years
Availability: Readily available to C# developers

Library B
Name: HtmlEncode
Around for: Years
Availability: Readily available to C# developers

Result on a single quote: one library produces &#39; (✔); the other leaves ' unchanged (✘)

.NET WebUtility:

public static string HtmlEncode(string s)
{
    if (s == null) return null;
    int num = IndexOfHtmlEncodingChars(s, 0);
    if (num == -1) return s;
    StringBuilder builder = new StringBuilder(s.Length + 5);
    int length = s.Length;
    int startIndex = 0;
Label_002A:
    if (num > startIndex) {
        builder.Append(s, startIndex, num - startIndex);
    }
    char ch = s[num];
    if (ch > '>') {
        builder.Append("&#");
        builder.Append(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
        builder.Append(';');
    } else {
        char ch2 = ch;
        if (ch2 != '"') {
            switch (ch2) {
                case '<': builder.Append("&lt;"); goto Label_00D5;
                case '=': goto Label_00D5;
                case '>': builder.Append("&gt;"); goto Label_00D5;
                case '&': builder.Append("&amp;"); goto Label_00D5;
            }
        } else {
            builder.Append("&quot;");
        }
    }
Label_00D5:
    startIndex = num + 1;
    if (startIndex < length) {
        num = IndexOfHtmlEncodingChars(s, startIndex);
        if (num != -1) {
            goto Label_002A;
        }
        builder.Append(s, startIndex, length - startIndex);
    }
    return builder.ToString();
}

MS AntiXSS:

private static string HtmlEncode(string input, bool useNamedEntities, MethodSpecificEncoder encoderTweak)
{
    if (string.IsNullOrEmpty(input)) {
        return input;
    }
    if (characterValues == null) {
        InitialiseSafeList();
    }
    if (useNamedEntities && namedEntities == null) {
        InitialiseNamedEntityList();
    }

    // Setup a new character array for output.
    char[] inputAsArray = input.ToCharArray();
    int outputLength = 0;
    int inputLength = inputAsArray.Length;
    char[] encodedInput = new char[inputLength * 10];

    SyncLock.EnterReadLock();
    try {
        for (int i = 0; i < inputLength; i++) {
            char currentCharacter = inputAsArray[i];
            int currentCodePoint = inputAsArray[i];
            char[] tweekedValue;

            // Check for invalid values
            if (currentCodePoint == 0xFFFE || currentCodePoint == 0xFFFF) {
                throw new InvalidUnicodeValueException(currentCodePoint);
            } else if (char.IsHighSurrogate(currentCharacter)) {
                if (i + 1 == inputLength) {
                    throw new InvalidSurrogatePairException(currentCharacter, '\0');
                }

                // Now peak ahead and check if the following character is a low surrogate.
                char nextCharacter = inputAsArray[i + 1];
                char nextCodePoint = inputAsArray[i + 1];
                if (!char.IsLowSurrogate(nextCharacter)) {
                    throw new InvalidSurrogatePairException(currentCharacter, nextCharacter);
                }

                // Look-ahead was good, so skip.
                i++;

                // Calculate the combined code point
                long combinedCodePoint = 0x10000 + ((currentCodePoint - 0xD800) * 0x400) + (nextCodePoint - 0xDC00);
                char[] encodedCharacter = SafeList.HashThenValueGenerator(combinedCodePoint);
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else if (char.IsLowSurrogate(currentCharacter)) {
                throw new InvalidSurrogatePairException('\0', currentCharacter);
            } else if (encoderTweak != null && encoderTweak(currentCharacter, out tweekedValue)) {
                for (int j = 0; j < tweekedValue.Length; j++) {
                    encodedInput[outputLength++] = tweekedValue[j];
                }
            } else if (useNamedEntities && namedEntities[currentCodePoint] != null) {
                char[] encodedCharacter = namedEntities[currentCodePoint];
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else if (characterValues[currentCodePoint] != null) {
                // character needs to be encoded
                char[] encodedCharacter = characterValues[currentCodePoint];
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else {
                // character does not need encoding
                encodedInput[outputLength++] = currentCharacter;
            }
        }
    } finally {
        SyncLock.ExitReadLock();
    }

    return new string(encodedInput, 0, outputLength);
}

.NET WebUtility vs. MS AntiXSS

• Same behavior on all inputs?

• If not, what is a differentiating input?

• Can it generate any known ‘bad’ outputs?
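
To make the "differentiating input" question concrete, here is a small Python sketch under stated assumptions. `encode_basic` and `encode_with_apostrophe` are toy stand-ins written for this example, not ports of either library, and the search is a brute-force scan over ASCII rather than the symbolic analysis the talk describes. Because both toy encoders handle each character independently, checking single characters is enough to find a witness.

```python
BASIC_TABLE = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}
FULL_TABLE = dict(BASIC_TABLE, **{"'": "&#39;"})


def encode_basic(s: str) -> str:
    # Toy encoder (assumption): leaves the single quote untouched.
    return "".join(BASIC_TABLE.get(c, c) for c in s)


def encode_with_apostrophe(s: str) -> str:
    # Toy encoder (assumption): additionally encodes the single quote.
    return "".join(FULL_TABLE.get(c, c) for c in s)


def differentiating_char(f, g, alphabet):
    # Return the first character on which the two encoders disagree, if any.
    for c in alphabet:
        if f(c) != g(c):
            return c
    return None


ASCII = [chr(i) for i in range(128)]
witness = differentiating_char(encode_basic, encode_with_apostrophe, ASCII)
print(repr(witness), encode_basic(witness), encode_with_apostrophe(witness))
```

A scan like this only works for small alphabets and stateless encoders; it says nothing about encoders that carry state between characters.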


A tale of 151 sanitizers…

PHP Trunk Changes to html.c, 1999-2011

R7,841     April 1999       135 loc
R32,564    September 2000   ENT_QUOTES introduced
R242,949   September 2007   $double_encode=true
R309,482   March 2011       1693 loc

• Safe to apply twice?

• Safe to combine with other sanitizers?
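
"Safe to apply twice?" is a question about idempotence. As a rough illustration in Python (not PHP, and not part of the talk), the sketch below tests a sanitizer on a handful of samples and reports the first one where a second application changes the result. `first_non_idempotent` and the sample list are assumptions made for this example; Python's standard `html.escape` serves as the sanitizer under test.

```python
import html


def first_non_idempotent(f, samples):
    # Return the first sample x with f(f(x)) != f(x), or None if none is found.
    for x in samples:
        once = f(x)
        if f(once) != once:
            return x
    return None


samples = ["plain text", "a & b", "<b>"]
witness = first_non_idempotent(html.escape, samples)

print(witness)                            # the input that exposes the problem
print(html.escape(html.escape(witness)))  # the ampersand is encoded twice
```

Testing on samples can only find counterexamples; it cannot show that a sanitizer is idempotent on all inputs.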

Motivation

• Writing string sanitizers correctly is difficult

• There is no cheap way to identify problems with sanitizers

• ‘Correctness’ is a moving target

• What if we could say more about sanitizer behavior?

Contributions

BEK

• Frontend: a small language for string manipulation; similar to how sanitizers are written today

• Backend: a model based on symbolic finite transducers with algorithms for analysis and code generation

Evaluation

• Converted sanitizers from a variety of sources

• Checked properties like reversibility, idempotence, equivalence, and commutativity

BEK: Architecture

Bek Program:

s := iter(c in t)[b := false;] {
    case (!b && c in "[\"\\]"):
        b := false;
        yield('\\', c);
    case (c == '\\'):
        b := !b;
        yield(c);
    case (true):
        b := false;
        yield(c);
};

[Architecture diagram: a Bek program goes through Transformation into Symbolic Finite Transducers (Microsoft.Automata, Z3). The transducers feed Analysis ("Does it do the right thing?", with a counterexample such as “\' vs. \\'”) and Code Gen (C#, JavaScript, C).]

A BEK Program: Escape Quotes

t := iter(c in s)[b := false;] {
    case (!b && c in "['\"]"):
        b := false;
        yield('\\', c);
    case (c == '\\'):
        b := !b;
        yield(c);
    case (true):
        b := false;
        yield(c);
};

iterate over the characters in string s

while updating one boolean variable b
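
For readers unfamiliar with BEK syntax, the Escape Quotes program can be read as an ordinary loop. The following Python sketch is a hand translation made for this write-up, not output of BEK's code generator; the function name `escape_quotes` is an assumption. The three branches correspond to the three `case` arms, and `b` records whether the previous character was an unpaired backslash.

```python
def escape_quotes(s: str) -> str:
    out = []
    b = False  # True when the previous character was an unpaired backslash
    for c in s:
        if not b and c in "'\"":
            # Unescaped quote: emit a backslash in front of it.
            b = False
            out.append("\\" + c)
        elif c == "\\":
            # Backslash: toggle the flag and copy the character.
            b = not b
            out.append(c)
        else:
            # Anything else (including an already-escaped quote): copy as-is.
            b = False
            out.append(c)
    return "".join(out)


print(escape_quotes("it's"))      # quote gets a backslash
print(escape_quotes("it\\'s"))    # already escaped, left unchanged
```

On these samples a second application leaves the output unchanged, which is the kind of property the analysis in the talk is meant to check for all inputs rather than a few.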


A Symbolic Finite Transducer

[Figure: a symbolic finite transducer whose transitions carry symbolic predicates and output lists]
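
One way to picture the two labels is as fields on a transition. The Python sketch below is an illustrative data structure, not the representation used by Microsoft.Automata: guards are plain Python functions rather than formulas handed to a solver, and the names `Transition`, `ESCAPE_QUOTES`, and `run` are assumptions. It encodes a quote-escaping transducer with two states, where state 1 means the previous character was an unpaired backslash.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    source: int
    guard: Callable[[str], bool]          # predicate over the input character
    outputs: List[Callable[[str], str]]   # output list: functions of that character
    target: int


QUOTES = "'\""

ESCAPE_QUOTES = [
    # State 0: no pending backslash.
    Transition(0, lambda c: c in QUOTES, [lambda c: "\\", lambda c: c], 0),
    Transition(0, lambda c: c == "\\", [lambda c: c], 1),
    Transition(0, lambda c: c not in QUOTES and c != "\\", [lambda c: c], 0),
    # State 1: previous character was an unpaired backslash.
    Transition(1, lambda c: True, [lambda c: c], 0),
]


def run(transitions, s, start=0):
    # Follow the unique enabled transition for each input character.
    state, out = start, []
    for c in s:
        t = next(t for t in transitions if t.source == state and t.guard(c))
        out.extend(f(c) for f in t.outputs)
        state = t.target
    return "".join(out)


print(run(ESCAPE_QUOTES, "it's"))
```

Each guard covers a whole set of characters at once, which is why a handful of transitions is enough even over a large alphabet.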


Now what?

SFT Algorithms

Equivalence Checking

AntiXSS.HtmlEncode vs. WebUtility.HtmlEncode

Join Composition

[Figure: SFT A (in → out) and SFT B (in → out) are joined into a single SFT that composes A and B]

JavaScriptEncode(HtmlEncode(w))

HtmlEncode(JavaScriptEncode(w))
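
The two expressions above differ only in the order of composition, so the natural question is whether the order matters. The Python sketch below checks this on one input. It is an illustration under assumptions: `js_encode` is a toy encoder written for this example (it only backslash-escapes backslashes and quotes), `compose` is a hypothetical helper, and Python's `html.escape` stands in for HtmlEncode.

```python
import html


def js_encode(s: str) -> str:
    # Toy JavaScript string encoder (assumption): backslash first, then quotes.
    return s.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')


def compose(outer, inner):
    # Function composition: apply inner first, then outer.
    return lambda w: outer(inner(w))


js_after_html = compose(js_encode, html.escape)
html_after_js = compose(html.escape, js_encode)

w = "'"
print(js_after_html(w))   # the quote is already an entity, so js_encode adds nothing
print(html_after_js(w))   # the backslash added by js_encode survives HTML encoding
```

A single differing input is enough to show that these two toy encoders do not commute; showing that two encoders do commute requires reasoning about all inputs.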


Pre-Image Computation

[Figure: given SFT A and a regular language of outputs, compute the regular language of inputs that A maps into it. Can A produce an output in the set S?]
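
A bounded, brute-force version of this question can be written directly. The Python sketch below enumerates short strings over a tiny alphabet and collects those whose encoded form matches a "bad" pattern. Everything in it is an assumption made for illustration: `encode` is a toy encoder that leaves single quotes alone, `BAD` is a pattern chosen for the example, and `bounded_preimage` only explores strings up to a fixed length, whereas the pre-image computation on the slide yields a regular language covering all inputs.

```python
import itertools
import re

TABLE = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}


def encode(s: str) -> str:
    # Toy encoder (assumption): does not encode the single quote.
    return "".join(TABLE.get(c, c) for c in s)


# "Bad" outputs for this example: anything still containing a raw single quote.
BAD = re.compile(r"'")


def bounded_preimage(f, bad, alphabet, max_len):
    # Collect every input up to max_len whose output matches the bad pattern.
    hits = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bad.search(f(s)):
                hits.append(s)
    return hits


witnesses = bounded_preimage(encode, BAD, "a<'", 2)
print(witnesses)
```

An empty result from a bounded search would not prove safety; it would only mean no short counterexample exists over the chosen alphabet.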


Some Questions

• What features are needed to port existing sanitizers?

• Can we check interesting properties on real sanitizers?

• Will HtmlEnc implementations protect against XSS Cheat Sheet samples?

Language Features

What features are needed to port existing sanitizers?

Data (inspected to obtain feature counts):

1x OWASP esapi HTMLencode

13x Google Ctemplate AutoEscape

21x IE 8 XSS Filter

7x Synthetic

• Majority (76%) of sanitizers can be ported without extending the language

• With multi-character lookahead: 90%

Can we check interesting properties on real sanitizers?

Data:

• 4x MS internal HtmlEncode

• 3x ‘for hire’ HtmlEncode based on English-language specification (C#)

Checks: Commutative? Equivalent?

• Short answer: Yes!

• EQ results take less than a minute to obtain:

       1   2   3   4   5   6   7
   1   ✔   ✔   ✔   ✘   ✘   ✔   ✘
   2       ✔   ✔   ✘   ✘   ✔   ✘
   3           ✔   ✘   ✘   ✔   ✘
   4               ✔   ✘   ✘   ✘
   5                   ✔   ✘   ✘
   6                       ✔   ✘
   7                           ✔

The Cheat Sheet

Will HtmlEnc protect against known XSS strings?

[Figure: pre-image computation with SFT A, as on the Pre-Image Computation slide]

• One out of seven implementations correctly encodes all strings for use in both HTML and attribute contexts
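
The difference between HTML and attribute contexts can be seen with a crude spot check. The Python sketch below is only an illustration: `SAMPLES` contains three made-up attack strings, not entries from the actual cheat sheet, `unsafe_outputs` is a hypothetical helper, and "safe" is approximated as "the output contains none of the characters `<`, `>`, `"`, `'`". It compares Python's `html.escape` with and without quote encoding.

```python
import html

# Made-up attack strings for illustration (assumption, not the cheat sheet).
SAMPLES = [
    "<script>alert(1)</script>",
    "\" onmouseover=\"alert(1)",
    "' onload='alert(1)",
]

DANGEROUS = set("<>\"'")


def unsafe_outputs(encode, samples):
    # Return the samples whose encoded form still contains a dangerous character.
    return [s for s in samples if DANGEROUS & set(encode(s))]


with_quotes = unsafe_outputs(lambda s: html.escape(s, quote=True), SAMPLES)
without_quotes = unsafe_outputs(lambda s: html.escape(s, quote=False), SAMPLES)

print(with_quotes)      # nothing flagged
print(without_quotes)   # the two attribute-breaking samples are flagged
```

An encoder that handles tags but not quotes passes the first sample and fails the other two, which is the kind of gap that separates HTML-context safety from attribute-context safety.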


Conclusion

• BEK is a domain-specific language for writing string sanitizers

• We model BEK programs without approximation using symbolic finite transducers, enabling, e.g., equivalence checks

• We evaluate our system using real-world sanitizers from a variety of different sources

Thanks!

http://research.microsoft.com/en-us/projects/bek/

http://www.rise4fun.com/bek/

Demo Time


Scalability: Approach

Randomly-generated BEK programs, parameterized on SFT size

Checks: Commutative? Equivalent?

Scalability: Results

[Figure: results plots for Commutativity and Self-Equivalence]

Sanitizer use in PHP code: Approach

100 PHP projects → scrape → 9.6 million lines of PHP → static count → usage stats for 111 distinct PHP library functions

Sanitizer use in PHP code: Results