SECURE PROGRAMMING Chapter 2 Strings


Overview

● Arrays and their Problems
● Character Strings
● Common String Manipulation Errors
● String Vulnerabilities and Exploits
● Mitigation Strategies
● String Handling Functions, the Bad and the Good
● Runtime Protection Strategies
● Some Notable Vulnerabilities
● Summary

Arrays and their Problems

1) Hard to determine size.

2) Size defaults may not work.

3) Easy to index an array out of bounds.

4) Easy to write non-portable code (inconsistent handling, for example).

5) Size parameters may be wrong (see 3).

6) Copying into an array may overflow it.

7) Pointer arithmetic may be incorrect.

Character Strings

The problem: many strings come from outside the program:
• Command-line arguments
• Environment variables
• Console or other input
• Text files
• Network connections

Strings are not a built-in type in C or C++, though there is (some) library support.

Character Strings: String Data Type

Most people implement a string as a null-terminated array of characters, addressed by a pointer. Such strings have all the problems of arrays, magnified because most string manipulation is done through procedures (library functions).

Five important terms for arrays:

1. Bound = number of elements in the array.

2. Lo = Address of the first element of the array

3. Hi = Address of the last element of the array

4. TooFar = The address of the one-too-far element of the array = Hi + 1 = Lo + Bound

5. Target size (Tsize) = sizeof(array); for an array of char this equals Bound
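The five terms map directly onto pointer arithmetic. Below is a minimal sketch, assuming hypothetical helpers `array_terms` and `index_in_bounds` (not from the text). It computes Lo, Hi, and TooFar for a char array and checks an index against Bound. Forming TooFar is legal C, but dereferencing it is not.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if index i is valid for an array whose
   Bound (number of elements) is `bound`; valid indices are 0 .. bound - 1. */
int index_in_bounds(size_t i, size_t bound)
{
    return i < bound;
}

/* Hypothetical helper: computes Lo, Hi, and TooFar for a char array of
   `bound` elements (bound >= 1). TooFar may be formed and compared, but
   it must never be dereferenced. */
void array_terms(char *a, size_t bound, char **lo, char **hi, char **toofar)
{
    *lo = a;              /* first element */
    *hi = a + bound - 1;  /* last element */
    *toofar = a + bound;  /* one past the end: Hi + 1 == Lo + Bound */
}
```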

Character Strings: String Data Type

Two more terms for strings.

1. Null-terminated if there is a null character within the array.

2. Length: For null-terminated strings, the number of characters before the (first) null terminator.

Problem with determining array size (clear procedure)
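The array-size problem comes from parameter adjustment. An array parameter is really a pointer, so `sizeof` inside the callee measures the pointer, not the array. Below is a minimal sketch of the pitfall and the usual fix, which is to pass the element count explicitly. The broken version in the comment illustrates the pattern and may not match the book's exact code.

```c
#include <stddef.h>

/* Broken pattern (kept in a comment to avoid compiler warnings):
 *
 *   void clear(int array[]) {
 *       for (size_t i = 0; i < sizeof(array) / sizeof(array[0]); ++i)
 *           array[i] = 0;
 *   }
 *
 * Inside the function `array` has type int *, so sizeof(array) is the
 * size of a pointer and the loop clears the wrong number of elements. */

/* Fixed: the caller, which can still see the real array, passes the count. */
void clear(int array[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        array[i] = 0;
    }
}
```

At the call site, where `dis` is a true array, the count can still be computed safely with `clear(dis, sizeof dis / sizeof dis[0]);`.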

Character Strings: String Data Type

More problems:

What Characters? “Execution Character Set”

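The character set depends on the locale, which is set with the setlocale() function.

Basic execution character set: the 26 uppercase and 26 lowercase Latin letters, the 10 decimal digits, 29 graphic characters, the space, the control characters HT, VT, FF, alert (BEL), backspace (BS), carriage return (CR), and newline (NL), and the null character.

The execution character set may contain many more characters, some of which require multiple bytes to represent (a multibyte character set); the basic character set is always present. Multibyte encodings may use locale-specific shift states.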

Character Strings: UTF-8

Can represent any character in the Unicode character set, using 1 to 4 bytes per character.

Code points 0-127: 1 byte (identical to ASCII).

Otherwise, the first byte starts with as many 1 bits as there are bytes in the sequence, followed by a 0 bit; every following byte starts with the bits 10.

Thus: leading bit 0: a 1-byte character.

Leading bits 11: start of a multibyte sequence.

Leading bits 10: continuation of a multibyte sequence.

(Watch out for vulnerabilities!)
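The leading-bit rules can be turned into a classifier for the first byte of a sequence. Below is a minimal sketch; `utf8_sequence_length` is a hypothetical name. It looks only at bit patterns. A real decoder must also reject overlong forms, out-of-range lead bytes, surrogates, and malformed continuation bytes, and skipping those checks is exactly where the vulnerabilities come from.

```c
/* Hypothetical helper: returns the total length in bytes of a UTF-8
   sequence, judged only by the bit pattern of its first byte:
     0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
   Returns 0 for a continuation byte (10xxxxxx) or any other pattern.
   Sketch only: it does not reject overlong lead bytes (0xC0, 0xC1) or
   lead bytes above U+10FFFF (0xF5..0xF7), and it does not check the
   continuation bytes that follow. */
int utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  /* leading 0: single byte */
    if ((lead & 0xC0) == 0x80) return 0;  /* leading 10: continuation */
    if ((lead & 0xE0) == 0xC0) return 2;  /* leading 110 */
    if ((lead & 0xF0) == 0xE0) return 3;  /* leading 1110 */
    if ((lead & 0xF8) == 0xF0) return 4;  /* leading 11110 */
    return 0;                             /* 11111xxx: never valid */
}
```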


Wide Strings

16- or 32-bit characters (wchar_t)

Terminated with a null wide character.

As is the case with regular strings (with caveats!):
● A pointer to a wide string points to its initial (leftmost) wide character.
● The length is the number of wide characters preceding the null wide character.
● The value is the sequence of code values of the contained wide characters, in order.

String Literals

Enclosed in double quotes (")

Wide string literals are prefixed by L

Adjacent string literal tokens are concatenated. If any of them is prefixed by L, the result is a wide string literal (example in text, page 34). A null character is appended, and the result initializes an array of static storage duration.

In C, a string literal has type array of char, not array of const char, so the compiler may not catch modifications; nevertheless, modifying it is "forbidden" (undefined behavior).

Watch for declarations of the form:

const char s[3] = "abc"; // Not a null-terminated string. Use:

const char s[] = "abc";
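The difference between the two declarations shows up in `sizeof` and in whether a terminator exists. Below is a minimal sketch; `is_null_terminated` is a hypothetical helper. It uses memchr() so it never reads past Bound. The same sketch also shows literal concatenation.

```c
#include <string.h>

/* Hypothetical helper: nonzero if a null character occurs within the
   first `bound` elements of `a`; memchr never reads beyond `bound`. */
int is_null_terminated(const char *a, size_t bound)
{
    return memchr(a, '\0', bound) != NULL;
}

/* Exactly three elements: in C the terminator is silently dropped
   (C++ rejects this declaration). */
static const char no_null[3] = "abc";

/* Size deduced from the literal: four elements, including '\0'. */
static const char with_null[] = "abc";

/* Adjacent literals are concatenated before the null is appended. */
static const char joined[] = "ab" "cd";
```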

Strings in C++

● Proliferation of string classes.
● Standardized (STL) down to:
  ● std::string = typedef for basic_string<char>
  ● std::wstring = typedef for basic_string<wchar_t>
● Also allows:
  ● null-terminated byte strings (NTBS)
  ● null-terminated multibyte strings (NTMBS): an NTBS that contains a sequence of valid multibyte characters and ends in the same shift state it starts in.

Strings in C++ (2)

basic_string class template specializations are safer than NTBS, but NTBS are required all over the place:
● Literals are NTBS
● Existing libraries need NTBS or NTMBS

string objects are passed by value or reference, while C strings are passed by pointer.

Thank goodness for the member functions c_str() and data(), which return a pointer to the string's null-terminated contents (data() only since C++11).

Character Types

Three types:
● plain char
● signed char
● unsigned char

Using the wrong one may cause compiler warnings.

Some gotchas involving int:
● getc() and friends return an int so that EOF (a negative value, typically -1) can be distinguished from every valid character value.
● Functions in <ctype.h> (<cctype>) such as isalpha() accept an int because they might be passed the result of getc() or a similar function. The argument must be EOF or a value representable as unsigned char.
● In C, a character constant has type int, so sizeof('a') equals sizeof(int) (typically 4), not 1. In C++, a character constant has type char and its size is 1.

Wide character literals have type wchar_t, and multicharacter literals have type int.
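The getc() and isalpha() gotchas fit into a short routine. Below is a minimal sketch with hypothetical functions `count_alpha` and `count_alpha_stream`. The string version casts to unsigned char before calling isalpha(), because a plain char with the high bit set may be negative, and passing it directly is undefined behavior. The stream version keeps getc()'s result in an int so that EOF stays distinct.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts alphabetic characters in a string.
   The cast matters: where char is signed, a byte >= 0x80 is negative,
   and passing it to isalpha() unconverted is undefined behavior. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char)*s)) {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: the same count over a stream. `c` is an int, not
   a char, so EOF cannot collide with a real character value. */
size_t count_alpha_stream(FILE *fp)
{
    size_t n = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (isalpha(c)) {  /* c is already an unsigned char value here */
            n++;
        }
    }
    return n;
}
```

In the default "C" locale, only the ASCII letters count as alphabetic, so bytes of a UTF-8 sequence such as 0xC3 0xA9 are not counted.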

Unsigned char and wchar_t

Unsigned char: all bits handled equally; pure binary. No padding bits, no trap representation, no sign extension, etc.

wchar_t: Can be used for natural-language character data. For characters in the basic character set, it does not matter, except for type compatibility issues.

Sizing String headaches

Three important numbers:

Size = number of bytes allocated to the array (sizeof(a))

Count = number of elements in the array (may differ from Size!)

Length = Number of characters before the null terminator.

Notes:

If characters are wide, Size = Count * sizeof(wchar_t), typically 2*Count on Windows and 4*Count on Linux and most Unix systems.

Length MUST be less than Count, since the null terminator also needs a slot.

See Program fragments in book, pages 40-41.
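The three numbers can be checked directly for a wide-character array. Below is a minimal sketch; `wide_bytes_needed` is a hypothetical helper. Note that wcslen() counts wide characters, not bytes.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: bytes needed to store a copy of wide string `ws`,
   including the null wide character. wcslen() returns a count of wide
   characters, so it must be scaled by sizeof(wchar_t). */
size_t wide_bytes_needed(const wchar_t *ws)
{
    return (wcslen(ws) + 1) * sizeof(wchar_t);
}
```

For `wchar_t buf[10] = L"hello";`, the Size is `10 * sizeof(wchar_t)` bytes, the Count is 10, and the Length is 5.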

Common String Manipulation Errors

● Use of gets(): NO! NEVER!
● Improperly bounded string copies. Do not use:
  ● strcpy()
  ● strcat()
  ● sprintf()
● Watch out for:
  ● Input strings
  ● Environment strings
  ● Parameter strings... (see programs, pp 42-47)
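One bounded replacement for a strcpy()/strcat()/sprintf() sequence is snprintf(). It never writes more than the given size, and it returns the length the full result would have had, which exposes truncation. Below is a minimal sketch; `join_path` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into dst (capacity `size`
   bytes) without overflowing it. Returns 0 on success and -1 on an
   encoding error or truncation; dst is always null-terminated when
   size > 0. */
int join_path(char *dst, size_t size, const char *dir, const char *name)
{
    int needed = snprintf(dst, size, "%s/%s", dir, name);
    if (needed < 0) {
        return -1;  /* encoding error */
    }
    if ((size_t)needed >= size) {
        return -1;  /* the full result did not fit: truncated */
    }
    return 0;
}
```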

Common String Manipulation Errors

● Sizing strings:
  ● Do not use strlen() for wide strings; use wcslen().
  ● To get a size in bytes, multiply the result (plus one for the null terminator) by sizeof(wchar_t).

Programs, pages 41-42

● Improperly bounded string input. Do not use:
  ● gets()
  ● cin >> into a character array without a width limit
  ● scanf() with an unbounded %s conversion

See programs pp 42-43 (the program on page 43 is a typical implementation of gets)
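A scanf-family conversion is bounded by giving %s a field width one less than the buffer size. The remaining byte holds the null that scanf appends. Below is a minimal sketch using sscanf() so it works on a string; `first_word` is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: reads the first whitespace-delimited word of
   `input` into `word`, which must have room for 8 chars. The width 7
   leaves room for the terminator; a bare "%s" would have no bound.
   Returns 1 if a word was read, 0 otherwise. */
int first_word(const char *input, char word[8])
{
    return sscanf(input, "%7s", word) == 1;
}
```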

Common String Manipulation Errors

● Careless copying and concatenation of strings (program, page 44)
  ● Watch for strcpy, strcat, memcpy, sprintf, etc.
● Off-by-one errors (see program, page 47)
● Null-termination errors (p. 49)
● String truncation
● If you implement these functions yourself, you may still be in trouble! (page 50)
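A hand-written copy avoids both off-by-one and null-termination errors only if its loop bound reserves a slot for the terminator. Below is a minimal sketch; `copy_bounded` is a hypothetical helper, not a standard function.

```c
#include <stddef.h>

/* Hypothetical replacement for strcpy(): copies at most size - 1
   characters of src into dst and always null-terminates (size > 0).
   Returns the number of characters copied. The test `i + 1 < size`
   keeps the last slot for '\0'; writing `i < size` is the classic
   off-by-one that stores the terminator one past the end. */
size_t copy_bounded(char *dst, size_t size, const char *src)
{
    size_t i = 0;
    if (size == 0) {
        return 0;  /* no room even for the terminator */
    }
    while (i + 1 < size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```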

String Vulnerabilities and Exploits

● Where does your data come from? Are you sure?

The program on page 51 is bad:
● It uses gets()
● It doesn't even check the return value of gets()
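Both problems can be fixed in a password check. Below is a minimal sketch, assuming a hypothetical `is_password_ok` that takes the input stream and the expected password as parameters. It reads with fgets(), which never writes more than the buffer size. It checks the return value and rejects overlong lines instead of silently truncating them. A real program would not compare plaintext passwords.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded password check. The buffer holds up to 10
   characters plus the newline and terminator. */
int is_password_ok(FILE *in, const char *expected)
{
    char password[12];
    if (fgets(password, sizeof password, in) == NULL) {
        return 0;  /* EOF or read error: never ignore this */
    }
    size_t len = strcspn(password, "\n");
    if (password[len] != '\n' && !feof(in)) {
        return 0;  /* line longer than the buffer: reject, don't truncate */
    }
    password[len] = '\0';  /* strip the newline, if present */
    return strcmp(password, expected) == 0;
}
```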


String Vulnerabilities and exploits

(see ASM code, pp 56-58)

Effect called “Stack Smashing”

Example follows (remember the code from IsPasswordOK?)


String Vulnerabilities and exploits

This exploit is called “arc injection”

String Vulnerabilities and exploits

● Code injection:
  ● Injection of a malicious address and malicious code
  ● The malicious argument must be accepted as legitimate input
  ● It must not cause abnormal termination (before the code runs)
  ● It must result in execution of the malicious code.
● IsPasswordOK is vulnerable (page 65)
● Exploit with fgets and strcpy on page 66 (unclear; obviously not tested).

String Vulnerabilities and exploits

Arc injection aka return-into-libc includes:

Branching to an existing function

system(), the exec() family, and setuid() are favorites

Example of vulnerable code, page 70

Defeats memory-based protection schemes such as nonexecutable stacks, because no injected code is ever executed.

String Vulnerabilities and exploits

Return-Oriented Programming

“gadget” = sequence of instructions followed by return.

Turing-complete gadget sets have been found for many platforms, including x86 and Solaris libc, and there is a compiler that generates ROP programs.

ROP programs use the stack: values are pushed and popped, and branching is achieved by skipping over return addresses.

Actually similar to FORTH programming.

Mitigation Strategies

Two kinds:

Prevent buffer overflows

Detect buffer overflows and recover securely

Best to do defense in depth and apply both.

Mitigation Strategies

Preventing Buffer Overflows:

CERT recommends using a consistent plan for managing strings.

Three models:

1) Caller allocates and frees

Most likely to prevent memory leaks

2) Callee allocates, caller frees

Ensures sufficient memory is available

3) Callee allocates and frees (only available in C++)

Most secure of the three solutions

Mitigation Strategies

Mitigation strategies:

Caller allocates and frees: the C <string.h> family is expanded with C11 (Annex K) functions:

strcpy_s strcat_s strncpy_s strncat_s

See example 2.5, 2.6, pages 74,75
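Annex K is optional, and many C libraries (glibc, for example) do not provide it. Portable code therefore has to test for it. Below is a minimal sketch of the caller-allocates model; `copy_string` is a hypothetical wrapper. It requests Annex K with `__STDC_WANT_LIB_EXT1__` and uses strcpy_s() only when the library defines `__STDC_LIB_EXT1__`. Otherwise it falls back to an explicit length check that, like strcpy_s(), leaves an empty string on failure.

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* ask for Annex K declarations, if any */
#include <string.h>

/* Hypothetical wrapper: the caller supplies dst and its size.
   Returns 0 on success, nonzero if src does not fit. */
int copy_string(char *dst, size_t dstsize, const char *src)
{
#ifdef __STDC_LIB_EXT1__
    /* On a constraint violation strcpy_s sets dst[0] = '\0' (when it
       can) and calls the runtime-constraint handler, which may abort. */
    return strcpy_s(dst, dstsize, src) != 0;
#else
    size_t len = strlen(src);
    if (dstsize == 0) {
        return 1;
    }
    if (len >= dstsize) {
        dst[0] = '\0';  /* no partial copy, mirroring strcpy_s */
        return 1;
    }
    memcpy(dst, src, len + 1);  /* len + 1 copies the terminator */
    return 0;
#endif
}
```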

Mitigation Strategies

Callee allocates, caller frees

Biggest problems:

DOS attack by exhausting memory

Dynamic memory management errors

Example 2.7 p 77

fmemopen() and open_memstream() (signatures on p. 78) return a FILE * for doing memory "I/O"

Example code, page 79

Dynamic allocation disallowed in safety-critical systems
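open_memstream() (POSIX.1-2008) illustrates the callee-allocates, caller-frees model. The stream grows its own buffer as data is written and hands the buffer back on fclose(). Below is a minimal sketch; `make_greeting` is a hypothetical example function.

```c
#define _POSIX_C_SOURCE 200809L  /* open_memstream is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: builds "Hello, <name>!" in a buffer that
   open_memstream allocates. Returns a null-terminated string that the
   caller must free(), or NULL on failure. */
char *make_greeting(const char *name)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *fp = open_memstream(&buf, &len);
    if (fp == NULL) {
        return NULL;
    }
    if (fprintf(fp, "Hello, %s!", name) < 0) {
        fclose(fp);
        free(buf);
        return NULL;
    }
    if (fclose(fp) != 0) {  /* fclose finalizes buf and len */
        free(buf);
        return NULL;
    }
    return buf;
}
```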

Mitigation Strategies

C++ string class pp 80-83

String Handling Functions, the bad and the good

gets: replace with fgets or getchar

Examples 2.9, 2.10, pp 84-86

… or gets_s

Example 2.11, page 87

… or getline() (~= getdelim())

Example 2.12, p88
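getline() (POSIX.1-2008) reads a line of any length, allocating or growing the buffer as needed. The caller frees the buffer once at the end. Below is a minimal sketch; `read_line` is a hypothetical wrapper that also strips the newline.

```c
#define _POSIX_C_SOURCE 200809L  /* getline is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical wrapper: reads one line from fp into *line (allocated or
   grown by getline) and strips the trailing newline. Returns the line
   length, or -1 at end of file or on error. The caller must free(*line)
   when done, even after -1. */
ssize_t read_line(FILE *fp, char **line, size_t *cap)
{
    /* equivalent to getdelim(line, cap, '\n', fp) */
    ssize_t n = getline(line, cap, fp);
    if (n > 0 && (*line)[n - 1] == '\n') {
        (*line)[--n] = '\0';
    }
    return n;
}
```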

String Handling Functions, the bad and the good

strcpy() and strcat()

Fixes:

Allocate the required space dynamically

strncpy() and strncat() are not recommended.

strlcpy() and strlcat() (always null-terminate the result when the size is nonzero)

strcpy_s() and strcat_s() (implementation, page 91)

strdup() (allocates the copy dynamically; the result requires free())

Summary, pp 92-93
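"Allocate the required space dynamically" means sizing the destination from the actual string lengths before copying. Below is a minimal sketch; `concat` is a hypothetical helper, not a standard function. It also guards against the size computation wrapping around.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated concatenation of a and
   b, or NULL if the size would overflow or malloc fails. The caller must
   free() the result. */
char *concat(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    if (la > SIZE_MAX - lb - 1) {
        return NULL;  /* la + lb + 1 would wrap around */
    }
    char *out = malloc(la + lb + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);  /* lb + 1 copies b's terminator too */
    return out;
}
```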

String Handling Functions, the bad and the good

strncpy() and strncat() (p 93)

See strncpy_s (p 95) and strncat_s (pp 97-98)

strndup() (uses dynamic memory allocation)

Summary on p 99
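One reason strncat() is error-prone is its third argument: it is the maximum number of characters to append, not the size of the destination buffer. Passing `sizeof dst` is a common overflow bug. Below is a minimal sketch of correct usage; `append_bounded` is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical helper: appends src to the string in dst (capacity
   `size` bytes) without overflow. strncat always writes a terminator,
   so the count must leave room for the existing contents plus '\0'.
   Precondition: dst is null-terminated and strlen(dst) < size. */
void append_bounded(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);
    strncat(dst, src, size - used - 1);
}
```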

String Handling Functions, the bad and the good

memcpy() and memmove(): replace by memcpy_s() and memmove_s() respectively

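Watch out for strlen(): on an array that is not null-terminated, it reads past the end. There is no standard strlen_s(); POSIX strnlen() and C11 Annex K strnlen_s() are nearly identical. Both stop after a given maximum number of characters, and strnlen_s() also returns 0 for a null pointer.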
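The memcpy()/memmove() distinction matters when the source and destination overlap. memcpy() on overlapping objects is undefined behavior, while memmove() is defined for that case. Below is a minimal sketch; `drop_prefix` is a hypothetical helper that shifts a string left in place.

```c
#include <string.h>

/* Hypothetical helper: removes the first n characters of the
   null-terminated string s in place. Source and destination overlap,
   so memmove (not memcpy) is required. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    if (n > len) {
        n = len;  /* clamp: remove at most the whole string */
    }
    memmove(s, s + n, len - n + 1);  /* + 1 moves the terminator too */
}
```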

Runtime Protection Strategies

Detection and recovery

Provided via:

input validation

the compiler and its runtime system (e. g. array bounds checking)

Operating system

Runtime Protection Strategies: Input Validation

Input data size checking.

Object size checking with __builtin_object_size(): enable it by compiling with -D_FORTIFY_SOURCE=n for n ≥ 1, with optimization turned on (pp 104-105)

Runtime Protection Strategies: The Compiler and Runtime System

Visual Studio Compiler-Generated Runtime Checks

Turn on with flags: /RTCs turns on checks for:

Local variable overflows (including arrays)

Use of uninitialized variables

Stack pointer corruption

Can be tweaked: #pragma runtime_checks("s", off) and #pragma runtime_checks("s", restore)

Runtime Bounds Checkers:

Libsafe

Libverify

CRED

Runtime Protection Strategies: The Compiler and Runtime System

Stack Canaries:

StackGuard

GCC's Stack-Smashing Protector aka ProPolice

-fstack-protector[-all], -Wstack-protector

Visual C++ .NET stack overrun detection capability: /GS

Recommended: add #pragma strict_gs_check(on)

Recommended: compile with /GS and link only with libraries also compiled with /GS.

Runtime Protection Strategies: The Operating System

Address space layout randomization

Linux (PaX project, 2000)

Windows, since Vista

Mac OS X since 2007/2011, iOS since 4.3

Nonexecutable Stacks

W^X

Data Execution Prevention (Microsoft Windows)

PaX: marks the stack as nonexecutable

StackGap

Some Notable Vulnerabilities

rlogin – strcpy

Kerberos

Summary

● Arrays and their Problems
● Character Strings
● Common String Manipulation Errors
● String Vulnerabilities and Exploits
● Mitigation Strategies
● String Handling Functions, the Bad and the Good
● Runtime Protection Strategies
● Some Notable Vulnerabilities