Computation
5JJ70 ‘Implementatie van rekenprocessen’
Dynamic memory allocation and Hashing
Henk Corporaal
November 2009
© PG/HC 2008 5JJ70 pg 2
Welcome!
• Last lecture:
– Pointers
– vector, array, matrix
– dynamic memory allocation
• Today:
– Dynamic memory allocation, recap and cont'd
– Hashing: prepare lab 7
– Memory layout
– Use of constants
– Memory management
– And, matrix reloaded (recap)
Recap: array and pointer indexing
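#include <stdio.h>

int main(void)
{
    char array[7] = "Maxima";   /* 6 letters plus the terminating '\0' */
    char * pointer = array;     /* initialized to point to the first element of array */
    pointer[4] = 'p';           /* same effect as array[4] = 'p' */
    printf("%s\n", array);
    return 0;
}
Output: Maxipa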
• A pointer can be indexed.
• The compiler multiplies the index by the size of the elements to obtain the address:
Addr = pointer + index * size
[Figure: memory map, addresses 1245036 to 1245048. array holds 'M' 'a' 'x' 'i' 'm' 'a' '\0', with 'p' overwriting array[4]; pointer (4 bytes) is stored at 1245048.]
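The address rule above can be checked directly in C. The sketch below is only an illustration: the helper names byte_offset_of_element and read_by_pointer_arithmetic are made up for this example, and int elements are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between base and &base[index], as computed by the
   compiler's own pointer arithmetic. */
size_t byte_offset_of_element(const int *base, size_t index)
{
    assert(base != NULL);
    return (size_t)((const char *)&base[index] - (const char *)base);
}

/* Reads element `index` through explicit pointer arithmetic. */
int read_by_pointer_arithmetic(const int *base, size_t index)
{
    assert(base != NULL);
    return *(base + index);   /* identical in meaning to base[index] */
}
```

For an int array v, byte_offset_of_element(v, 4) equals 4 * sizeof(int), which is the "index * size" term of the formula.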
Recap: 1-dimensional array layout
[Figure: v points at a block of ten elements v[0] ... v[9]; 10 * 4 = 40 bytes.]
• For every 1-dimensional array, the compiler reserves a block of memory:
– Space for the array elements (in this case 10 int, of 4 bytes each).
– In addition, it remembers an alias to the first element, of type int *.
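#include <stdio.h>

int main(void)
{
    int v[10];
    printf("&v[0] %p\n", (void *) &v[0]);  /* address of the first element */
    printf("v     %p\n", (void *) v);      /* the alias: prints the same address */
    return 0;
}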
The array alias: int *
Array element: int
Recap: two-dimensional array
• A two-dimensional n x m array, built as an array of pointers, consists of:
– n blocks of m elements each for the array data (data is char in the example).
– n aliases to the above arrays (each of type char *).
– A pointer to the array of aliases (of type char **).
[Figure: colors points at three aliases colors[0], colors[1] and colors[2]; each alias points at a block of four characters colors[col][0] ... colors[col][3], holding R e d \0, G r n \0 and B l u \0.]
char * colors[3] = { "Red",
                     "Grn",
                     "Blu" };
The array alias: char **
String alias: char *
String element: char
Allocating memory at run-time: malloc()
• This allocates (reserves) size bytes of memory.
• The memory is not initialized.
• malloc returns a pointer to the first byte of the allocated memory space.
• If there is no memory space left, malloc returns 0 (NULL).
#include <stdlib.h>
void * malloc(size_t size);
void * means this returns a pointer to memory of undefined type.
size is a size_t, an unsigned integer type.
Example: making a variable-size array
#include <stdlib.h>
int * create_vector_array(int number_of_elements)
{
    int j;
    int * vv;
    vv = (int *) malloc( number_of_elements * sizeof(int) );
    if(vv == 0)
        return 0; /* Houston, we have a problem */
    for(j = 0; j < number_of_elements; j++) {
        vv[j] = 0;
    }
    return vv;
}
Example: create_vector_array(10);
[Figure: v points at a block of ten elements v[0] ... v[9]; 10 * 4 = 40 bytes.]
v: I am on the stack; now I am a real pointer and not an alias.
v[0] ... v[9]: We are on the heap; we live longer ♥
Example: the variable-sized matrix
// allocates an n_columns by n_rows array of doubles
double ** create_matrix(int n_columns, int n_rows)
{
    int col, row;
    double ** mtx = (double **) malloc( n_columns * sizeof(double *) );
    for(col = 0; col < n_columns; col++) {
        mtx[col] = (double *) malloc( n_rows * sizeof(double) );
        for(row = 0; row < n_rows; row++) {
            mtx[col][row] = 0.0;
        }
    }
    return mtx;
}
[Figure: create_matrix(5, 4). mtx points at five column pointers mtx[0] ... mtx[4]; each column holds four elements mtx[col][0] ... mtx[col][3], all initialized to 0.0.]
Example usage:
…
double ** my_matrix = create_matrix(5, 4);
my_matrix[0][1] = 1.415;
my_matrix[1][0] = -1.415;
...
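create_matrix() above does not check the result of malloc(). A minimal sketch of a checked variant follows; the names create_matrix_checked and destroy_matrix_checked are hypothetical and not part of the lecture code. If any allocation fails, everything allocated so far is released again.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the first n_columns columns and then the block of column pointers. */
void destroy_matrix_checked(double **mtx, int n_columns)
{
    int col;
    if (mtx == NULL)
        return;
    for (col = 0; col < n_columns; col++)
        free(mtx[col]);
    free(mtx);
}

/* Returns a zeroed n_columns x n_rows matrix, or NULL if an allocation fails. */
double **create_matrix_checked(int n_columns, int n_rows)
{
    int col, row;
    double **mtx;

    assert(n_columns > 0 && n_rows > 0);
    mtx = malloc((size_t)n_columns * sizeof *mtx);
    if (mtx == NULL)
        return NULL;
    for (col = 0; col < n_columns; col++) {
        mtx[col] = malloc((size_t)n_rows * sizeof **mtx);
        if (mtx[col] == NULL) {
            destroy_matrix_checked(mtx, col);  /* release the columns we already have */
            return NULL;
        }
        for (row = 0; row < n_rows; row++)
            mtx[col][row] = 0.0;
    }
    return mtx;
}
```

The caller only has to test one return value, and a failed call leaves nothing allocated behind.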
Releasing memory: free()
• Memory can be returned to the memory manager. This memory will be recycled. Example, based on the malloc() examples on the previous slides:
int * v = create_vector_array(10);
… /* the array lives */
free(v); /* release the memory that v points to */
/* the pointer 'v' itself stays alive */
• You can only free objects that were created by malloc() (or by its relatives calloc() and realloc()).
• Make sure that you do not use the pointer to the memory after it was freed. Likewise, never call free() twice on the same pointer. To be sure that this can never happen, I always set the pointer to 0 after a free:
v = 0; /* default 'invalid' pointer value */
[Figure: v and the block of ten elements v[0] ... v[9] that it points at.]
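The habit of resetting the pointer after free() can be wrapped in a small helper. This is a sketch only; free_and_clear is a hypothetical name, and it is written for int pointers to match the vector example.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the block that *pointer_ref points at and resets the caller's
   pointer to NULL, so a second call is harmless. */
void free_and_clear(int **pointer_ref)
{
    assert(pointer_ref != NULL);
    free(*pointer_ref);    /* free(NULL) is defined to do nothing */
    *pointer_ref = NULL;
}
```

Called as free_and_clear(&v), this leaves v equal to 0, so an accidental second free is a no-op.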
More tricky use of free()
• Every malloc() should have a corresponding free(). The code below does not free properly: internally, create_matrix() made 1001 calls to malloc(), but free() is called only once.
double ** matrix = create_matrix(1000, 1000);
… // do something useful with it…
/* now we don't need it anymore: kill it */
free(matrix);
• Even worse, after the above free() we lost the pointers to 1000 pieces of memory. They are lost forever.
• Instead, we need to write a function to free all of them:
void delete_matrix(double ** matrix, int n_columns)
{
    int col;
    for(col = 0; col < n_columns; col++) {
        free(matrix[col]); /* free each column */
    }
    free(matrix); /* free matrix at the very end */
}
We need to know the number of columns, but not the number of rows.
malloc()’s Family: calloc() and realloc()
• calloc also allocates memory:
void * calloc (size_t count, size_t size);
• This will allocate count * size bytes of memory. The difference with malloc() is that calloc() sets the memory to 0.
• realloc changes the size of a memory block:
void * realloc (void * block, size_t new_size);
• Behind the scenes this may allocate a new block of new_size bytes and copy the values of the existing block into the new memory. The existing memory block is then freed, so don't use the old value of block afterwards; use the pointer that realloc returns.
char * name = copy_string("Mabel"); /* allocates 7 bytes */
name = realloc(name, 25); /* enlarge the block to 25 bytes */
strcat(name, " and Klaas"); /* stick " and Klaas" behind it */
printf("%s", name);
Output: Mabel and Klaas
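One detail the realloc example skips: if realloc() fails it returns 0 and the old block stays allocated, so assigning the result straight back to name would lose the only pointer to it. The sketch below keeps the old pointer until the call has succeeded. copy_string here is a hypothetical stand-in for the function of the same name used on the slide (its real definition is not shown there), and append_string is a made-up helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the slide's copy_string(): heap copy of text. */
char *copy_string(const char *text)
{
    char *copy;
    assert(text != NULL);
    copy = malloc(strlen(text) + 1);       /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}

/* Appends suffix to the heap string *name_ref, growing the block as needed.
   Returns 1 on success; on failure returns 0 and leaves *name_ref valid. */
int append_string(char **name_ref, const char *suffix)
{
    size_t needed;
    char *bigger;

    assert(name_ref != NULL && *name_ref != NULL && suffix != NULL);
    needed = strlen(*name_ref) + strlen(suffix) + 1;
    bigger = realloc(*name_ref, needed);   /* keep the old pointer for now */
    if (bigger == NULL)
        return 0;                          /* the old block is still allocated */
    strcat(bigger, suffix);
    *name_ref = bigger;
    return 1;
}
```

Computing the needed size from the two string lengths also avoids guessing a fixed number such as 25.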
Dynamic Memory Management issues
• Dynamic memory allocation is difficult.
– You have to know exactly what you are doing.
– Free every byte that you malloced!
– Memory leakage is a major problem for big programs.
• Dynamic memory allocation is error prone.
– Manual casting of pointers allows easy mistakes.
– free is called at a very different spot than malloc.
– Memory problems are often not caught by the compiler.
• Dynamic memory allocation can be slow.
– A complex memory manager works behind the scenes to recycle memory efficiently.
– Use normal stack variables if you can.
• Dynamic memory allocation causes memory overhead.
– Especially on small chunks of memory.
– After a while, free space on the heap becomes fragmented, leaving no room for larger blocks.
Application: Searching huge lists
• Suppose we'd like to make a spelling checker. We have a dictionary of correct words, and would like to know whether a word is correct or not. We could plough through the dictionary like this:
#include <stdio.h>
#include <string.h>
#define WORDCOUNT 10000000
char * dictionary[WORDCOUNT]; // some function fills it
...
void check_word(char * dictionary[], char * word)
{
    int j;
    for(j = 0; j < WORDCOUNT; j++) {
        if(strcmp(dictionary[j], word) == 0)
            break;
    }
    if(j < WORDCOUNT)
        printf("Word %s is correct\n", word);
    else
        printf("Word %s is not in the dictionary\n", word);
}
• On average, this takes WORDCOUNT/2 loop iterations to confirm a correct word, and all WORDCOUNT iterations to reject an incorrect one!
First solution: search smaller list
• Organize by first character alphabetically.
• So we make 26 lists, and only need to look in one of them:
#define WORDCOUNT 10000000
char * dictionary[26][WORDCOUNT/26]; // some function fills it
...
void check_word(char * dictionary[26][WORDCOUNT/26], char * word)
{
    int j, index = word[0] - 'a'; // picks the list we need (assumes a lowercase first letter)
    for(j = 0; j < WORDCOUNT/26 && dictionary[index][j]; j++) {
        if(strcmp( dictionary[index][j], word) == 0) {
            printf("Word %s is correct\n", word);
            return;
        }
    }
    printf("Word %s is not in the dictionary\n", word);
}
• This could be 26 times as fast as the previous one!
Issues!
• Not all lists are the same size:
– not many words start with 'q'.
• And what if we want it more than 26 times as fast?
– Answer: hashing
• Rather than indexing the array on the first character, let the index be based on a hash function.
A hash function
• The following function assigns a number between 0 and HASHSIZE-1 to a string. We can use this to index!
#define HASHSIZE 100
unsigned int hash(char * str)
{
unsigned int hashval;
for(hashval = 0; *str != '\0'; str++) {
hashval = *str + ( 31 * hashval );
}
return hashval % HASHSIZE;
}
• Examples:
– hash("a") = ( 97 + (31 * 0) ) % 100 = 97 // 97 is the ASCII value for 'a'
– hash("aa") = ( 97 + (31 * 97) ) % 100 = 3104 % 100 = 4
– hash("aaa") = ( 97 + (31 * 3104) ) % 100 = 96321 % 100 = 21
– hash("aaab") = ( 98 + (31 * 96321) ) % 100 = 2986049 % 100 = 49
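The worked examples can be reproduced with a few lines of C. The sketch below uses the same recurrence as the slide; the only deliberate change is the cast to unsigned char, an addition made here so that characters above 127 cannot contribute a negative value on platforms where char is signed.

```c
#include <assert.h>

#define HASHSIZE 100

/* hashval = character + 31 * hashval for every character, reduced
   modulo HASHSIZE once at the end, as on the slide. */
unsigned int hash(const char *str)
{
    unsigned int hashval = 0;
    assert(str != 0);
    for (; *str != '\0'; str++) {
        hashval = (unsigned char)*str + 31u * hashval;
    }
    return hashval % HASHSIZE;
}
```

For long strings hashval wraps around the range of unsigned int; that is well defined in C and harmless for a hash value.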
Now we can make this blazingly fast!
• Let's speed it up by a factor of up to 100000:
#define HASHSIZE 100000
char ** dictionary[HASHSIZE]; // some function fills it; each list ends with a 0 pointer
...
void check_word(char ** dictionary[], char * word)
{
    unsigned int index = hash(word);
    for(int j = 0; dictionary[index] != 0 && dictionary[index][j] != 0; j++) {
        if(strcmp(dictionary[index][j], word) == 0) {
            printf("Word %s is correct\n", word);
            return;
        }
    }
    printf("Word %s is not in the dictionary\n", word);
}
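The slide leaves the filling of the dictionary to "some function". One possible way to build and query such a table is sketched below. It is an assumption, not the lecture's implementation: it chains the words of one bucket in a linked list instead of a 0-terminated array, and TABLESIZE, struct entry, add_word, contains_word and clear_table are all names invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLESIZE 101   /* hypothetical bucket count, kept small for the demo */

struct entry {
    char *word;          /* heap copy of the stored word */
    struct entry *next;  /* next entry in the same bucket */
};

static struct entry *table[TABLESIZE];  /* all buckets start out empty (NULL) */

static unsigned int bucket_of(const char *str)
{
    unsigned int hashval = 0;
    for (; *str != '\0'; str++)
        hashval = (unsigned char)*str + 31u * hashval;
    return hashval % TABLESIZE;
}

/* Returns 1 if word is stored in the table, 0 otherwise. */
int contains_word(const char *word)
{
    const struct entry *e;
    assert(word != NULL);
    for (e = table[bucket_of(word)]; e != NULL; e = e->next)
        if (strcmp(e->word, word) == 0)
            return 1;
    return 0;
}

/* Adds word if it is not present yet. Returns 1 on success, 0 if out of memory. */
int add_word(const char *word)
{
    unsigned int b;
    struct entry *e;

    assert(word != NULL);
    if (contains_word(word))
        return 1;
    e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    e->word = malloc(strlen(word) + 1);
    if (e->word == NULL) {
        free(e);
        return 0;
    }
    strcpy(e->word, word);
    b = bucket_of(word);
    e->next = table[b];   /* push in front of the bucket's list */
    table[b] = e;
    return 1;
}

/* Frees every entry, so that each malloc() has its free(). */
void clear_table(void)
{
    unsigned int b;
    for (b = 0; b < TABLESIZE; b++) {
        while (table[b] != NULL) {
            struct entry *e = table[b];
            table[b] = e->next;
            free(e->word);
            free(e);
        }
    }
}
```

A lookup only walks the one bucket that the hash value selects, which is where the speed-up comes from.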
3-dimensional array
[Figure: three levels of indirection. A char *** points at an array of char **; each char ** points at a 0-terminated array of char *; each char * points at a '\0'-terminated string of char.]
char *** dictionary;
char ** dictionary[HASHSIZE];
char * dictionary[HASHSIZE][GROUPSIZE];
char dictionary[HASHSIZE][GROUPSIZE][STRINGSIZE];
char *** a = dictionary;
char ** b = dictionary[1];
char * c = dictionary[1][5]; /* a word */
char d = dictionary[1][5][2];
Application:
1. Methodically scans the web
2. Indexes the words on the page
3. Returns the list of pages that contain the (set of) words
• Fast search is done by farms of thousands of Linux PCs
• Key issue: the returned list of pages contains loads of junk
• Larry Page and Sergey Brin's answer: the PageRank algorithm
PageRank, Larry Page’s idea
• Ideas:
– Pages that are referenced more often are more important
– Pages that are referenced from important pages are more relevant
• PR(A) = PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)
PR(T1): page rank of referring page 1
C(T1): total number of references on this page (T1)
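As a small illustration, the sum above can be written as a C function. This is a sketch of the simplified formula exactly as shown on the slide; the published PageRank algorithm also includes a damping factor, which is left out here just as it is on the slide. The function and parameter names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sums PR(Ti)/C(Ti) over the n pages Ti that refer to page A. */
double page_rank_of(const double referrer_rank[],
                    const int referrer_link_count[],
                    size_t n)
{
    double rank = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        assert(referrer_link_count[i] > 0);  /* Ti refers to A, so C(Ti) >= 1 */
        rank += referrer_rank[i] / referrer_link_count[i];
    }
    return rank;
}
```

With three referring pages of rank 1.0, 0.5 and 2.0 that carry 2, 1 and 4 references, each contributes 0.5 and the result is 1.5.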
Abusing PageRank
• So:
– The more references, the better
– The more relevant the referring page is, the better
– The fewer references on those pages, the better
• Make a 'link farm':
Constants
• Use constants for any object that should not change.
• Purposes:
– Helps avoid bugs: defining it as a constant results in a compile error if we try to change it.
– Often less overhead: no memory location, faster instructions.
• Constants can be operands for computations, but (obviously) they cannot hold the result of computation.
• C and C++ have several flavors of constants
1. Constants using #define
• #define is just a straightforward text substitution, performed by the preprocessor.
• The compiler simply doesn't see the identifier name, only its value.
• In some cases this means that no memory is allocated for the constant. The compiler can use more efficient instructions.
• Defines are a bit 'rough': they cannot be localized to a routine.
#define NULL 0
#define PI 3.1415
#define OS_NAME "Linux"
int main (void) {
    int null = 0;
    int val;
    …
    if( val == NULL )                /* C compiler sees: if ( val == 0 ) */
        printf("OS=%s\n",OS_NAME);   /* C compiler sees: …\n", "Linux"); */
    if( val == null )                /* C compiler needs to compare the values of 2 variables: this is more expensive */
        printf("OS=%s\n",OS_NAME);
}
2. Constants using const
• This is just like a normal variable. But… the compiler will not allow the variable to be changed. If you try, you get a compile error.
• Sometimes more memory efficient, because the compiler knows that you're not going to mess with the variable.
• Examples:
int cube(const int x) {    /* function arguments can be const as well */
    return x * x * x;
}
int main(void) {
    const int a = cube(10);
    const char * msg = "Hello";
    const double b;        /* compile error in C++: b must be initialized immediately! (C accepts it, but b can then never be given a value) */
    …
    a = 10;                /* compile error: a is const */
}
Question: this program has 2 errors; which ones?
const function arguments
• Function arguments are local variables in the function.
• By specifying them as const, we can 'protect' them against accidental modification.
• Again, this could result in faster code because the compiler might not need to copy the variable at the beginning of the function call.
int cube(const int x) {
    return x * x * x;
}
/* compute x to the exponent y */
double my_pow(const float x, const unsigned int y)
{
    float result = 1.0f;
    for( ; y > 0; y--)     /* compile error: y is const */
        result *= x;
    return result;
}
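One way to keep y const and still make my_pow() compile is to count down on a local copy. The following is a sketch under that assumption, not the lecture's own solution; it uses double throughout and starts from 1.0 so that an exponent of 0 gives 1. check_my_pow is a hypothetical self-check added for illustration.

```c
#include <assert.h>

/* Computes x to the exponent y by repeated multiplication. */
double my_pow(const double x, const unsigned int y)
{
    double result = 1.0;          /* x to the power 0 is 1 */
    unsigned int remaining;       /* local counter: y itself stays untouched */

    for (remaining = y; remaining > 0; remaining--)
        result *= x;
    return result;
}

/* Small self-check with values that are exact in floating point. */
void check_my_pow(void)
{
    assert(my_pow(2.0, 0) == 1.0);
    assert(my_pow(2.0, 3) == 8.0);
    assert(my_pow(-3.0, 2) == 9.0);
}
```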
More const function arguments
• There is an issue if we pass pointers as function arguments:
– Is the pointer itself const?
– Is the object the pointer points at const?
• Remember: in C an array v[10] is laid out as follows:
[Figure: v points at a block of ten elements v[0] ... v[9].]
• C++ (and later versions of C) allow us to make this distinction:
void myroutine(const char * line) {
    line = 0; // OK
    line[2] = 'a'; // compile error
}
Putting const at the beginning means the object that is pointed at is const.
void myroutine(char * const line) {
    line = 0; // compile error
    line[2] = 'a'; // OK
}
Putting const here means the pointer itself is const.
Pop quiz!
• This does not compile. What is/are the problem(s)?
void myroutine(const char * const line) {
    line = 0;
    line[4] = 'a';
}
Answer: both the pointer and the array are const. Neither line compiles.
• Does this compile?
void mutilate(const char * line) {
    line[4] = 'p';
}
void main(void) {
    char * v = "Maxima";
    mutilate(v);
}
Answer: the object the pointer points at is const. Therefore this does not compile.
Just for fun
• In C++, we spend a lot of time typing in the letters 'const'.
• Function prototypes like this are perfectly normal:
const char * format(const char * const str) const;
– const char * (return type): the function returns a pointer to a string that is const.
– const char (in the argument): the function will not mess with the string str that we put in it.
– * const str: inside the function, the character pointer str will not be changed.
– const (at the end): this member function does not change the object that it operates on.
Memory management
• Problem: many programs run simultaneously
• MMU manages all memory accesses
[Figure: CPU, Memory Management Unit (with process table), Virtual Memory Manager, cache memory, main memory divided into 2K blocks, and a swap file on hard disk. The CPU issues a logical address. The MMU turns it into a physical address, or raises an access violation because some blocks are protected. The Virtual Memory Manager checks whether the requested address is 'in core': if yes, the physical address is used; if no, the 2K block is loaded from the swap file on disk and the program waits until it is there. Each program thinks that it owns all the memory.]
Memory organization
• The operating system, together with the MMU hardware, takes care of separating the programs.
• Each program runs in its own ‘virtual’ environment, and uses logical addresses that are (often) different from the actual physical addresses.
• Within the virtual world of a program, the full 4 Gigabyte address space of a 32-bit machine is available (less under 32-bit Windows).
• In the von Neumann architecture, we need to manage the memory space to store the following:
– The machine code of the program
– The data:
  • Global variables and constants
  • The stack/local variables
  • The heap
[Figure: main memory, holding program + data.]
Memory Organization: more detail
[Figure: the address space from 0x00000000 to 0xFFFFFFFF, divided into the regions below.]
– Machine code (fixed size): the program itself, a set of machine instructions. This is in the .exe.
– Global variables (fixed size): before the first line of the program is run, all global variables and constants are initialized.
– Stack (variable size): the local variables in the routines. With each routine call, a new set of variables is pushed on the stack. The stack pointer marks its top.
– Heap (variable size): the memory that is reserved by the memory manager.
– Free memory: if the heap and the stack collide, we're out of memory.
Stack memory
#include <stdio.h>
#include <math.h>
double safe_pow(double x, double y);
int main(void)
{
    int a;
    for(a = -2; a <= 2; a++) {
        int b;
        for(b = 0; b < 4; b++) {
            double c = safe_pow(a, b);
            printf("%d^%d = %f\t", a, b, c);
        }
        printf("\n");
    }
    return 0;
}
double safe_pow(double x, double y)
{
    double help;
    help = fabs(x);
    help = pow(help, y);
    return help;
}
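[Figure: the stack over time. main() holds a, b and c; each call of safe_pow() pushes x, y and help on top; fabs() and pow() in turn push their own frames on top of that, and each frame is popped again when its routine returns.]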
• The top of the memory stack is maintained by a ‘stack pointer’ in the CPU.
• No overhead, and no run-time administration.
• No fragmentation.
Heap memory: Memory fragmentation
• Dynamic memory allocation is much harder than memory allocation using a stack pointer.
• After repeated calls to malloc and free, the heap memory can become fragmented: the free space is split into many small pieces, none of which is large enough for a big chunk of memory.
• Modern memory managers do a decent job, but behind the scenes this takes effort (run-time and memory).
[Figure: a fragmented heap.]
Stack and heap: what’s the difference?
• String as stack variable:
void myroutine()
{
    char name[7] = "Maxima";
    char * namep = name;
    /* do something */
}
The string dies automatically at the end of the routine.
• The good news:
– fast
– more memory efficient
– no memory leakage
• The bad news:
– Length/size fixed
– Must be determined at compile time
• String as heap variable:
void myroutine()
{
    char * namep = malloc(7*sizeof(char));
    strcpy(namep, "Mabel");
    /* do something */
    free(namep);
}
The string must be freed explicitly. If you don't, the memory space is lost for the rest of the program's run.
• The bad news:
– slower, fragmentation issue
– some memory wasted
– error-prone, memory leakage
• The good news:
– Length/size flexible
– Size set at run time, adapts to requirements
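The difference in lifetime shows most clearly when a routine wants to hand a string back to its caller. A sketch, with invented names make_heap_name and stack_name_length: the heap copy stays valid after the routine returns, whereas the stack buffer does not, so the second routine returns a value computed from the buffer and never the buffer's address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of text, or NULL if malloc fails.
   The caller owns the result and must free() it. */
char *make_heap_name(const char *text)
{
    char *namep;
    assert(text != NULL);
    namep = malloc(strlen(text) + 1);   /* size chosen at run time */
    if (namep != NULL)
        strcpy(namep, text);
    return namep;                       /* still valid after we return */
}

/* Works on a name held in a stack buffer. The buffer dies on return,
   so only the computed length leaves the routine. */
size_t stack_name_length(void)
{
    char name[7] = "Maxima";            /* size fixed at compile time */
    return strlen(name);
}
```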
Pop quiz: where do the red objects live?
#define PI 3.1415
int j = 3;
double aivd(const int a)
{
    double b = PI;
    char c[] = "Mabel";
    char * d = "Friso";
    static int e;
    const int f;
    register int g;
    static const int h = 0;
    char * i = (char *) malloc(6);
    strcpy(i, "Klaas");
    if(a <= 1)
        return 1.0;
    return a * aivd(a-1);
}
int main(void) {
    printf("%lf\n", aivd(j));
}
[Figure: answer, using the regions machine code; global variables and constants; stack; heap; free memory.]
– Machine code or constants: 3.1415 (PI)
– Global variables and constants: j, e, h, and the string literals Mabel\0, Friso\0 and Klaas\0
– Stack: a, b, c (its own copy of Mabel\0), d, f, g, i
– Heap: Klaas\0 (the block that i points at)
Pop quiz: what does this program print?
(Same program as in the previous pop quiz.)
[Figure: call chain main() → aivd(3) → aivd(2) → aivd(1). aivd(1) returns 1.0, aivd(2) returns 2.0, aivd(3) returns 6.0.]
Answer: 6 (printed as 6.000000 by the %lf format)
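Stripped of its unused local variables, aivd() is a recursive factorial. The sketch below keeps only the recursion, so unlike aivd() it allocates nothing and leaks nothing; the names factorial and check_factorial are chosen for this illustration.

```c
#include <assert.h>

/* Same recursion as aivd(): a * (a-1) * ... * 1, with 1.0 for a <= 1. */
double factorial(const int a)
{
    if (a <= 1)
        return 1.0;
    return a * factorial(a - 1);
}

/* Self-check against the call chain traced above. */
void check_factorial(void)
{
    assert(factorial(1) == 1.0);
    assert(factorial(2) == 2.0);
    assert(factorial(3) == 6.0);
}
```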
Pop quiz: How many incarnations of each variable at end of the program?
(Same program as in the previous pop quiz.)
[Figure: at the deepest point of the recursion there are three stack frames, for aivd(3), aivd(2) and aivd(1); each frame has its own a, b, c (Mabel\0), d, f, g and i. The heap holds three blocks Klaas\0. There is one j, one e, one h and one string Friso\0.]
Pop quiz: How many incarnations of each variable at end of the program? (answer)
– Stack: empty. All incarnations of a, b, c, d, f, g and i are gone.
– Global variables and constants: still one each of 3.1415 (PI), j, e, h, Friso\0 and the string literal Klaas\0.
– Heap: three blocks Klaas\0 remain, because they were never freed.
– Printed result: 6