Abstract Data Type

C and Data Structures
Baojian Hua


Data Types

A data type consists of:
  A collection of data elements (a type)
  A set of operations on these data elements

Data types in languages:
  predefined: any language defines a group of predefined data types
    C e.g.: int, char, float, double, …
  user-defined: allow programmers to define their own (new) data types
    C e.g.: struct, union, …

Data Type Examples

Predefined:
  type: int
  elements: …, -2, -1, 0, 1, 2, …
  operations: +, -, *, /, %, …

User-defined:
  type: complex
  elements: 1+3i, -5+8i, …
  operations: new, add, sub, distance, …

Concrete Data Types (CDT)

A concrete data type: both the concrete representation and the operations are available to the client
Almost all C predefined types are CDTs
  For instance, “int” is typically a 32-bit word on current platforms (the C standard only guarantees at least 16 bits), with operations +, -, …
  Knowing the representation, one can do dirty hacks
  See demo…
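As one illustration of what "knowing the representation" allows (this is not the demo the slide refers to, only a hypothetical sketch), the code below copies out the raw bytes of an int and inspects their order. The names Int_bytes and Int_is_little_endian are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the raw bytes of n into out and return how many were copied.
   out must have room for at least sizeof(int) bytes. */
size_t Int_bytes(int n, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &n, sizeof n);
    return sizeof n;
}

/* Return 1 if the least significant byte of an int is stored first. */
int Int_is_little_endian(void)
{
    unsigned char bytes[sizeof(int)];
    Int_bytes(1, bytes);
    return bytes[0] == 1;
}
```

Code like this depends on how the platform lays out an int, which is exactly the kind of dependence an abstract type is meant to rule out.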

Abstract Data Types (ADT)

An abstract data type:
  separates data type declaration from representation
  separates function declarations (prototypes) from implementations (definitions)

A language must provide some mechanism to support ADTs:
  interfaces in Java
  signatures in ML
  (roughly) header files & typedef in C

Case Study

Suppose we’d like to design a new data type to represent a complex number c:
  a data type “complex”
  elements: 3+4i, -5-8i, …
  operations: new, add, sub, distance, …

How to represent this data type in C (CDT, ADT or …)?

Complex Number

// Recall the definition of a complex number c:
//   c = x + yi, where x, y \in R, and i = sqrt(-1);

// Some typical operations:
complex Complex_new (double x, double y);
complex Complex_add (complex c1, complex c2);
complex Complex_sub (complex c1, complex c2);
complex Complex_mult (complex c1, complex c2);
complex Complex_divide (complex c1, complex c2);

// Next, we’d discuss several variants of rep’s:
// CDT, ADT.

CDT of Complex: Interface (Types)

// In file "complex.h":
#ifndef COMPLEX_H
#define COMPLEX_H

struct Complex_t
{
  double x;
  double y;
};
typedef struct Complex_t Complex_t;

Complex_t Complex_new (double x, double y);
// other function prototypes are similar …

#endif

Client Code

// With this interface, we can write client code
// that manipulates complex numbers. File "main.c":
#include "complex.h"

int main ()
{
  Complex_t c1, c2, c3;

  c1 = Complex_new (3.0, 4.0);
  c2 = Complex_new (7.0, 6.0);
  c3 = Complex_add (c1, c2);
  Complex_output (c3);
  return 0;
}

Do we know the concrete representation of c1, c2, c3? How?

CDT Complex: Implementation

// In a file "complex.c":
#include "complex.h"

Complex_t Complex_new (double x, double y)
{
  Complex_t c = {.x = x, .y = y};
  return c;
}
// other functions are similar. See Lab1
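To make "other functions are similar" concrete, here is a minimal sketch of three more CDT operations on the by-value struct. It is one possible way to write them, not the Lab1 solution; the function names come from the operation list earlier in the slides, and the assert guarding division by zero is an assumption of this sketch.

```c
#include <assert.h>

struct Complex_t { double x; double y; };
typedef struct Complex_t Complex_t;

Complex_t Complex_new(double x, double y)
{
    Complex_t c = {.x = x, .y = y};
    return c;
}

/* (a + bi) + (c + di) = (a + c) + (b + d)i */
Complex_t Complex_add(Complex_t c1, Complex_t c2)
{
    return Complex_new(c1.x + c2.x, c1.y + c2.y);
}

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
Complex_t Complex_mult(Complex_t c1, Complex_t c2)
{
    return Complex_new(c1.x * c2.x - c1.y * c2.y,
                       c1.x * c2.y + c1.y * c2.x);
}

/* (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i) / (c*c + d*d) */
Complex_t Complex_divide(Complex_t c1, Complex_t c2)
{
    double d = c2.x * c2.x + c2.y * c2.y;
    assert(d != 0.0); /* dividing by 0 + 0i is not defined */
    return Complex_new((c1.x * c2.x + c1.y * c2.y) / d,
                       (c1.y * c2.x - c1.x * c2.y) / d);
}
```

Every function here reads the fields x and y directly, which is what makes the type concrete.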

Problem #1

int main ()
{
  Complex_t c;

  c = Complex_new (3.0, 4.0);

  // Want to do this: c = c + (5+6i);
  // Ooooops, this is legal:
  c.x += 5;
  c.y += 6;

  return 0;
}

Problem #2

#ifndef COMPLEX_H
#define COMPLEX_H

struct Complex_t
{
  // change to a fancier one? This angers "main"…
  double a[2];
};
typedef struct Complex_t Complex_t;

Complex_t Complex_new (double x, double y);
// other function prototypes are similar …

#endif

Problems with CDT?

Operations are transparent.
  user code has no idea of the algorithm
  Good!
Data representation dependence
  Problem #1: client code can access the data directly
    kicks away the interface
    safe?
  Problem #2: makes code rigid
    easy to change or evolve?

ADT of Complex: Interface (Types)

// In file "complex.h":
#ifndef COMPLEX_H
#define COMPLEX_H

// note that the body of "struct Complex_t" is not given here
typedef struct Complex_t *Complex_t;

Complex_t Complex_new (double x, double y);
// other function prototypes are similar …

#endif

Client Code

// With this interface, we can write client code
// that manipulates complex numbers. File "main.c":
#include "complex.h"

int main ()
{
  Complex_t c1, c2, c3;

  c1 = Complex_new (3.0, 4.0);
  c2 = Complex_new (7.0, 6.0);
  c3 = Complex_add (c1, c2);
  Complex_output (c3);
  return 0;
}

Can we still know the concrete representation of c1, c2, c3? Why?

ADT Complex: Implementation #1 (Types)

// In a file "complex.c":
#include "complex.h"

// We may choose to define complex type as:
struct Complex_t
{
  double x;
  double y;
};
// which is hidden in implementation.

ADT Complex: Implementation Continued

// In a file "complex.c":
#include <stdlib.h>   // for malloc
#include "complex.h"

Complex_t Complex_new (double x, double y)
{
  Complex_t c;

  c = malloc (sizeof (*c));
  c->x = x;
  c->y = y;
  return c;
}
// other functions are similar. See Lab1
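Put together, the ADT version could look like the single-file sketch below, where comments mark what would live in "complex.h" and what would stay hidden in "complex.c". The accessors Complex_real and Complex_imag are hypothetical additions (the slides do not name them), and aborting via assert on allocation failure is a simplification.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what "complex.h" would expose: an opaque pointer type ---- */
typedef struct Complex_t *Complex_t;
Complex_t Complex_new(double x, double y);
Complex_t Complex_add(Complex_t c1, Complex_t c2);
double Complex_real(Complex_t c); /* hypothetical accessor */
double Complex_imag(Complex_t c); /* hypothetical accessor */

/* ---- what "complex.c" would hide: the representation ---- */
struct Complex_t {
    double x;
    double y;
};

Complex_t Complex_new(double x, double y)
{
    Complex_t c = malloc(sizeof(*c));
    assert(c != NULL); /* sketch: abort if allocation fails */
    c->x = x;
    c->y = y;
    return c;
}

Complex_t Complex_add(Complex_t c1, Complex_t c2)
{
    return Complex_new(c1->x + c2->x, c1->y + c2->y);
}

double Complex_real(Complex_t c) { return c->x; }
double Complex_imag(Complex_t c) { return c->y; }
```

A client that sees only the first part can call Complex_new and Complex_add, but an expression such as c->x does not compile there, because the struct body is unknown to it.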

ADT Summary

Yes, that’s ADT!
  The algorithm is hidden
  The data representation is hidden
    client code can NOT access it
    thus, client code is independent of the implementation
Interface and implementation
Do Lab1
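To see why hiding the representation keeps client code independent, the sketch below reimplements the same opaque interface with the array representation "double a[2]" from Problem #2. The accessor names Complex_real and Complex_imag are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- interface: unchanged, clients see only this part ---- */
typedef struct Complex_t *Complex_t;
Complex_t Complex_new(double x, double y);
double Complex_real(Complex_t c); /* hypothetical accessor */
double Complex_imag(Complex_t c); /* hypothetical accessor */

/* ---- implementation: now an array instead of two named fields ---- */
struct Complex_t {
    double a[2]; /* a[0] is the real part, a[1] the imaginary part */
};

Complex_t Complex_new(double x, double y)
{
    Complex_t c = malloc(sizeof(*c));
    assert(c != NULL); /* sketch: abort if allocation fails */
    c->a[0] = x;
    c->a[1] = y;
    return c;
}

double Complex_real(Complex_t c) { return c->a[0]; }
double Complex_imag(Complex_t c) { return c->a[1]; }
```

Client code written against the interface compiles and behaves the same with either representation, so "main" has nothing to be angry about.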

Polymorphism

To explain polymorphism, we start with a new data type “tuple”
A tuple is of the form: (x, y)
  x ∈ A, y ∈ B (aka: A*B)
  A, B may be unknown in advance and may be different
E.g.:
  A=int, B=int: (2, 3), (4, 6), (9, 7), …
  A=char *, B=double: (“Bob”, 145.8), (“Alice”, 90.5), …

Polymorphism

From the data type point of view:
  two types: A, B
  operations:
    new (x, y);       // create a new tuple with x and y
    equals (t1, t2);  // equality testing
    first (t);        // get the first element of t
    second (t);       // get the second element of t
    …

How to represent this type in computers (using C)?

Monomorphic Version

We start by studying a monomorphic tuple type called “intTuple”:
  both the first and second components are of “int” type
  (2, 3), (8, 9), …

The intTuple ADT:
  type: intTuple
  elements: (2, 3), (8, 9), …
  operations:
    tuple new (int x, int y);
    int first (tuple t);
    int second (tuple t);
    int equals (tuple t1, tuple t2);
    …

“IntTuple” CDT

// in a file "int-tuple.h"
#ifndef INT_TUPLE_H
#define INT_TUPLE_H

struct IntTuple_t
{
  int x;
  int y;
};
typedef struct IntTuple_t IntTuple_t;

IntTuple_t IntTuple_new (int n1, int n2);
int IntTuple_first (IntTuple_t t);
…

#endif

Or the “IntTuple” ADT

// in a file "int-tuple.h"
#ifndef INT_TUPLE_H
#define INT_TUPLE_H

typedef struct IntTuple_t *IntTuple_t;

IntTuple_t IntTuple_new (int n1, int n2);
int IntTuple_first (IntTuple_t t);
int IntTuple_equals (IntTuple_t t1, IntTuple_t t2);
…

#endif

// We only discuss "IntTuple_equals ()". All other
// functions are left to you.

Equality Testing

// in a file "int-tuple.c"
int IntTuple_equals (IntTuple_t t1, IntTuple_t t2)
{
  return ((t1->x == t2->x) && (t1->y == t2->y));
}

[Figure: t1 and t2 each point to a record with two fields, x and y.]
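For reference, a complete monomorphic version might look like the following sketch. It fills in the functions the slides leave to the reader in one plausible way; IntTuple_second is named by analogy with IntTuple_first and is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface part: an opaque pointer type. */
typedef struct IntTuple_t *IntTuple_t;

/* Implementation part: both components are plain ints. */
struct IntTuple_t {
    int x;
    int y;
};

IntTuple_t IntTuple_new(int n1, int n2)
{
    IntTuple_t t = malloc(sizeof(*t));
    assert(t != NULL); /* sketch: abort if allocation fails */
    t->x = n1;
    t->y = n2;
    return t;
}

int IntTuple_first(IntTuple_t t) { return t->x; }

int IntTuple_second(IntTuple_t t) { return t->y; }

/* Two tuples are equal when both components match. */
int IntTuple_equals(IntTuple_t t1, IntTuple_t t2)
{
    return (t1->x == t2->x) && (t1->y == t2->y);
}
```

Here == on the fields is the right test because the fields are the int values themselves, not pointers to them.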

Problems?

It’s ok if we only design “IntTuple”
But what if we need to design these tuples:
  (int, double), (int, char *), (double, double), …
The same code exists everywhere, with no means to maintain and evolve it
  Nightmares for programmers
  Remember: never duplicate code!

Polymorphism

Now, we consider a polymorphic tuple type called “tuple”:
  “poly”: may take various forms
  Every element of the type “tuple” may be of a different type
  (2, 3.14), (“8”, ‘a’), (‘\0’, 99), …

The “tuple” ADT:
  type: tuple
  elements: (2, 3.14), (“8”, ‘a’), (‘\0’, 99), …

The Tuple ADT

What about operations?
  tuple new (??? x, ??? y);
  ??? first (tuple t);
  ??? second (tuple t);
  int equals (tuple t1, tuple t2);
  …

Polymorphic Type

To resolve this, C dedicates a special polymorphic type “void *”
  “void *” is a pointer which can point to “any” concrete type (i.e., it converts to and from any object pointer type)
    very poly…
  long history of practice, initially “char *”
  can not be used directly, use an ugly cast
  similar to constructs in other languages, such as “Object”
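A small sketch of what "can not be used directly" means in practice: a void * can hold the address of any object, but it has to be converted back to the right pointer type before the data can be read. The helper names Poly_as_int and Poly_as_string are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void *poly;

/* Read an int through a poly. The caller must know that p really
   points to an int; nothing checks this. */
int Poly_as_int(poly p)
{
    assert(p != NULL);
    return *(int *)p;
}

/* View a poly as a C string. Again the caller is responsible for
   p really pointing to a NUL-terminated char array. */
const char *Poly_as_string(poly p)
{
    assert(p != NULL);
    return (const char *)p;
}
```

Passing the wrong kind of pointer to either helper compiles without complaint and misbehaves at run time, which is the price of this kind of polymorphism.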

The Tuple ADT

What about operations?
  tuple newTuple (void *x, void *y);
  void *first (tuple t);
  void *second (tuple t);
  int equals (tuple t1, tuple t2);
  …

“tuple” Interface

// in a file "tuple.h"
#ifndef TUPLE_H
#define TUPLE_H

typedef void *poly;
typedef struct Tuple_t *Tuple_t;

Tuple_t Tuple_new (poly x, poly y);
poly Tuple_first (Tuple_t t);
poly Tuple_second (Tuple_t t);
int Tuple_equals (Tuple_t t1, Tuple_t t2);

#endif /* TUPLE_H */

Client Code

// file "main.c"
#include "tuple.h"

int main ()
{
  int i = 8;

  Tuple_t t1 = Tuple_new (&i, "hello");
  return 0;
}

“tuple” ADT Implementation

// in a file "tuple.c"
#include <stdlib.h>
#include "tuple.h"

struct Tuple_t
{
  poly x;
  poly y;
};

Tuple_t Tuple_new (poly x, poly y)
{
  Tuple_t t = malloc (sizeof (*t));
  t->x = x;
  t->y = y;
  return t;
}

[Figure: t points to a record with two pointer fields, x and y.]

“tuple” ADT Implementation

// in a file "tuple.c"
#include <stdlib.h>
#include "tuple.h"

struct Tuple_t
{
  poly x;
  poly y;
};

poly Tuple_first (Tuple_t t)
{
  return t->x;
}


Client Code

#include "complex.h" // ADT version
#include "tuple.h"

int main ()
{
  int i = 8;

  Tuple_t t1 = Tuple_new (&i, "hello");

  // type cast
  int *p = (int *)Tuple_first (t1);

  return 0;
}
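Collected into one file, the polymorphic tuple up to this point might look like the sketch below. Tuple_second is written by analogy with Tuple_first, and the assert on allocation failure is a simplification of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;
typedef struct Tuple_t *Tuple_t;

/* The tuple stores two untyped pointers; it never looks at the data. */
struct Tuple_t {
    poly x;
    poly y;
};

Tuple_t Tuple_new(poly x, poly y)
{
    Tuple_t t = malloc(sizeof(*t));
    assert(t != NULL); /* sketch: abort if allocation fails */
    t->x = x;
    t->y = y;
    return t;
}

poly Tuple_first(Tuple_t t) { return t->x; }

poly Tuple_second(Tuple_t t) { return t->y; }
```

The tuple only remembers addresses, so the client must keep the pointed-to data alive for as long as the tuple is in use and must cast the results of Tuple_first and Tuple_second back to the types it originally stored.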

Equality Testing

struct Tuple_t
{
  poly x;
  poly y;
};

// The #1 try:
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return ((t1->x == t2->x)
          && (t1->y == t2->y)); // Wrong!!
}
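The reason the first try is wrong is that == on two void * values compares addresses, not the data stored there. The sketch below contrasts the two notions for int data; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Address comparison: true only when p and q are the same pointer. */
int Pointer_equals(void *p, void *q)
{
    return p == q;
}

/* Value comparison: true when the ints stored at p and q are equal.
   The caller must guarantee that both really point to ints. */
int IntValue_equals(void *p, void *q)
{
    assert(p != NULL && q != NULL);
    return *(int *)p == *(int *)q;
}
```

Two distinct variables that both hold 8 are unequal under Pointer_equals and equal under IntValue_equals. Tuple equality should behave like the second, but the tuple code alone does not know which type to cast to.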



// The #2 try:
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return (*(t1->x) == *(t2->x)
          && *(t1->y) == *(t2->y)); // Problem?
}



// The #3 try:
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return (equalsXXX (t1->x, t2->x)
          && equalsYYY (t1->y, t2->y));
  // but what are "equalsXXX" and "equalsYYY"?
}


Function as Arguments

// So in the body of the "equals" function, instead
// of guessing the types of t->x and t->y, we
// require the callers of "equals" to supply the
// necessary equality testing functions.
// The #4 try:
typedef int (*tf)(poly, poly);

int Tuple_equals (Tuple_t t1, Tuple_t t2, tf eqx, tf eqy)
{
  return (eqx (t1->x, t2->x) && eqy (t1->y, t2->y));
}

Change to “tuple” Interface

// in file "tuple.h"
#ifndef TUPLE_H
#define TUPLE_H

typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct Tuple_t *Tuple_t;

Tuple_t Tuple_new (poly x, poly y);
poly Tuple_first (Tuple_t t);
poly Tuple_second (Tuple_t t);
int Tuple_equals (Tuple_t t1, Tuple_t t2, tf eqx, tf eqy);

#endif /* TUPLE_H */

Client Code

// in file "main.c"
#include "tuple.h"

int main ()
{
  int i=8, j=8, k=7, m=7;
  Tuple_t t1 = Tuple_new (&i, &k);
  Tuple_t t2 = Tuple_new (&j, &m);

  // Int_equals: an int equality test of type tf, assumed to be defined elsewhere
  Tuple_equals (t1, t2, Int_equals, Int_equals);
  return 0;
}
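The slides use Int_equals without showing it. A plausible definition, together with the fourth try of Tuple_equals, is sketched below. Int_equals as written here and the extra Str_equals are assumptions of this example, not the course's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct Tuple_t *Tuple_t;

struct Tuple_t {
    poly x;
    poly y;
};

Tuple_t Tuple_new(poly x, poly y)
{
    Tuple_t t = malloc(sizeof(*t));
    assert(t != NULL); /* sketch: abort if allocation fails */
    t->x = x;
    t->y = y;
    return t;
}

/* The caller supplies one equality test per component. */
int Tuple_equals(Tuple_t t1, Tuple_t t2, tf eqx, tf eqy)
{
    return eqx(t1->x, t2->x) && eqy(t1->y, t2->y);
}

/* Assumed helper: compares the ints that p and q point to. */
int Int_equals(poly p, poly q)
{
    return *(int *)p == *(int *)q;
}

/* Assumed helper: compares two NUL-terminated strings by content. */
int Str_equals(poly p, poly q)
{
    return strcmp((const char *)p, (const char *)q) == 0;
}
```

With these helpers, a tuple of (int, char *) is compared by calling Tuple_equals (t1, t2, Int_equals, Str_equals). Choosing the right function for each component remains the caller's job.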

Moral

void * serves as the polymorphic type in C
  masks all pointer types (think of the Object type in Java)
Pros:
  code reuse: write once, use in arbitrary contexts
  we’d see more examples later in this course
Cons:
  polymorphism doesn’t come for free
    boxed data: data is heap-allocated (to cope with void *)
    no static or runtime checking (at least in C)
    clumsy code
      extra function pointer arguments
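"Boxed data" can be made concrete with a small helper that copies a value to the heap so that its address can safely be stored behind a void *. Int_box and Int_unbox are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;

/* Allocate a heap cell holding a copy of n and return its address.
   The caller owns the cell and must free it eventually. */
poly Int_box(int n)
{
    int *p = malloc(sizeof(*p));
    assert(p != NULL); /* sketch: abort if allocation fails */
    *p = n;
    return p;
}

/* Read the int back out of a box made by Int_box. */
int Int_unbox(poly p)
{
    assert(p != NULL);
    return *(int *)p;
}
```

Each boxed value costs one allocation plus one indirection on every access, and nothing stops a caller from unboxing something that was never an int. Both costs appear under "Cons" above.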

Function-Carrying Data

Why can we NOT make use of data, such as data passed as function arguments, when it’s of type “void *”?

Better idea:
  let data carry the functions themselves, instead of passing function pointers
  this kind of data is called an object

Function Pointer in Data

int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  // note that if t1->x or t1->y has carried the
  // equality testing functions, then the code
  // could just be written as:
  return (t1->x->equals (t1->x, t2->x)
          && t1->y->equals (t1->y, t2->y));
}

[Figure: the fields x and y of t1 each point to an object whose first slot, equals, points to that object's equality function (equals_x and equals_y).]

Function Pointer in Data

// To cope with this, we should modify other
// modules. For instance, the "complex" ADT:
struct Complex_t
{
  int (*equals) (poly, poly);
  double a[2];
};

Complex_t Complex_new (double x, double y)
{
  Complex_t c = malloc (sizeof (*c));
  c->equals = Complex_equals;
  …;
  return c;
}


Function Call

int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return (t1->x->equals (t1->x, t2->x)
          && t1->y->equals (t1->y, t2->y));
}

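[Figure: t1 and t2 each hold pointers x and y to complex objects; each complex object stores an equals pointer followed by a[0] and a[1].]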

Client Code

// in file "main.c"
#include "complex.h"
#include "tuple.h"

int main ()
{
  Complex_t c1 = Complex_new (1.0, 2.0);
  Complex_t c2 = Complex_new (1.0, 2.0);

  Tuple_t t1 = Tuple_new (c1, c2);
  Tuple_t t2 = Tuple_new (c1, c2);
  Tuple_equals (t1, t2); // dirty simple! :-P
  return 0;
}
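Because the fields of a tuple have type void *, an expression like t1->x->equals does not compile as written on the slides. One way to make the idea work, sketched here under the assumption that every object stores its equals pointer as its first member, is to read that first slot through a cast. The names equals_fn and Object_equals are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;
typedef int (*equals_fn)(poly, poly);

/* ---- "complex" ADT: the object carries its own equality test ---- */
typedef struct Complex_t *Complex_t;
struct Complex_t {
    equals_fn equals; /* assumed convention: always the first member */
    double a[2];
};

static int Complex_equals(poly p, poly q)
{
    Complex_t c1 = p, c2 = q;
    return c1->a[0] == c2->a[0] && c1->a[1] == c2->a[1];
}

Complex_t Complex_new(double x, double y)
{
    Complex_t c = malloc(sizeof(*c));
    assert(c != NULL); /* sketch: abort if allocation fails */
    c->equals = Complex_equals;
    c->a[0] = x;
    c->a[1] = y;
    return c;
}

/* ---- "tuple" ADT: no function pointer arguments any more ---- */
typedef struct Tuple_t *Tuple_t;
struct Tuple_t { poly x; poly y; };

Tuple_t Tuple_new(poly x, poly y)
{
    Tuple_t t = malloc(sizeof(*t));
    assert(t != NULL);
    t->x = x;
    t->y = y;
    return t;
}

/* A pointer to a struct also points to its first member, so the
   equals slot can be read without knowing the concrete type. */
static int Object_equals(poly p, poly q)
{
    equals_fn eq = *(equals_fn *)p;
    return eq(p, q);
}

int Tuple_equals(Tuple_t t1, Tuple_t t2)
{
    return Object_equals(t1->x, t2->x) && Object_equals(t1->y, t2->y);
}
```

This only works if everything stored in a tuple follows the same layout convention. Storing a plain int * or a string literal would make Object_equals read garbage, and C will not catch that.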

Object

Data elements with function pointers are the simplest form of objects
  object = virtual functions + private data
With such facilities, we can in principle model object-oriented programming
  In fact, early C++ compilers compiled to C
  That’s partly why I don’t love object-oriented languages

Summary

Abstract data types enable modular programming
  clear separation between interface and implementation
  interface and implementation should be designed and evolve together
Polymorphism enables code reuse
Object = data + function pointers
