
CSCI05 CS – INSTRUCTIONAL MANUAL


CSCI05 CS

(Instructional Manual)

MELJUN P. CORTES, MBA,MPA



DATA STRUCTURES

Data Structures – is a way of organizing data that considers not only the items

stored but also their relationship to each other.

A general understanding of data structures is essential to developing efficient

algorithms in virtually all phases of advanced data processing and computer science.

The ability to make the right decisions is vital to anyone involved with computers. Such

decisions typically involve the following general issues:

- Efficiency of a program with respect to its run time. Does it perform its

task in a time allotment that does not detract from overall system

performance?

- Efficiency of a program with respect to its utilization of main memory

and secondary storage devices. Does it consume such resources in a

fashion that makes its use impractical?

Types of Data Structure

1. Linear – the elements form a sequence

e.g. array, linked lists

2. Non-linear – structured

e.g. trees, graphs

Data Types - composed of domain data elements

Data Item – single unit of information

Types of Data Item

1. Group Item – data item that can be divided into sub-items.

e.g. name (first name, middle name, last name), address (street, town,

city, country), date of birth (month, day, year), time (hour, minute, seconds),

etc.

2. Elementary Item - data item that cannot be divided into sub-items.

e.g. age, gender, etc.

Classification of Data Types

A. Elementary data types or Simple data types

- these are the basic data types

- the value of one of these components is atomic, that is it consists of a single

entity and could not be divided.

e.g. int, float, char in C.

1. Primitive data types - enumerated data types

2. Standard primitive – built-in data types

e.g. int, char, float



B. Structured data types – collections of several items of information.

1. Strings – is an ordered sequence of characters that increases and

decreases dynamically.

2. Lists – ordered sequence of components, which may themselves be

lists.

3. Arrays – fixed-size, ordered collection of data elements all of the

same type.

4. Records – also called the hierarchical or structured type.

- is an ordered collection of data elements that are not

necessarily of the same type.

ARRAYS

One of the most commonly used data structures.

Components of an array

1. array name – collective name of an array.

2. index type – set of subscripts that are used to differentiate one element

from another.

3. base type – data type of array element.

Types of array

1. One-dimensional or single dimensional array - also termed a vector, this

type of array simply refers to a specific number of consecutive memory

locations.

2. Multi-dimensional array – the position of data element must be specified

by giving coordinates, typically the row and column coordinates.

Operations of Array

1. Storage – assign a value to a particular array element

2. Extraction – getting a value from an array element

ONE – DIMENSIONAL ARRAY

- also called the linear array

- it is a list of a finite number N of homogeneous data elements such that:

a) the elements of the array are referenced respectively by an index set

consisting of N consecutive numbers.

b) The elements of the array are stored respectively in successive

memory locations.

Syntax: <base_type> <array_name>[<index>];

Example: int x[4];

To get the total number of elements of an array, we use the formula:

NE = UB – LB + 1

Where: NE – number of elements

UB – upper bound (highest index of an array)

LB – lower bound (lowest index of an array)

To compute for the address of an element (memory location)

Loc[k] = base + w(k-LB)

Where: k- index of a specific element

base – starting memory address

w - element size (words per memory cell/interval)

LB – lower bound (lowest index of an array)
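The two formulas can be sketched as small C functions. This is only an illustration: the function names num_elements and loc_1d are hypothetical and not part of the manual, and NE is computed as UB – LB + 1, the form used in the sample problems.

```c
/* Number of elements of a one-dimensional array with index range lb..ub.
   NE = UB - LB + 1 */
int num_elements(int lb, int ub)
{
    return ub - lb + 1;
}

/* Address of element k.
   Loc[k] = base + w(k - LB) */
long loc_1d(long base, int w, int k, int lb)
{
    return base + (long)w * (k - lb);
}
```

For example, loc_1d(100, 2, 5, 0) gives 110, and loc_1d(200, 3, -1, -2) gives 203.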

Sample Problems

1. Consider an array grade[8], w=2, base=100. Look for the address of
grade[5] and NE.

Given:
UB = 7
LB = 0
w = 2
base = 100

Find: Loc[5] & NE

Solution:
NE = UB – LB + 1
= 7 – 0 + 1
= 8

Loc[k] = base + w(k-LB)
Loc[5] = 100 + 2(5-0)
= 100 + 2(5)
= 110

Memory Mapping
Element   Address
0         100 - base
1         102
2         104
3         106
4         108
5         110
6         112
7         114

2. An array has an index of (-2..5) at the starting address of 200. It has 3 words
per memory cell. Determine Loc[-1], Loc[3] and NE.

Given:
LB = -2
UB = 5
base = 200
w = 3

Find: Loc[-1], Loc[3] & NE

Solution:
NE = UB – LB + 1
= 5 – (-2) + 1
= 7 + 1
= 8

Loc[k] = base + w(k-LB)
Loc[-1] = 200 + 3(-1 – (-2))
= 200 + 3(1)
= 203

Loc[k] = base + w(k-LB)
Loc[3] = 200 + 3(3 – (-2))
= 200 + 3(5)
= 215

Memory Mapping
Element   Address
-2        200 - base
-1        203
0         206
1         209
2         212
3         215
4         218
5         221



3. The starting address of an array is 872. 3 is the highest index of the array. If
it has 9 elements and w=4, find which element has the location of 892.

Given:
base = 872
UB = 3
NE = 9
w = 4

Find: k

Solution:
NE = UB – LB + 1
LB = UB – NE + 1
= 3 – 9 + 1
= 4 – 9
= -5

Loc[k] = base + w(k-LB)
k = (Loc[k] – base) / w + LB
k = (892 – 872) / 4 + (-5)
= 5 – 5
= 0

Memory Mapping
Element   Address
-5        872 - base
-4        876
-3        880
-2        884
-1        888
0         892
1         896
2         900
3         904

4. The index of the third element of an array is –4 and its location is 558. If w=5
and there are 7 elements in the array, compute for the base and Loc[0].

Given:
Loc[-4] = 558
w = 5
NE = 7
LB = -6 (index of the 1st element, since –4 is the 3rd element)

Find: base & Loc[0]

Solution:
Loc[k] = base + w(k-LB)
base = Loc[k] – w(k-LB)
558 = base + 5(-4 – (-6))
558 = base + 10
base = 558 – 10
base = 548

Loc[k] = base + w(k-LB)
Loc[0] = 548 + 5(0 – (-6))
= 548 + 30
= 578

Memory Mapping
Element   Address
-6        548 - base
-5        553
-4        558
-3        563
-2        568
-1        573
0         578

5. Consider an array of 12 elements. If the 1st element of the array starts at

memory address 997 and its highest index is 7 whose location is 1074, look

for the words per memory cell.

Given:

NE =12

base = 997

UB = 7

Loc[7] = 1074

Find: w


Solution:
NE = UB – LB + 1
LB = UB – NE + 1
= 7 – 12 + 1
= 8 – 12
= -4

Loc[k] = base + w(k-LB)
1074 = 997 + w(7 – (-4))
1074 – 997 = w(11)
w = (1074 – 997) / 11
w = 7

Memory Mapping
Element   Address
-4        997 - base
-3        1004
-2        1011
-1        1018
0         1025
1         1032
2         1039
3         1046
4         1053
5         1060
6         1067
7         1074

Seatwork

1. An array has an index (-1..7) and starting address of 354. If it has 9 words per

memory cell determine NE, Loc[2], Loc[4].

2. Array num has an index of (7..2). Element –2 is located in memory address 356.

Find which element has the location 374 if w=6.

3. Consider an array whose last element is 9. If the 1st element is in 795, what is

the first element’s index? NE=15. Find w if Loc[3]=891.

TWO-DIMENSIONAL ARRAY

- Collection of M x N elements such that each element in the array is

referenced by a pair of integers, such as j,k, called the subscripts.

- Also called matrix in mathematics and tables in business applications

therefore they are considered to be matrix array

- Has elements that form a rectangular array

Syntax:

<base_type> <array_name>[<index>] [<index>];

e.g.

int x[3][4];

Illustration:

3 x 4 array in column major
- row varies faster than column

     1     2     3     4
1   1,1   1,2   1,3   1,4
2   2,1   2,2   2,3   2,4
3   3,1   3,2   3,3   3,4

3 x 4 array in row major
- column varies faster than row

     1     2     3     4
1   1,1   2,1   3,1   4,1
2   1,2   2,2   3,2   4,2
3   1,3   2,3   3,3   4,3

To determine the Total number of elements:

NE = M * N

Where:

NE = number of elements

M = UB1 – LB1 + 1

N = UB2 – LB2 + 1



To compute for the address of an element (memory location)

Row major: Loc[j,k] = base + w[ N(j - LB1) + (k - LB2) ]

Column major: Loc[j,k] = base + w[ M(k - LB2) + (j - LB1) ]

Where:

j,k – index of an element in the array
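The row-major and column-major formulas above can be written as two small C functions. This is a sketch under stated assumptions: the names loc_row_major and loc_col_major are hypothetical, and the caller passes N (number of columns) or M (number of rows) already computed from the bounds.

```c
/* Row major: Loc[j,k] = base + w[ N(j - LB1) + (k - LB2) ]
   n is the number of columns, N = UB2 - LB2 + 1. */
long loc_row_major(long base, int w, int j, int k, int lb1, int lb2, int n)
{
    return base + (long)w * ((long)n * (j - lb1) + (k - lb2));
}

/* Column major: Loc[j,k] = base + w[ M(k - LB2) + (j - LB1) ]
   m is the number of rows, M = UB1 - LB1 + 1. */
long loc_col_major(long base, int w, int j, int k, int lb1, int lb2, int m)
{
    return base + (long)w * ((long)m * (k - lb2) + (j - lb1));
}
```

For an array indexed (2..5, -3..6) with base 100 and w = 4, loc_row_major(100, 4, 4, -2, 2, -3, 10) gives 184. For a 25 x 4 array with lower bounds of 1, base 200 and w = 3, loc_col_major(200, 3, 10, 2, 1, 1, 25) gives 302.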

Sample problems

1. Given an array with (2..5, -3..6) of integer. Base=100, w=4. Find the Loc[4,-2]
using row major.

Given:
LB1 = 2
UB1 = 5
LB2 = -3
UB2 = 6
base = 100
w = 4

Find: Loc[4,-2]

Solution:
N = UB2 – LB2 + 1
= 6 – (-3) + 1
= 10

Loc[j,k] = base + w[N(j-LB1) + (k-LB2)]
Loc[4,-2] = 100 + 4[10(4-2) + (-2 – (-3))]
= 100 + 4[10(2) + 1]
= 100 + 4(21)
= 100 + 84
= 184

2. A 25 x 4 matrix array has a base of 200 and 3 words per memory cell.
Determine the Loc[10,2] using column major.

Given:
base = 200
w = 3
M = 25
N = 4
LB1 = 1, LB2 = 1 (the lower bounds used in the solution)

Find: Loc[10,2]

Solution:
NE = M * N
= 25 * 4
= 100

Loc[j,k] = base + w[ M(k - LB2) + (j - LB1) ]
Loc[10,2] = 200 + 3[25(2-1) + (10-1)]
= 200 + 3(25+9)
= 200 + 3(34)
= 200 + 102
= 302

Seatwork

1. An array has an index of (-3..3, -1..4). The Loc[1,4] is 420. If w=5, find

the base, Loc[-2,0], Loc[3,3] using row major

2. The size of array X(1..7,-1..3) starts at 512. If w=9 and Loc[j,1] is 755,

find N, NE and j.



3. Consider an array with (-2..7, -4..1) index. If w=6 and base is 852, find

the Loc[-1,3] using column major and Loc[4,-2] using row major.

SORTING

- Refers to the operation of arranging data in some given order such as

increasing or decreasing with numeric data and alphabetically with character.

- Frequently applied to file of records.

Sorting schemes

1. Selection sort – the most intuitive of all sorting schemes.

- the basic idea of selection sort is to make a number of passes over the given

elements and on each pass select one to be exactly positioned.

- the simplest interchange sorting algorithm.

Example: (in ascending order)

67 33 21 84 49 50 75 - original

21 33 67 84 49 50 75 - 1st pass

21 33 67 84 49 50 75 - 2nd pass

21 33 49 84 67 50 75 - 3rd pass

21 33 49 50 67 84 75 - 4th pass

21 33 49 50 67 84 75 - 5th pass

21 33 49 50 67 75 84 - 6th pass

21 33 49 50 67 75 84 - 7th pass

- The smallest from the given series of numbers is 21. 67 and 21 will swap

positions since we are sorting in ascending order.

- Next to 21 is 33, which is already in its correct position. No changes

will be made.

- 49 should follow 33 so it swaps positions with 67.
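The passes described for selection sort can be sketched in C as follows. The function name selection_sort is an assumption made for this illustration. On each pass the smallest remaining element is located and swapped into the next position.

```c
#include <stddef.h>

/* Sorts a[0..n-1] in ascending order by selection sort. */
void selection_sort(int a[], size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        /* find the position of the smallest element in a[i..n-1] */
        size_t min = i;
        for (size_t j = i + 1; j < n; j++) {
            if (a[j] < a[min]) {
                min = j;
            }
        }
        /* place it at position i; no swap is needed if it is already there */
        if (min != i) {
            int temp = a[i];
            a[i] = a[min];
            a[min] = temp;
        }
    }
}
```

This sketch stops after n - 1 passes, because once the first n - 1 positions are settled the last element is already in place.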

2. Exchange Sort – systematically interchange pairs of elements that are out of

order until no such pairs remain and therefore the list is sorted.

Example (in ascending order)

67 33 21 84 49 50 75 - original

33 67 21 84 49 50 75

33 21 67 84 49 50 75

33 21 67 84 49 50 75

33 21 67 49 84 50 75

33 21 67 49 50 84 75



33 21 67 49 50 75 84 - 1st pass

21 33 67 49 50 75 84

21 33 67 49 50 75 84

21 33 49 67 50 75 84

21 33 49 50 67 75 84

21 33 49 50 67 75 84

21 33 49 50 67 75 84 - 2nd pass
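The exchange sort traced above (commonly known as bubble sort) might be coded as below. The function name exchange_sort and the swapped flag are assumptions of this sketch. Passes are repeated until a pass makes no interchange.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sorts a[0..n-1] in ascending order by interchanging adjacent
   out-of-order pairs until no such pair remains. */
void exchange_sort(int a[], size_t n)
{
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (size_t i = 0; i + 1 < n; i++) {
            if (a[i] > a[i + 1]) {
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
                swapped = true;
            }
        }
    }
}
```

For the sample list, the data is sorted after the 2nd pass. The sketch makes one more pass that finds no pair out of order, and then stops.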

3. Insertion Sort – repeatedly insert a new element into a list of already sorted

elements.

Example (in ascending order)

67 33 21 84 49 50 75 - original

33 67 21 84 49 50 75 - 1st pass

21 33 67 84 49 50 75 - 2nd pass

21 33 67 84 49 50 75 - 3rd pass

21 33 49 67 84 50 75 - 4th pass

21 33 49 50 67 84 75 - 5th pass

21 33 49 50 67 75 84 - 6th pass
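A minimal C sketch of insertion sort follows, with insertion_sort as a hypothetical name. Each pass takes the next element and shifts the larger elements of the already sorted part one position to the right to make room for it.

```c
#include <stddef.h>

/* Sorts a[0..n-1] in ascending order by insertion sort. */
void insertion_sort(int a[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];          /* element to insert into a[0..i-1] */
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];     /* shift larger element to the right */
            j--;
        }
        a[j] = key;
    }
}
```

With 7 elements the loop runs 6 times, matching the six passes in the example.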

Seatwork

1. 8 6 2 1 9 10

2. 50 70 90 52 85 72 0



Searching

- locating a particular element in a data structure

Searching Techniques

1. Sequential search – easiest search technique

- beginning at the head/top of the list, you search for the desired element by

examining each element until the search is successful or the list is exhausted.
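Sequential search can be sketched in C as shown below. The name sequential_search and the return value of -1 for an unsuccessful search are assumptions of this sketch, and C subscripts start at 0.

```c
/* Returns the subscript of keyval in list[0..n-1], or -1 if the
   list is exhausted without finding it. */
int sequential_search(const int list[], int n, int keyval)
{
    for (int i = 0; i < n; i++) {
        if (list[i] == keyval) {
            return i;   /* search is successful */
        }
    }
    return -1;          /* list is exhausted */
}
```

The list does not need to be sorted for this technique.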

2. Binary Search – also called the sorting search

- the search algorithm for sorted lists that involves dividing the list in half and

determining, by value comparison, whether the item would be in the upper or

lower half, the process is performed repeatedly until either the item is found

or it is determined that the item is not on the list.

To perform binary search we use the formula :

index=(min + max) div 2

Where: min – lowest subscript

max – highest subscript

and apply the condition:

keyval = is the element to be searched

if list[index] < keyval

move min

else if list[index] > keyval

move max

else if list[index] = keyval

location = index
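The condition above can be sketched as a C function. Several details here are assumptions, not taken from the manual: the name binary_search, the return value of -1 when the item is not on the list, subscripts that start at 0 as in C, and the choice to move min to index + 1 and max to index - 1 so that the loop always ends.

```c
/* Searches the sorted (ascending) array list[0..n-1] for keyval.
   Returns its subscript, or -1 if keyval is not on the list. */
int binary_search(const int list[], int n, int keyval)
{
    int min = 0;
    int max = n - 1;

    while (min <= max) {
        /* same value as (min + max) div 2, written to avoid overflow */
        int index = min + (max - min) / 2;

        if (list[index] < keyval) {
            min = index + 1;    /* move min: item is in the upper half */
        } else if (list[index] > keyval) {
            max = index - 1;    /* move max: item is in the lower half */
        } else {
            return index;       /* location = index */
        }
    }
    return -1;
}
```

Because C subscripts start at 0, an element found at subscript 4 in a list numbered from 1 is reported here as subscript 3.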

Example:

Subscripts Elements Sorted Element

1 10 10

2 52 15

3 30 17

4 48 25

5 40 30

6 25 40

7 50 48

8 17 50

9 15 52

Note: sort the elements in ascending order before searching



a. If keyval = 25

Index1 = (1 + 9) div 2
= 10 div 2
= 5 (list[5] = 30 > 25, so max = 5)

Index2 = (1 + 5) div 2
= 6 div 2
= 3 (list[3] = 17 < 25, so min = 3)

Index3 = (3 + 5) div 2
= 8 div 2
= 4 (list[4] = 25)

Element (keyval=25) found in subscript 4

b. If keyval = 50

Index1 = (1 + 9) div 2
= 10 div 2
= 5 (list[5] = 30 < 50, so min = 5)

Index2 = (5 + 9) div 2
= 14 div 2
= 7 (list[7] = 48 < 50, so min = 7)

Index3 = (7 + 9) div 2
= 16 div 2
= 8 (list[8] = 50)

Element (keyval=50) found in subscript 8

Seatwork

Subscript   Elements
1           47
2           7
3           36
4           25
5           41
6           72
7           63
8           93
9           11
10          15
11          3
12          66
13          52
14          85
15          98
16          59
17          74

Search the following keyvals from the list of elements

a. keyval=59

b. keyval=93

c. keyval=74

d. keyval=15



Records

- a complete set of data items, each with its own data type, field name, and

size.

- Structured data types consisting of a number of components, each of a

possibly different type.

Fields – individual components of a record

Field identifier – each field has a name called field identifier, which is some

identifier chosen by the programmer when the record type is declared.

Component variable - a field of a record variable, written by appending a period

and the field identifier to the name of the record variable.

e.g.

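#include <string.h>

struct studententry
{
    char name[15];
    int studno;
    int quizzes[2];
};

int main(void)
{
    struct studententry stud1, stud2;

    /* an array cannot be assigned with =, so the string is copied */
    strcpy(stud1.name, "John dela Cruz");
    stud1.studno = 100;
    stud1.quizzes[0] = 90;

    return 0;
}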

Program Measurement/Time complexity

-this refers to determining how many times each statement of a program loop is

performed; in the examples that follow, each execution is counted as one millisecond (ms).

1. For (initialization; condition; incrementation /decrementation)

Heading : (U – L + 1) + 1

Where: U – is the upper value in For loop

L – is the lower value or initial value in a for loop
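The extra 1 in the heading formula comes from the final test of the condition, the one that fails and ends the loop. The C sketch below counts both quantities for a loop of the form for (x = L; x <= U; x++). The function name count_for_loop and its parameters are assumptions made for this illustration.

```c
/* Counts how many times the heading (the condition test) and the
   statement inside the loop are executed for
   for (x = l; x <= u; x++). */
void count_for_loop(int l, int u, int *heading, int *inside)
{
    int x = l;

    *heading = 0;
    *inside = 0;
    for (;;) {
        ++*heading;          /* the condition x <= u is tested */
        if (!(x <= u)) {
            break;           /* the last test fails and ends the loop */
        }
        ++*inside;           /* one execution of the loop body */
        x++;
    }
}
```

For L = 1 and U = 10 the heading count is 11, which is (U – L + 1) + 1, and the inside count is 10, which is (U – L + 1).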


Example 1:

{heading} For (x=1; x<=10; x++) = (U – L + 1) + 1 = (10 – 1 + 1) + 1 = 11
{begin = 0} { = 0
{inside} printf("Input no"); = (U – L + 1) = (10 – 1 + 1) = 10
{inside} scanf("%i",&n); = (U – L + 1) = (10 – 1 + 1) = 10
{end = 1} } = 1
Total = 32 ms

Example 2:

{heading} For (x=1; x<=10; x++) = (U – L + 1) + 1 = (10 – 1 + 1) + 1 = 11
{inside} printf("Input no"); = (U – L + 1) = (10 – 1 + 1) = 10
{outside} scanf("%i",&n); = 1
Total = 22 ms

(Note: the statement scanf("%i",&n); is outside the loop because there is no { begin and } end)

Example 3:

{heading} For (a=3; a<10; a++) = (U – L + 1) + 1 = (9 – 3 + 1) + 1 = 8
{inside}{heading} For (b=1; b<=5; b++) = (9 – 3 + 1)[(5 – 1 + 1) + 1] = 42
{begin = 0} { = 0
{inside}{inside} gotoxy(3,5); = (9 – 3 + 1)(5 – 1 + 1) = 35
{inside}{inside} printf("#"); = (9 – 3 + 1)(5 – 1 + 1) = 35
{end = 1} } = 1
Total = 121 ms

Example 4:

{outside} printf("X"); = 1
{heading} For (i=18; i>=11; i--) = (U – L + 1) + 1 = (18 – 11 + 1) + 1 = 9
{inside} scanf("%i",&m); = (U – L + 1) = (18 – 11 + 1) = 8
{heading} For (b=4; b<21; b++) = (U – L + 1) + 1 = (20 – 4 + 1) + 1 = 18
{heading}{inside} For (a=2; a!=7; a++) = [(6 – 2 + 1) + 1](20 – 4 + 1) = 102
{inside}{inside} gotoxy(3,6); = (6 – 2 + 1)(20 – 4 + 1) = 85
{outside} printf("Hello"); = 1
Total = 224 ms

(Note: b<21 stops the loop after b = 20, so U = 20; a!=7 stops the loop after a = 6, so U = 6)

Example 5:

{heading} For (z=1; z<=8; z++) = (U – L + 1) + 1 = (8 – 1 + 1) + 1 = 9
{heading}{inside} For (a=3; a<13; a++) = [(12 – 3 + 1) + 1](8 – 1 + 1) = 88
{begin = 0} { = 0
{inside}{inside} gotoxy(2,7); = (8 – 1 + 1)(12 – 3 + 1) = 80
{inside}{inside} printf(a); = (8 – 1 + 1)(12 – 3 + 1) = 80
{heading}{inside}{inside} For (y=2; y<=10; y++) = (12 – 3 + 1)(8 – 1 + 1)[(10 – 2 + 1) + 1] = 800
{inside}{inside}{inside} printf("B"); = (10 – 2 + 1)(12 – 3 + 1)(8 – 1 + 1) = 720
{inside}{inside} printf("T"); = (12 – 3 + 1)(8 – 1 + 1) = 80
{end = 1} } = 1
Total = 1858 ms



Seatwork

1. gotoxy(4,5);

printf("CSCI04");

For (i=4; i<=20; i++)

For (j=4; j<=12; j++)

For (h=3; h<7; h++)

gotoxy(5,4); printf(h);

2. For(x=6; x<11; x++)

{

For(y=8; y<=15; y++)

{

scanf(“%d”,&z);

printf(w);

For(a=13; a>4; a--)

For (b=3; b<=7; b++)

gotoxy(12,6);

scanf(“%d”,&c);

}

gotoxy(6,7);

printf(“c”);

}

Data Abstraction

- The idea of data abstraction, in which the definition of data structure is

separated from its implementation is an important concept that is a natural

part of a top-down approach in software development. It makes it possible to

study and use structure without being concerned about the details of

implementation.

Abstract data types

- The term abstract data structure and abstract data type are often used

interchangeably.

- But the term data structure is more appropriate when data is being studied at

a logical or conceptual level, unaffected by any programming

considerations. When we study the structure as an object to be processed

in a program, the term data type is used.

Stacks

- called the Last-in-First-Out (or LIFO)

- both insertions and deletions occur at one end only called the top.

- Stacks are frequently used as a storage structure in everyday life. For

example: a stack of trays in a cafeteria, fish balls on a stick, stack of plates or

a ream of paper.

Basic Operation of stacks

1. push – a term used to insert an element on to the stack

2. pop - term used to delete an element from the stack
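A stack is often implemented on top of an array. The sketch below is one possible arrangement and not the manual's own code: the names Stack, STACK_MAX, stack_init, push and pop are assumptions, and the capacity of 4 is chosen only to keep the example small. A push on a full stack (overflow) and a pop on an empty stack (underflow) are reported by returning false.

```c
#include <stdbool.h>

#define STACK_MAX 4              /* capacity chosen for illustration */

typedef struct {
    char items[STACK_MAX];
    int top;                     /* subscript of the top item, -1 if empty */
} Stack;

void stack_init(Stack *s)
{
    s->top = -1;
}

/* Inserts item at the top. Returns false on overflow (stack is full). */
bool push(Stack *s, char item)
{
    if (s->top == STACK_MAX - 1) {
        return false;
    }
    s->items[++s->top] = item;
    return true;
}

/* Deletes the top item and stores it in *item.
   Returns false on underflow (no item to delete). */
bool pop(Stack *s, char *item)
{
    if (s->top == -1) {
        return false;
    }
    *item = s->items[s->top--];
    return true;
}
```

Pushing T, O, N, Y in that order and then popping returns Y first, which is the last-in-first-out behavior.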



Representation of a stack

[Figure: stacks holding the letters T, O, N, Y, with TOP marking the end where items are inserted and deleted; panels: PUSH OPERATION, POP OPERATION]

Overflow – occurs when an item is pushed and the stack is full.

Underflow – occurs when a pop is attempted and there is no item to be deleted.

APPLICATION OF STACK

Parsing and Evaluation of Arithmetic Expressions using stacks

Consider the following set of assignment statements:

Z = A * B / C + D

Z = (A * B) /C + D

Z = ((A * B) / C) + D


These assignment statements should all result in the same order of arithmetic

operations even though the expressions involved are written in distinctly different forms. The

process of collapsing such different expressions into one unique form is called

parsing the expression, and one frequently used method of parsing relies heavily

upon stacks.

Prefix, Postfix and Infix notation

The usual algebraic notation is often termed infix notation: the arithmetic operator

appears between the two operands to which it is being applied.

Infix notation may require parentheses to specify a desired order of operations. For

example, in the expression A/B + C, the division will occur first. If we want the

addition to occur first, the expression must be parenthesized as A/(B+C).

Using postfix notation, the need for parentheses is eliminated because the operator

is placed directly after the two operands to which it applies. Hence, A/B + C would be

written as AB/C+ in postfix form. This says:

1. Apply the division operator to A and B

2. To that result, add C

The infix expression A/(B + C) would be written as ABC+/ in postfix notation.

In a similar way, an expression can be converted into prefix form, in which an

operator immediately precedes its two operands. The conversion algorithm for infix

to prefix specifies that, after completely parenthesizing the infix expression with

order of priority in mind, we move each operator to its corresponding left

parenthesis. Applying the method to:

E.g. A/(B + C)

gives us /+ABC

Example

A ^ 2 * B – C / D

Prefix = {^A2} * B – C/D

= {^A2} * B – {/CD}

= {* ^ A 2 B} – {/CD}

= - * ^ A 2 B / C D

Postfix = {A2^} * B – C/D

= {A2 ^ B *} – {CD/}

= A2 ^ B * CD/ -
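Once an expression is in postfix form, a stack makes its evaluation straightforward: operands are pushed, and each operator pops its two operands and pushes the result. The C sketch below illustrates this under several assumptions that are not from the manual: the name eval_postfix, operands limited to single digits, integer arithmetic, the ASCII characters + - * / ^ as operators, and a return value of -1 for a malformed expression.

```c
#include <ctype.h>

/* Evaluates a postfix expression of single-digit operands.
   Stores the value in *result and returns 0, or returns -1 if the
   expression is malformed. */
int eval_postfix(const char *expr, long *result)
{
    long stack[64];
    int top = -1;

    for (const char *p = expr; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            continue;
        }
        if (isdigit((unsigned char)*p)) {
            if (top == 63) {
                return -1;               /* overflow */
            }
            stack[++top] = *p - '0';     /* push the operand */
            continue;
        }
        if (top < 1) {
            return -1;                   /* underflow: operator lacks operands */
        }
        long right = stack[top--];       /* pop the two operands */
        long left = stack[top--];
        long value;
        switch (*p) {
        case '+': value = left + right; break;
        case '-': value = left - right; break;
        case '*': value = left * right; break;
        case '/':
            if (right == 0) {
                return -1;
            }
            value = left / right;
            break;
        case '^':
            if (right < 0) {
                return -1;
            }
            value = 1;
            for (long i = 0; i < right; i++) {
                value *= left;
            }
            break;
        default:
            return -1;                   /* unknown token */
        }
        stack[++top] = value;            /* push the result */
    }
    if (top != 0) {
        return -1;                       /* leftover operands or empty input */
    }
    *result = stack[0];
    return 0;
}
```

With A=3, B=4, C=8 and D=2, the postfix form A2 ^ B * CD/ - becomes "32^4*82/-", which evaluates to 32.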



Queues

- called the first-in-first-out (sometimes referred to as FIFO) list.

- Insertions are limited to one end of the list, whereas deletions

may occur only at the other end.

- The ends of a queue are called rear and front.

Example: cars forming a long line at a busy toll booth, a queue in a movie

ticket booth.

Basic operation of Queues

1. addQ/insertQ - a term used to insert an item

2. removeQ/deleteQ – term used to delete an item
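The two queue operations can be sketched with an array that is used circularly. This is one possible arrangement, not the manual's own code: the names Queue, QUEUE_MAX and queue_init, the capacity of 4, and the count field are assumptions of this sketch. The operation names addQ and removeQ follow the terms above.

```c
#include <stdbool.h>

#define QUEUE_MAX 4              /* capacity chosen for illustration */

typedef struct {
    char items[QUEUE_MAX];
    int front;                   /* subscript of the item to delete next */
    int count;                   /* number of items in the queue */
} Queue;

void queue_init(Queue *q)
{
    q->front = 0;
    q->count = 0;
}

/* Inserts item at the rear. Returns false on overflow (queue is full). */
bool addQ(Queue *q, char item)
{
    if (q->count == QUEUE_MAX) {
        return false;
    }
    int rear = (q->front + q->count) % QUEUE_MAX;
    q->items[rear] = item;
    q->count++;
    return true;
}

/* Deletes the item at the front and stores it in *item.
   Returns false on underflow (no item to delete). */
bool removeQ(Queue *q, char *item)
{
    if (q->count == 0) {
        return false;
    }
    *item = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_MAX;
    q->count--;
    return true;
}
```

Adding T, O, N, Y in that order and then removing returns T first, which is the first-in-first-out behavior.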

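Representation of a queue

[Figure: queues holding the letters T, O, N, Y, with FRONT and REAR marking the two ends; panels: AddQ/InsertQ OPERATION, RemoveQ/DeleteQ OPERATION]

Overflow – occurs when an item is inserted and the queue is full.

Underflow – occurs when a deletion is attempted and there is no item to be deleted.

Front - place where deletion takes effect.

Rear – place where insertion takes effect.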

Trees

Basic tree concepts

Tree – consists of a finite set of elements, called nodes, and a finite set of directed

lines, called branches, that connect the nodes. The number of branches associated

with a node is the degree of the node. When a branch is directed towards the node,

it is an indegree branch; when the branch is directed away from the node it is an

outdegree branch. The sum of the indegree and outdegree branches equals the

degree of the node.

In addition to root, there are many different terms used to describe the attributes of

a tree.

leaf - is any node with an out degree of zero.

internal nodes - nodes that are not a root or a leaf.

- they are found in the middle portion of a tree

parent – a node is a parent if it has successor nodes; that is, it has an outdegree

greater than zero.

child – a node with a predecessor

siblings – nodes with the same parent

ancestor – any node in the path from the root to the node

descendent - any node in the path below the parent node; that is, all nodes in the

paths from a given node to a leaf are descendents of the node.

path - a sequence of nodes in which each node is adjacent to the next one. Every

node in the tree can be reached by following a unique path starting from the root.

level – the level of a node is its distance from the root.

height –is the level of the leaf in the longest path from the root plus one. By

definition, the height of an empty tree is –1.

[Figure: a tree with root A at Level 0; nodes B, E, F at Level 1; nodes C, D (children of B) and G, H, I (children of F) at Level 2]

Parents : A, B, F

Children : B, E, F, C, D, G, H, I

Leaves : C, D, E, G, H, I

Siblings : {B,E,F}, {C,D}, {G,H,I}

Internal nodes : B, F



Binary Tree

- is a tree in which no node can have more than two subtrees. In other words,

a node can have zero, one or two subtrees.

These subtrees are designated as the left subtree and right subtree.

[Figure: a binary tree with its left subtree and right subtree labeled]

One interesting series of binary tree applications is expression trees. An

expression is a sequence of tokens that follows prescribed rules. A token may be

either an operand or an operator.

Expression tree – is a binary tree with the following properties:

1. each leaf is an operand

2. the root and internal nodes are operators

3. subtrees are subexpressions with the root being an operator.

Given an expression tree, the three standard traversals represent the three different

expression formats: infix, postfix and prefix. The inorder traversal produces the infix

expression; the postorder traversal produces the postfix expression; and the

preorder traversal produces the prefix expression.

Example: A^2 B – C/D (an expression in mathematics)

A ^ 2 * B – C / D (infix notation of the above expression)

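BINARY TREE OF A^2 * B – C/D

[Figure: expression tree whose root is the operator –; the left child of the root is *, whose children are ^ (with children A and 2) and B; the right child of the root is /, whose children are C and D]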



In getting the prefix notation from a binary tree follow the order Root, Left, Right

Prefix notation = - * ^ A 2 B / C D

In getting the postfix notation from a binary tree follow the order Left, Right, Root

POSTFIX NOTATION = A 2 ^ B * C D / -
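The traversal orders can be sketched as recursive C functions over an expression tree. The names Node, preorder, inorder and postorder are assumptions of this sketch, and each token is taken to be a single character. Each function appends the visited tokens to a caller-supplied buffer and returns the new length; the caller must provide a buffer that is large enough and add the terminating '\0'.

```c
#include <stddef.h>

typedef struct Node {
    char token;                  /* operand or operator */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Root, Left, Right: produces the prefix expression. */
size_t preorder(const Node *n, char *out, size_t pos)
{
    if (n == NULL) {
        return pos;
    }
    out[pos++] = n->token;
    pos = preorder(n->left, out, pos);
    return preorder(n->right, out, pos);
}

/* Left, Root, Right: produces the infix expression (no parentheses). */
size_t inorder(const Node *n, char *out, size_t pos)
{
    if (n == NULL) {
        return pos;
    }
    pos = inorder(n->left, out, pos);
    out[pos++] = n->token;
    return inorder(n->right, out, pos);
}

/* Left, Right, Root: produces the postfix expression. */
size_t postorder(const Node *n, char *out, size_t pos)
{
    if (n == NULL) {
        return pos;
    }
    pos = postorder(n->left, out, pos);
    pos = postorder(n->right, out, pos);
    out[pos++] = n->token;
    return pos;
}
```

For the expression tree of A ^ 2 * B - C / D, the preorder traversal yields -*^A2B/CD and the postorder traversal yields A2^B*CD/-.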

Seatwork

1. x3y4

z w + v

2. b2e + (r – a3)2

d

[Figure: the expression tree of A ^ 2 * B – C / D, annotated with the traversal order Root, Left, Right]