Computer Science Interview questions

A logic gate performs a logical operation on one or more logic inputs and produces a single logic output. The logic is called Boolean logic and is most commonly found in digital circuits. Logic gates are primarily implemented electronically using diodes or transistors, but can also be constructed using electromagnetic relays (relay logic), fluidic logic, pneumatic logic, optics, molecules, or even mechanical elements.

The simplest form of electronic logic is diode logic. This allows AND and OR gates to be built, but not inverters, and so is an incomplete form of logic. Further, without some kind of amplification it is not possible to have such basic logic operations cascaded as required for more complex logic functions. To build a functionally complete logic system, relays, valves (vacuum tubes), or transistors can be used. The simplest family of logic gates using bipolar transistors is called resistor-transistor logic (RTL). Unlike diode logic gates, RTL gates can be cascaded indefinitely to produce more complex logic functions. These gates were used in early integrated circuits. For higher speed, the resistors used in RTL were replaced by diodes, leading to diode-transistor logic (DTL). Transistor-transistor logic (TTL) then supplanted DTL with the observation that one transistor could do the job of two diodes even more quickly, using only half the space. In virtually every type of contemporary chip implementation of digital systems, the bipolar transistors have been replaced by complementary field-effect transistors (MOSFETs) to reduce size and power consumption still further, thereby resulting in complementary metal–oxide–semiconductor (CMOS) logic.

The AND Gate

The AND gate implements the AND function. For a two-input AND gate, both inputs must have logic 1 signals applied to them in order for the output to be a logic 1. With either input at logic 0, the output will be held to logic 0.


There is no limit to the number of inputs that may be applied to an AND function, so there is no functional limit to the number of inputs an AND gate may have. However, for practical reasons, commercial AND gates are most commonly manufactured with 2, 3, or 4 inputs. A standard Integrated Circuit (IC) package contains 14 or 16 pins, for practical size and handling. A standard 14-pin package can contain four 2-input gates, three 3-input gates, or two 4-input gates, and still have room for two pins for power supply connections.

 


The OR Gate

The OR gate is sort of the reverse of the AND gate. The OR function, like its verbal counterpart, allows the output to be true (logic 1) if any one or more of its inputs are true. Verbally, we might say, "If it is raining OR if I turn on the sprinkler, the lawn will be wet." Note that the lawn will still be wet if the sprinkler is on and it is also raining. This is correctly reflected by the basic OR function.

In symbols, the OR function is designated with a plus sign (+). In logic diagrams, the OR gate is designated by its own distinct symbol.

As with the AND function, the OR function can have any number of inputs. However, practical commercial OR gates are mostly limited to 2, 3, and 4 inputs, as with AND gates.

 

The NOT Gate, or Inverter

The inverter is a little different from AND and OR gates in that it always has exactly one input as well as one output. Whatever logical state is applied to the input, the opposite state will appear at the output.

The NOT function, as it is called, is necessary in many applications and highly useful in others. A practical verbal application might be:

The door is NOT locked = You may enter

The NOT function is denoted by a horizontal bar over the value to be inverted. In some cases a single quote mark (') may also be used for this purpose: 0' = 1 and 1' = 0. For greater clarity in some logical expressions, we will use the overbar most of the time.
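These gate functions are easy to mimic in software with C's bitwise and logical operators; a small illustrative sketch (not part of the original text) that prints the two-input truth table:

#include <stdio.h>

int main(void)
{
    int a, b;

    printf(" a b | a AND b  a OR b  NOT a\n");
    for (a = 0; a <= 1; a++) {
        for (b = 0; b <= 1; b++) {
            /* &, | and ! mirror the AND, OR and NOT gates on 0/1 inputs */
            printf(" %d %d |    %d        %d       %d\n", a, b, a & b, a | b, !a);
        }
    }
    return 0;
}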

COMPUTER ORGANISATION

In computer science and computer engineering, computer architecture or digital computer organization is the conceptual design and fundamental operational structure of a computer system. It's a blueprint and functional description of requirements and design implementations for the various parts of a computer, focusing largely on the way by which the central processing unit (CPU) performs internally and accesses addresses in memory.

C PROGRAMMING

C is a simple programming language with few keywords and a relatively easy-to-understand syntax.

C is also useless by itself: it has no input/output commands, no support for strings as a fundamental (atomic) data type, and no useful math functions built in.

Because C is useless by itself, it requires the use of libraries. This increases the complexity of C. The issue of standard libraries is resolved through the use of ANSI libraries and other methods.

Let's give a go at a very simple program that prints out "Hello World" to standard out (usually your monitor). We'll call our little program hello.c.

#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}

What's all this junk just to print out Hello, World? Let's see what's happening:

#include <stdio.h> - Tells the compiler to include this header file for compilation.
  o What is a header file? They contain prototypes and other compiler/pre-processor directives. Prototypes are basic abstract function definitions. More on these later...
  o Some common header files are stdio.h, stdlib.h, unistd.h, math.h.
main() - This is a function, in particular the main block.
{ } - These curly braces are equivalent to stating "block begin" and "block end". These can be used at many places, such as if and switch.
printf() - Ah... the actual print statement. Thankfully we have the header file stdio.h! But what does it do? How is it defined?
return 0 - What's this? Who knows!

The KEY POINT of this whole introduction is to show you the fundamental difference between correctness and understandability. A cryptic, compressed version of this program and the cleanly formatted one above produce the exact same output of "Hello, world!"; however, only the latter is readable, leading to code that is understandable. All code will have bugs. If you sacrifice code readability with reduced (or no) comments and cryptic lines, the burden is shifted and magnified when your code needs to be maintained.

Document what you can. Complex data types, function calls that may not be obvious, etc. Good documentation goes a long way!

Operations

You are probably familiar with the < and > relational operators from mathematics. The same principles apply in C when you are comparing two objects. There are six possibilities in C: <, <=, >, >=, !=, and ==. The first four are self-explanatory; != stands for "not equal to" and == is "equal to".

Here we can point out the difference between syntax and semantics. a = b is different from a == b. Most C compilers will allow both statements to be used in conditionals like if, but they have two completely different meanings. Make sure your assignment operators are where you want assignments and your relationals where you want relational comparisons.
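A small sketch (with hypothetical values) of how the two behave inside an if:

#include <stdio.h>

int main(void)
{
    int a = 0, b = 5;

    if (a = b)      /* assignment: a becomes 5, so the condition is true */
        printf("a = b made the condition true; a is now %d\n", a);

    a = 0;
    if (a == b)     /* comparison: 0 == 5 is false, so the body is skipped */
        printf("this line never prints\n");
    return 0;
}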

Logical operations

Logical operators simulate boolean algebra in C. A sampling: && and || (logical AND and OR) plus the bitwise &, |, and ^. For example, && is used to compare two objects with AND: x != 0 && y != 0.

Expressions involving logical operators undergo Short-Circuit Evaluation. Take the above example into consideration: if x != 0 evaluates to false, the whole statement is false regardless of the outcome of y != 0. This can be a good thing or a bad thing depending on the context. (See Weiss pg. 51-52.)
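Short-circuiting is what makes the common NULL-check idiom safe; a minimal sketch:

#include <stdio.h>

int main(void)
{
    int *p = NULL;

    /* If p == NULL, the right-hand side is never evaluated,
       so the dereference *p cannot crash. */
    if (p != NULL && *p == 42)
        printf("p points to 42\n");
    else
        printf("p is NULL or does not point to 42\n");
    return 0;
}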

Conditional operations

if used with the above relational and logical operators allows for conditional statements. You can start blocks of code using { and }. if can be coupled with the else keyword to handle alternative outcomes.

The ? : operator can be a shorthand method for signifying (if expression) ? (evaluate if true) : (else evaluate this). For example, you can use this in a return statement or a printf statement for conciseness. Beware! This reduces the readability of the program... see Introduction. It does not in any way speed up execution time.

The switch statement allows for quick if-else checking. For example, if you wanted to determine what the char x was and have different outcomes for certain values of x, you could simply switch x and run cases. Some sample code:

switch (x) {
case 'a':
    /* Do stuff when x is 'a' */
    break;
case 'b':
case 'c':
case 'd':
    /* Fallthrough technique... cases b, c, d all use this code */
    break;
default:
    /* Handle cases when x is not a, b, c or d. ALWAYS have a default case!!! */
    break;
}

Looping

You can loop (jumping for those assembly junkies) through your code by using special loop keywords.

These include while, for, and do while.

The while loops until the expression specified is false. For example, while (x < 4) will loop while x is less than 4.

The syntax for for is different. Here's an example: for (i = 0; i < n; i++, z++). This code will loop until i is equal to n. The first argument specifies initializing conditions, the second argument is like the while expression: continue the for loop until this expression no longer evaluates to true. The third argument allows for adjustment of loop control variables or other variables. These statements can be null, e.g. for (; i < n; i++) does not specify initializing code.

do while is like a "repeat-until" in Pascal. This is useful for loops that must be executed at least once. Some sample code would be:

do {
    /* do stuff */
} while (expression);

Storage classes

int, char, float, double are the fundamental data types in C.

Type modifiers include: short, long, unsigned, signed. Not all combinations of types and modifiers are available.

Type qualifiers include the keywords const and volatile. The const qualifier places the assigned variable in the constant data area of memory, which makes the particular variable unmodifiable (technically it still is, though). volatile is used less frequently and tells the compiler that this value can be modified outside the control of the program.

Storage classes include: auto, extern, register, static.

The auto keyword places the specified variable into the stack area of memory. This is usually implicit in most variable declarations, e.g. int i;

The extern keyword makes the specified variable access the variable of the same name from some other file. This is very useful for sharing variables in modular programs.

The register keyword suggests to the compiler to place the particular variable in the fast register memory located directly on the CPU. Most compilers these days (like gcc) are so smart that suggesting registers could actually make your program slower.

The static keyword is useful for extending the lifetime of a particular variable. If you declare a static variable inside a function, the variable remains even after the function call is long gone (the variable is placed in the alterable area of memory). The static keyword is overloaded. It is also used to declare variables to be private to a certain file only when declared with global variables. static can also be used with functions, making those functions visible only to the file itself.
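A small sketch (hypothetical function) of how static extends a variable's lifetime while an ordinary auto variable is recreated on every call:

#include <stdio.h>

int counter(void)
{
    static int count = 0;   /* initialized once; survives between calls */
    int temp = 0;           /* auto by default; recreated every call */

    temp++;
    count++;
    printf("temp = %d, count = %d\n", temp, count);
    return count;
}

int main(void)
{
    counter();   /* prints temp = 1, count = 1 */
    counter();   /* prints temp = 1, count = 2 */
    counter();   /* prints temp = 1, count = 3 */
    return 0;
}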


A string is NOT a type directly supported by C. You, therefore, cannot "assign" stuff into strings. A string is defined by ANSI as an array (or collection) of characters. We will go more in-depth with strings later...
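Since a string is just an array of characters, "assigning" one means copying characters into it; a minimal sketch using the standard string.h routines (names are hypothetical):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[32];              /* room for 31 characters plus '\0' */

    /* name = "Dennis"; would NOT compile: arrays cannot be assigned */
    strcpy(name, "Dennis");     /* copy characters into the array */
    strcat(name, " Ritchie");   /* append more characters */
    printf("%s (%lu chars)\n", name, (unsigned long)strlen(name));
    return 0;
}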

Functions

Why should we make functions in our programs when we can just do it all under main? Weiss (pg. 77) has a very good analogy that I'll borrow :) Think for a minute about high-end stereo systems. These stereo systems do not come in an all-in-one package, but rather come in separate components: pre-amplifier, amplifier, equalizer, receiver, cd player, tape deck, and speakers. The same concept applies to programming. Your programs become modularized and much more readable if they are broken down into components.

This type of programming is known as top-down programming, because we first analyze what needs to be broken down into components. Functions allow us to create top-down modular programs.

Each function consists of a name, a return type, and a possible parameter list. This abstract definition of a function is known as its interface. Here are some sample function interfaces:

char *strdup(char *s)
int add_two_ints(int x, int y)
void useless(void)

The first function header takes in a pointer to a string and outputs a char pointer. The second header takes in two integers and returns an int. The last header doesn't return anything nor take in parameters.

Some programmers like to separate returns from their function names to facilitate easier readability and searchability. This is just a matter of taste. For example:

int
add_two_ints(int x, int y)

A function can return a single value to its caller in a statement using the keyword return. The return value must be the same type as the return type specified in the function's interface.
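Putting interface and return together, here is a complete (illustrative) definition and call of the add_two_ints header shown above:

#include <stdio.h>

int add_two_ints(int x, int y)
{
    return x + y;   /* the returned value matches the declared int return type */
}

int main(void)
{
    int sum = add_two_ints(2, 3);

    printf("2 + 3 = %d\n", sum);
    return 0;
}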

Prototypes

In the introduction, we touched on function prototypes. To recap, what are function prototypes? Function prototypes are abstract function interfaces. These function declarations have no bodies; they just have their interfaces.

Function prototypes are usually declared at the top of a C source file, or in a separate header file (see Appendix: Creating Libraries).

For example, if you wanted to grab command line parameters for your program, you would most likely use the function getopt. But since this function is not part of ANSI C, you must declare the function prototype, or you will get implicit declaration warnings when compiling with our flags. So you can simply prototype getopt(3) from the man pages:

/* This section of our program is for Function Prototypes */
int getopt(int argc, char * const argv[], const char *optstring);
extern char *optarg;
extern int optind, opterr, optopt;

So if we declared this function prototype in our program, we would be telling the compiler explicitly what getopt returns and its parameter list. What are those extern variables? Recall that extern creates a reference to variables across files, or in other words, it creates file global scope for those variables in that particular C source file. That way we can access these variables that getopt modifies directly. More on getopt in the next section about Input/Output.

Preprocessors

The C Preprocessor is not part of the compiler, but is a separate step in the compilation process. In simplistic terms, a C Preprocessor is just a text substitution tool. We'll refer to the C Preprocessor as the CPP.

All preprocessor lines begin with #. This listing is from Weiss pg. 104. The unconditional directives are:

o #include - Inserts a particular header from another file
o #define - Defines a preprocessor macro
o #undef - Undefines a preprocessor macro

The conditional directives are:

o #ifdef - If this macro is defined
o #ifndef - If this macro is not defined
o #if - Test if a compile time condition is true
o #else - The alternative for #if
o #elif - #else and #if in one statement
o #endif - End preprocessor conditional

Other directives include:

o # - Stringization, replaces a macro parameter with a string constant
o ## - Token merge, creates a single token from two adjacent ones

Some examples of the above:

#define MAX_ARRAY_LENGTH 20

Tells the CPP to replace instances of MAX_ARRAY_LENGTH with 20. Use #define for constants to increase readability. Notice the absence of the ;.

#include <stdio.h>
#include "mystring.h"

Tells the CPP to get stdio.h from System Libraries and add the text to this file. The next line tells CPP to get mystring.h from the local directory and add the text to the file. This is a difference you must take note of.

#undef MEANING_OF_LIFE
#define MEANING_OF_LIFE 42

Tells the CPP to undefine MEANING_OF_LIFE and redefine it as 42.

#ifndef IROCK
#define IROCK "You wish!"
#endif

Tells the CPP to define IROCK only if IROCK isn't defined already.

#ifdef DEBUG
/* Your debugging statements here */
#endif

Tells the CPP to do the following statements if DEBUG is defined. This is useful if you pass the -DDEBUG flag to gcc. This will define DEBUG, so you can turn debugging on and off on the fly!

One of the powerful functions of the CPP is the ability to simulate functions using parameterized macros. For example, we might have some code to square a number:

int square(int x) { return x * x; }

We can instead rewrite this using a macro:

#define square(x) ((x) * (x))

A few things you should notice. First, in square(x), the left parenthesis must "cuddle" with the macro identifier. The next thing that should catch your eye is the parentheses surrounding the x's. These are necessary... what if we used this macro as square(1 + 1)? Imagine if the macro didn't have those parentheses: it would become 1 + 1 * 1 + 1. Instead of our desired result of 4, we would get 3. The added parentheses make the expression ((1 + 1) * (1 + 1)). This is a fundamental difference between macros and functions. You don't have to worry about this with functions, but you must consider it when using macros.

Pointers

Pointers provide an indirect method of accessing variables. The reason why some people have difficulty understanding the concept of a pointer is that they are usually introduced without some sort of analogy or easily understood example.

For our simple to understand example, let's think about a typical textbook. It will usually have a table of contents, some chapters, and an index. Suppose we have a Chemistry textbook and would like to find more information on the noble gases. What one would typically do instead of flipping through the entire text, is to consult the index in the back. The index would direct us to the page(s) on which we can read more on noble gases. Conceptually, this is how pointers work!

A pointer is simply a reference containing a memory address. In our example, the noble gas entry in the index would list page numbers for more information. This is analogous to a pointer reference containing the memory address of where the real data is actually stored!

You may be wondering, what is the point of this (no pun intended)? Why don't I just make all variables without the use of pointers? It's because sometimes you can't. What if you needed an array of ints, but didn't know the size of the array before hand? What if you needed a string, but it grew dynamically as the program ran? What if you need variables that are persistent through function use without declaring them global (remember the swap function)? They are all solved through the use of pointers. Pointers are also essential in creating larger custom data structures, such as linked lists.

So now that you understand how pointers work, let's define them a little better.

o A pointer when declared is just a reference. DECLARING A POINTER DOES NOT CREATE ANY SPACE FOR THE POINTER TO POINT TO. We will tackle this dynamic memory allocation issue later.

o As stated prior, a pointer is a reference to an area of memory. This is known as a memory address. A pointer may point to dynamically allocated memory or a variable declared within a block.

o Since a pointer contains memory addresses, the size of a pointer typically corresponds to the word size of your computer. You can think of a "word" as how much data your computer can access at once. Typical machines today are 32- or 64-bit machines. 8-bits per byte equates to 4- or 8-byte pointer sizes. More on this later.

Pointers are declared by using the * in front of the variable identifier. For example:

int *ip;
float *fp = NULL;

This declares a pointer, ip, to an integer. The second line declares a pointer to a float and initializes it to the NULL pointer. The NULL pointer points to a place in memory that cannot be accessed. NULL is useful when checking for error conditions, and many functions return NULL if they fail. Now let's say we want ip to point to an integer:

int x = 5;
int *ip;
ip = &x;

We first encountered the & operator in the I/O section. The & operator specifies the address-of x. Thus, the pointer ip points to x by being assigned the address of x. This is important. You must understand this concept.

This brings up the question, if pointers contain addresses, then how do I get the actual value of what the pointer is pointing to? This is solved through the * operator. The * dereferences the pointer to the value. So,

printf("%d %d\n", x, *ip);

would print 5 5 to the screen. There is a critical difference between a dereference and a pointer declaration:

int x = 0, y = 5, *ip = &y;
x = *ip;

The statement int *ip = &y; is different from x = *ip;. In the first statement the * does not dereference; it signifies the creation of a pointer to an int. The second statement uses a dereference.

Remember the swap function? We can now simulate call by reference using pointers. Here is a modified version of the swap function using pointers:

#include <stdlib.h>

void swap(int *x, int *y)
{
    int tmp;

    tmp = *x;
    *x = *y;
    *y = tmp;
}

int main()
{
    int a = 2, b = 3;

    swap(&a, &b);
    return EXIT_SUCCESS;
}

This snip of swapping code works. When you call swap, you must give the address-of a and b, because swap is expecting a pointer.

Why does this work? It's because you are giving the address-of the variables. This memory does not "go away" or get "popped off" after the function swap ends. The changes within swap change the values located in those memory addresses.

Arrays

A simple array of 5 ints would look like:

int ia[5];

This would effectively make an area in memory (if available) for ia, which is 5 * sizeof(int). We will discuss sizeof() in detail in Dynamic Memory Allocation. Basically sizeof() returns the size of what is being passed. On a typical 32-bit machine, sizeof(int) returns 4 bytes, so we would get a total of 20 bytes of memory for our array. How do we reference areas of memory within the array? By using the [ ] we can effectively "dereference" those areas of the array to return values.

printf("%d ", ia[3]);

This would print the fourth element in the array to the screen. Why the fourth? This is because array elements are numbered from 0.

Note: You cannot initialize an array using a variable. ANSI C does not allow this. For example:

int x = 5;
int ia[x];

This above example is illegal. ANSI C restricts the array initialization size to be constant. So is this legal?

int ia[];

No. The array size is not known at compile time. How can we get around this? By using macros we can also make our program more readable!

#define MAX_ARRAY_SIZE 5
/* .... code .... */
int ia[MAX_ARRAY_SIZE];

Now if we wanted to change the array size, all we'd have to do is change the define statement! But what if we don't know the size of our array at compile time? That's why we have Dynamic Memory Allocation. More on this later...

Can we initialize the contents of the array? Yes!
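A small sketch of ANSI C's initializer-list syntax (the values are hypothetical):

int ia[5] = {2, 4, 6, 8, 10};   /* all five elements given explicitly */
int zeros[5] = {0};             /* remaining elements default to 0 */
char word[] = "hello";          /* size inferred: 6 chars, including '\0' */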

LISTS, STACKS, QUEUES

Linked lists are the most basic self-referential structures. Linked lists allow you to have a chain of structs with related data.

So how would you go about declaring a linked list? It would involve a struct and a pointer:

struct llnode {
    <type> data;
    struct llnode *next;
};

The <type> signifies data of any type. This is typically a pointer to something, usually another struct. The next line is the next pointer to another llnode struct. Another more convenient way using typedef:

typedef struct list_node {
    <type> data;
    struct list_node *next;
} llnode;

llnode *head = NULL;

Note that even when the typedef is specified, the next pointer within the struct must still use the struct tag!

There are two ways to create the root node of the linked list. One method is to create a head pointer and the other way is to create a dummy node. It's usually easier to create a head pointer.


Now that we have a node declaration down, how do we add or remove from our linked list? Simple! Create functions to do additions, removals, and traversals.

o Additions: A sample linked list addition function:

void add(llnode **head, <type> data_in) {
    llnode *tmp;

    if ((tmp = malloc(sizeof(*tmp))) == NULL) {
        ERR_MSG(malloc);
        (void)exit(EXIT_FAILURE);
    }
    tmp->data = data_in;
    tmp->next = *head;
    *head = tmp;
}

/* ... inside some function ... */
llnode *head = NULL;
<type> *some_data;
/* ... initialize some_data ... */

add(&head, some_data);

What's happening here? We created a head pointer, and then sent the address-of the head pointer into the add function which is expecting a pointer to a pointer. We send in the address-of head. Inside add, a tmp pointer is allocated on the heap. The data pointer on tmp is moved to point to the data_in. The next pointer is moved to point to the head pointer (*head). Then the head pointer is moved to point to tmp. Thus we have added to the beginning of the list.

o Removals: You traverse the list, querying the next struct in the list for the target. If you get a match, set the previous node's next pointer to the target's next pointer. Don't forget to free the node you are removing (or you'll get a memory leak)! You need to take into consideration the case where the target is the first node in the list. There are many ways to do this (i.e. recursively). Think about it! A sketch appears after this list.

o Traversals: Traversing a list is simple: just query the data part of each node for pertinent information as you move from next to next. There are different methods for traversing trees (see Trees).
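One possible shape for the removal just described — a minimal sketch that takes the llnode from above with <type> as a void pointer and matches nodes by pointer identity:

#include <stdlib.h>

/* The llnode from the text, with <type> taken to be void * for this sketch. */
typedef struct list_node {
    void *data;
    struct list_node *next;
} llnode;

void removenode(llnode **head, void *target)
{
    llnode *cur = *head, *prev = NULL;

    while (cur != NULL && cur->data != target) {    /* walk until we find target */
        prev = cur;
        cur = cur->next;
    }
    if (cur == NULL)
        return;                     /* target not in the list */
    if (prev == NULL)
        *head = cur->next;          /* target was the first node */
    else
        prev->next = cur->next;     /* unlink from the middle or end */
    free(cur->data);                /* free memory within the list! */
    free(cur);
}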

What about freeing the whole list? You can't just free the head pointer! You have to free the list. A sample function to free a complete list:

void freelist(llnode *head) {
    llnode *tmp;

    while (head != NULL) {
        free(head->data);   /* Don't forget to free memory within the list! */
        tmp = head->next;
        free(head);
        head = tmp;
    }
}


Now we can rest easy at night because we won't have memory leaks in our lists!

STACK

Stacks are a specific kind of linked list. They are referred to as LIFO, or Last In First Out. Stacks have specific adds and removes called push and pop. Pushing nodes onto stacks is easily done by adding to the front of the list. Popping is simply removing from the front of the list.

It would be wise to give return values when pushing and popping from stacks. For example, pop can return the struct that was popped.
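A minimal sketch of push and pop along those lines, taking <type> as a void pointer so pop can return the popped data:

#include <stdlib.h>

/* Same node shape as the llnode above, with <type> as void * for the sketch. */
typedef struct list_node {
    void *data;
    struct list_node *next;
} llnode;

void push(llnode **top, void *data_in)
{
    llnode *tmp;

    if ((tmp = malloc(sizeof(*tmp))) == NULL)
        exit(EXIT_FAILURE);
    tmp->data = data_in;
    tmp->next = *top;       /* the new node becomes the front of the list */
    *top = tmp;
}

void *pop(llnode **top)
{
    llnode *tmp = *top;
    void *data_out;

    if (tmp == NULL)
        return NULL;        /* popping an empty stack */
    data_out = tmp->data;   /* return the popped data, as suggested above */
    *top = tmp->next;       /* the front node comes off */
    free(tmp);
    return data_out;
}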

QUEUES

Queues are FIFO or First In First Out. Think of a typical (non-priority) printer queue: The first jobs submitted are printed before jobs that are submitted after them.

Queues aren't more difficult to implement than stacks. By creating a tail pointer you can keep track of both the front and the tail ends of the list.

This allows you to enqueue onto the tail of the list, and dequeue from the front of the list.
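A minimal sketch of enqueue and dequeue using such a tail pointer (again with <type> as a void pointer):

#include <stdlib.h>

typedef struct list_node {
    void *data;
    struct list_node *next;
} llnode;

typedef struct {
    llnode *head;   /* dequeue from here */
    llnode *tail;   /* enqueue here */
} queue;

void enqueue(queue *q, void *data_in)
{
    llnode *tmp;

    if ((tmp = malloc(sizeof(*tmp))) == NULL)
        exit(EXIT_FAILURE);
    tmp->data = data_in;
    tmp->next = NULL;
    if (q->tail == NULL)
        q->head = tmp;          /* empty queue: node is both head and tail */
    else
        q->tail->next = tmp;    /* append after the current tail */
    q->tail = tmp;
}

void *dequeue(queue *q)
{
    llnode *tmp = q->head;
    void *data_out;

    if (tmp == NULL)
        return NULL;            /* empty queue */
    data_out = tmp->data;
    q->head = tmp->next;
    if (q->head == NULL)
        q->tail = NULL;         /* the queue just became empty */
    free(tmp);
    return data_out;
}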

TREES

Another variation of a linked list is a tree. A simple binary tree involves having two types of "next" pointers, a left and a right pointer. You can roughly halve your access times by splitting your data into two different paths, while keeping a uniform data structure. But an unbalanced tree can degrade to linked-list efficiency.

There are different types of trees; some popular ones are self-balancing. AVL trees are a typical self-balancing tree: they move nodes around so that the tree stays balanced, with subtree heights never differing by more than 1.

If you want more information on trees or self-balancing trees, you can query Google about this.
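A minimal sketch of a binary tree node and an iterative lookup, assuming for simplicity that the data is an int key:

#include <stddef.h>

typedef struct tree_node {
    int key;
    struct tree_node *left;     /* keys smaller than key */
    struct tree_node *right;    /* keys larger than key */
} tnode;

tnode *find(tnode *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;                /* NULL when the key is not in the tree */
}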

Algorithm design

Algorithm design is a specific method for creating a mathematical process to solve problems. Applied algorithm design is algorithm engineering.

Algorithm design is identified and incorporated into many solution theories of operations research, such as dynamic programming and divide-and-conquer. Techniques for designing and implementing algorithm designs are algorithm design patterns[1], such as the template method pattern and decorator pattern, and the use of data structures, and name and sort lists. Some current-day uses of algorithm design can be found in internet retrieval processes of web crawling, packet routing and caching.

Mainframe programming languages such as ALGOL (for Algorithmic Language), FORTRAN, COBOL, PL/I, SAIL, and SNOBOL are computing tools to implement an "algorithm design"... but an "algorithm design" (a/d) is not a language. An a/d can be a hand-written process, e.g. a set of equations, a series of mechanical processes done by hand, an analog piece of equipment, or a digital process and/or processor.

One of the most important aspects of algorithm design is creating an algorithm that has an efficient run time, also known as its big Oh.

Some well-known examples:

Dijkstra's algorithm
Kruskal's algorithm
Quicksort
Merge sort
Depth-first search
Breadth-first search
Insertion sort

COMPUTATIONAL COMPLEXITY THEORY

Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty. In this context, a computational problem is understood to be a task that is in principle amenable to being solved by a computer. Informally, a computational problem consists of problem instances and solutions to these problem instances. For example, primality testing is the problem of determining whether a given number is prime or not. The instances of this problem are natural numbers, and the solution to an instance is yes or no based on whether the number is prime or not.

Complexity measures

For a precise definition of what it means to solve a problem using a given amount of time and space, a computational model such as the deterministic Turing machine is used. The time required by a deterministic Turing machine M on input x is the total number of state transitions, or steps, the machine makes before it halts and outputs the answer ("yes" or "no"). A Turing machine M is said to operate within time f(n), if the time required by M on each input of length n is at most f(n). A decision problem A can be solved in time f(n) if there exists a Turing machine operating in time f(n) that solves the problem. Since complexity theory is interested in classifying problems based on their difficulty, one defines sets of problems based on some criteria. For instance, the set of problems solvable within time f(n) on a deterministic Turing machine is then denoted by DTIME(f(n)).

Analogous definitions can be made for space requirements. Although time and space are the most well-known complexity resources, any complexity measure can be viewed as a computational resource. Complexity measures are very generally defined by the Blum complexity axioms. Other complexity measures used in complexity theory include communication complexity, circuit complexity, and decision tree complexity.

[Figure: visualization of the quicksort algorithm, which has average-case performance Θ(n log n).]


The best, worst and average case complexity refer to three different ways of measuring the time complexity (or any other complexity measure) of different inputs of the same size. Since some inputs of size n may be faster to solve than others, we define the following complexities:

Best-case complexity: This is the complexity of solving the problem for the best input of size n.

Worst-case complexity: This is the complexity of solving the problem for the worst input of size n.

Average-case complexity: This is the complexity of solving the problem on average. This complexity is only defined with respect to a probability distribution over the inputs. For instance, if all inputs of the same size are assumed to be equally likely to appear, the average case complexity can be defined with respect to the uniform distribution over all inputs of size n.

For example, consider the deterministic sorting algorithm quicksort. This solves the problem of sorting a list of integers that is given as the input. The best-case scenario is when the input is already sorted, and the algorithm takes time O(n log n) for such inputs. The worst case is when the input is sorted in reverse order, and the algorithm takes time O(n²) for this case. If we assume that all possible permutations of the input list are equally likely, the average time taken for sorting is O(n log n).

Upper and lower bounds on the complexity of problems

To classify the computation time (or similar resources, such as space consumption), one is interested in proving upper and lower bounds on the minimum amount of time required by the most efficient algorithm solving a given problem. The complexity of an algorithm is usually taken to be its worst-case complexity, unless specified otherwise. Analyzing a particular algorithm falls under the field of analysis of algorithms. To show an upper bound T(n) on the time complexity of a problem, one needs to show only that there is a particular algorithm with running time at most T(n). However, proving lower bounds is much more difficult, since lower bounds make a statement about all possible algorithms that solve a given problem. The phrase "all possible algorithms" includes not just the algorithms known today, but any algorithm that might be discovered in the future. To show a lower bound of T(n) for a problem requires showing that no algorithm can have time complexity lower than T(n).

Upper and lower bounds are usually stated using the big Oh notation, which hides constant factors and smaller terms. This makes the bounds independent of the specific details of the computational model used. For instance, if T(n) = 7n² + 15n + 40, in big Oh notation one would write T(n) = O(n²).


Important complexity classes

[Figure: a representation of the relations among complexity classes.]

Many important complexity classes can be defined by bounding the time or space used by the algorithm. Some important complexity classes of decision problems defined in this manner are the following:

Complexity class   Model of computation               Resource constraint

DTIME(f(n))        Deterministic Turing machine       Time f(n)
P                  Deterministic Turing machine       Time poly(n)
EXPTIME            Deterministic Turing machine       Time 2^poly(n)
NTIME(f(n))        Non-deterministic Turing machine   Time f(n)
NP                 Non-deterministic Turing machine   Time poly(n)
NEXPTIME           Non-deterministic Turing machine   Time 2^poly(n)
DSPACE(f(n))       Deterministic Turing machine       Space f(n)
L                  Deterministic Turing machine       Space O(log n)
PSPACE             Deterministic Turing machine       Space poly(n)
EXPSPACE           Deterministic Turing machine       Space 2^poly(n)
NSPACE(f(n))       Non-deterministic Turing machine   Space f(n)
NL                 Non-deterministic Turing machine   Space O(log n)
NPSPACE            Non-deterministic Turing machine   Space poly(n)
NEXPSPACE          Non-deterministic Turing machine   Space 2^poly(n)

ER DIAGRAMS

In software engineering, an entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.

The definitive reference for entity-relationship modeling is Peter Chen's 1976 paper.[1] However, variants of the idea existed previously,[2] and have been devised subsequently.

The building blocks: entities, relationships, and attributes

[Diagrams: two related entities; an entity with an attribute; a relationship with an attribute; a primary key.]

An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world.[3]

An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term.

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.

The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which mimics natural language constructs.

Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute.

Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.

Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets. Example: a particular song is an entity. The collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation.

Certain cardinality constraints on relationship sets may be indicated as well.

Entity sets are drawn as rectangles, relationship sets as diamonds. If an entity set participates in a relationship set, they are connected with a line.

Attributes are drawn as ovals and are connected with a line to exactly one entity or relationship set.

Cardinality constraints are expressed as follows:

a double line indicates a participation constraint, totality or surjectivity: all entities in the entity set must participate in at least one relationship in the relationship set;

an arrow from entity set to relationship set indicates a key constraint, i.e. injectivity: each entity of the entity set can participate in at most one relationship in the relationship set;

a thick line indicates both, i.e. bijectivity: each entity in the entity set is involved in exactly one relationship.


an underlined name of an attribute indicates that it is a key: two different entities or relationships with this attribute always have different values for this attribute.

Attributes are often omitted as they can clutter up a diagram; other diagram techniques often list entity attributes within the rectangles drawn for entity sets.

Chen's notation for entity-relationship modeling uses rectangles to represent entities, and diamonds to represent relationships. Relationships are first-class objects: they can have attributes and relationships of their own.


What is Normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:

Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:

Meet all the requirements of the first normal form.


Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.

Creating the Database

Our first step is to create the database itself. Many database management systems offer a series of options to customize database parameters at this step, but our database only permits the simple creation of a database. As with all of our commands, you may wish to consult the documentation for your DBMS to determine if any advanced parameters supported by your specific system meet your needs. Let's use the CREATE DATABASE command to set up our database:

CREATE DATABASE personnel

Why Use Views?

There are two primary reasons to provide users with access to data through views rather than providing them with direct access to database tables:

Views provide simple, granular security. You can use a view to limit the data that a user is allowed to see in a table. For example, if you have an employees table and wish to provide some users with access to the records of full-time employees, you can create a view that contains only those records. This is much easier than the alternative (creating and maintaining a shadow table) and ensures the integrity of the data.

Views simplify the user experience. Views hide complex details of your database tables from end users who do not need to see them. If a user dumps the contents of a view, they won't see the table columns that aren't selected by the view and that they might not understand. This protects them from the confusion caused by poorly named columns, unique identifiers and table keys.

Creating a View

Creating a view is quite straightforward: you simply need to create a query that contains the restrictions you wish to enforce and place it inside the CREATE VIEW command. Here's the syntax:

CREATE VIEW viewname AS
<query>


For example, if you wish to create the full-time employees view I discussed in the previous section, you would issue the following command:

CREATE VIEW fulltime AS
SELECT first_name, last_name, employee_id
FROM employees
WHERE status = 'FT'

Modifying a View

Changing the contents of a view uses the exact same syntax as the creation of a view, but you use the ALTER VIEW command instead of the CREATE VIEW command. For example, if you wanted to modify the fulltime view to add the employee's telephone number to the results, you would issue the following command:

ALTER VIEW fulltime AS
SELECT first_name, last_name, employee_id, telephone
FROM employees
WHERE status = 'FT'

Relational Database

A database based on relational algebra or the relational model is called a relational database.

Relational data model

relation

a table in a relational database is called a relation in the mathematical language of relational algebra. Relations are unordered.

attribute

a column of a table in a database is called an attribute. Columns, or attributes, have names.

domain

the set of permissible values for an attribute (or column) is called its domain.

tuple

a row in a database table is called a tuple in the mathematical language of relational algebra. The order of tuples in a relation has no significance.

database

a database is a collection of multiple relations.


schema

a database design is called a schema; alternatively, a schema can refer to a namespace within a database.

cardinality of a relation

the number of tuples in a relation is called the cardinality of the relation (the number of attributes is the relation's degree).

Normalization theory deals with the design of relational database schemas.

Keys

key

any subset of the attributes of a relation is called a key.

super key

a key is called a super key if it is sufficient to identify a unique tuple of a relation.

candidate key

a minimal super key is called a candidate key, i.e. no proper subset of a candidate key is a super key.

primary key

a candidate key chosen as the principal means of identifying a unique tuple of a relation.

foreign key

a key of a relation which is a primary key of some other relation in the relational schema.

Referential integrity constraint (foreign key constraint): This constraint asserts that a reference in one data item indeed leads to another data item. A foreign key is a field that is a primary key in another table. Referential integrity consists of:

 

Not inserting a record if the value of the foreign key being inserted does not match an existing record in another table with the primary key having the same value;
Not deleting a record whose primary key is defined as a foreign key in child records; and
Not modifying the value of primary keys.


 

Most DBMS enforce other types of constraints having to do with the data content of the field and usually called Check constraints. Examples are limiting the values of a field to a list of values or to a range of values, validating dates and checking the format of the data i.e., no alpha characters allowed in a numeric field, etc.

Integrity constraints

Databases depend upon keys to store, sort and compare records. If you’ve been around databases for a while, you’ve probably heard about many different types of keys – primary keys, candidate keys, and foreign keys. When you create a new database table, you’re asked to select one primary key that will uniquely identify records stored in that table.

The selection of a primary key is one of the most critical decisions you’ll make in the design of a new database. The most important constraint is that you must ensure that the selected key is unique. If it’s possible that two records (past, present, or future) may share the same value for an attribute, it’s a poor choice for a primary key. When evaluating this constraint, you should think creatively. Let’s consider a few examples that caused issues for real-world databases:

ZIP Codes do not make good primary keys for a table of towns. If you're making a simple lookup table of cities, ZIP code seems to be a logical primary key. However, upon further investigation, you may realize that more than one town may share a ZIP code. For example, four cities in New Jersey (Neptune, Neptune City, Tinton Falls and Wall Township) all share the ZIP code 07753.

Social Security Numbers do not make good primary keys for a table of people for many reasons. First, most people consider their SSN private and don’t want it used in databases in the first place. Second, some people don’t have SSNs – especially those who have never set foot in the United States! Third, SSNs may be reused after an individual’s death. Finally, an individual may have more than one SSN over a lifetime – the Social Security Administration will issue a new number in cases of fraud or identity theft.

PROCESSES AND THREADS

Threads vs. Processes

Both threads and processes are methods of parallelizing an application. However, processes are independent execution units that contain their own state information, use their own address spaces, and only interact with each other via interprocess communication mechanisms (generally managed by the operating system). Applications are typically divided into processes during the design phase, and a master process explicitly spawns sub-processes when it makes sense to logically separate significant application functionality. Processes, in other words, are an architectural construct.


By contrast, a thread is a coding construct that doesn't affect the architecture of an application. A single process might contain multiple threads; all threads within a process share the same state and same memory space, and can communicate with each other directly, because they share the same variables.

Threads typically are spawned for a short-term benefit that is usually visualized as a serial task, but which doesn't have to be performed in a linear manner (such as performing a complex mathematical computation using parallelism, or initializing a large matrix), and then are absorbed when no longer required. The scope of a thread is within a specific code module—which is why we can bolt-on threading without affecting the broader application.

In computing, Inter-process communication (IPC) is a set of techniques for the exchange of data among multiple threads in one or more processes. Processes may be running on one or more computers connected by a network. IPC techniques are divided into methods for message passing, synchronization, shared memory, and remote procedure calls (RPC). The method of IPC used may vary based on the bandwidth and latency of communication between the threads, and the type of data being communicated.

There are several reasons for providing an environment that allows process cooperation:

Information sharing;
Speedup;
Modularity;
Convenience; and
Privilege separation.
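For instance, a minimal sketch of message passing between a parent and a child process, using the POSIX fork and pipe calls (these are not part of ANSI C):

#include <stdio.h>
#include <unistd.h>     /* fork, pipe, read, write - POSIX */
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    char buf[32];

    pipe(fd);                       /* fd[0] = read end, fd[1] = write end */
    if (fork() == 0) {              /* child process */
        close(fd[0]);
        write(fd[1], "hello parent", 13);   /* 12 chars plus '\0' */
        close(fd[1]);
        return 0;
    }
    close(fd[1]);                   /* parent process */
    read(fd[0], buf, sizeof(buf));
    printf("parent received: %s\n", buf);
    close(fd[0]);
    wait(NULL);                     /* reap the child */
    return 0;
}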

MEMORY MANAGEMENT

The memory management subsystem is one of the most important parts of the operating system. Since the early days of computing, there has been a need for more memory than exists physically in a system. Strategies have been developed to overcome this limitation and the most successful of these is virtual memory. Virtual memory makes the system appear to have more memory than it actually has by sharing it between competing processes as they need it.

Virtual memory does more than just make your computer's memory go further. The memory management subsystem provides:

Large Address Spaces

The operating system makes the system appear as if it has a larger amount of memory than it actually has. The virtual memory can be many times larger than the physical memory in the system.

Protection

Each process in the system has its own virtual address space. These virtual address spaces are completely separate from each other and so a process running one application cannot affect another. Also, the hardware virtual memory mechanisms allow areas of memory to be protected against writing. This protects code and data from being overwritten by rogue applications.

Memory Mapping

Memory mapping is used to map image and data files into a process's address space. In memory mapping, the contents of a file are linked directly into the virtual address space of a process.

Fair Physical Memory Allocation

The memory management subsystem allows each running process in the system a fair share of the physical memory of the system.

Shared Virtual Memory

Although virtual memory allows processes to have separate (virtual) address spaces, there are times when you need processes to share memory. For example there could be several processes in the system running the bash command shell. Rather than have several copies of bash, one in each process's virtual address space, it is better to have only one copy in physical memory and all of the processes running bash share it. Dynamic libraries are another common example of executing code shared between several processes.

Shared memory can also be used as an Inter Process Communication (IPC) mechanism, with two or more processes exchanging information via memory common to all of them. Linux supports the Unix System V shared memory IPC.

3.1  An Abstract Model of Virtual Memory


3.1.3  Shared Virtual Memory

Virtual memory makes it easy for several processes to share memory. All memory accesses are made via page tables, and each process has its own separate page table. For two processes sharing a physical page of memory, its physical page frame number must appear in a page table entry in both of their page tables.

Figure  3.1 shows two processes that each share physical page frame number 4. For process X this is virtual page frame number 4 whereas for process Y this is virtual page frame number 6. This illustrates an interesting point about sharing pages: the shared physical page does not have to exist at the same place in virtual memory for any or all of the processes sharing it.

If you were to implement a system using the above theoretical model then it would work, but not particularly efficiently. Both operating system and processor designers try hard to extract more performance from the system. Apart from making the processors, memory and so on faster, the best approach is to maintain caches of useful information and data that make some operations faster. Linux uses a number of memory management related caches:

Buffer Cache

The buffer cache contains data buffers that are used by the block device drivers.

These buffers are of fixed sizes (for example 512 bytes) and contain blocks of information that have either been read from a block device or are being written to it. A block device is one that can only be accessed by reading and writing fixed sized blocks of data. All hard disks are block devices.

The buffer cache is indexed via the device identifier and the desired block number and is used to quickly find a block of data. Block devices are only ever accessed via the buffer cache. If data can be found in the buffer cache then it does not need to be read from the physical block device, for example a hard disk, and access to it is much faster.
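
As a hedged sketch of that lookup (a toy structure in C, not the kernel's actual code), the buffer cache can be modelled as a hash table keyed by the (device identifier, block number) pair:

    #include <stdio.h>
    #include <stddef.h>

    #define NBUCKETS   64
    #define BLOCK_SIZE 512

    struct buffer {
        int dev, block;                  /* the key: device id + block number */
        char data[BLOCK_SIZE];
        struct buffer *next;             /* hash-chain link */
    };

    static struct buffer *buckets[NBUCKETS];

    static unsigned hash(int dev, int block) {
        return ((unsigned)dev * 31u + (unsigned)block) % NBUCKETS;
    }

    /* Return the cached buffer, or NULL (the caller would then read the device). */
    static struct buffer *cache_lookup(int dev, int block) {
        for (struct buffer *b = buckets[hash(dev, block)]; b != NULL; b = b->next)
            if (b->dev == dev && b->block == block)
                return b;
        return NULL;
    }

    int main(void) {
        printf("block cached? %s\n", cache_lookup(3, 42) ? "yes" : "no");
        return 0;
    }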

A file system (often also written as filesystem) is a method of storing and organizing computer files and their data. Essentially, it organizes these files into a database for the storage, organization, manipulation, and retrieval by the computer's operating system.

File systems are used on data storage devices such as hard disks or CD-ROMs to maintain the physical location of the files. Beyond this, they might provide access to data on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients), or they may be virtual and exist only as an access method for virtual data (e.g., procfs). It is distinguished from a directory service and registry.

Theory of Computation: Regular Languages


In theoretical computer science, a regular language is a formal language (i.e., a possibly infinite set of finite sequences of symbols from a finite alphabet) that satisfies the following equivalent properties:

1. It can be accepted by a deterministic finite state machine.
2. It can be accepted by a nondeterministic finite state machine.
3. It can be accepted by an alternating finite automaton.
4. It can be described by a formal regular expression. (Note that the "regular expression" features provided with many programming languages are augmented with features that make them capable of recognizing languages which are not regular, and are therefore not strictly equivalent to formal regular expressions.)
5. It can be generated by a regular grammar.
6. It can be generated by a prefix grammar.
7. It can be accepted by a read-only Turing machine.
8. It can be defined in monadic second-order logic.
9. It is recognized by some finitely generated monoid.
10. It is the preimage of a subset of a finite monoid under a homomorphism from the free monoid on its alphabet.
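
For instance, the regular language of binary strings containing an even number of 1s is accepted by a two-state deterministic finite automaton. A minimal sketch in C:

    #include <stdio.h>

    /* States: 0 = even number of 1s seen so far (accepting), 1 = odd. */
    static const int delta[2][2] = {
        /* on '0'  on '1' */
        {  0,       1 },                 /* from state 0 */
        {  1,       0 },                 /* from state 1 */
    };

    static int accepts(const char *input) {
        int state = 0;
        for (const char *p = input; *p != '\0'; p++) {
            if (*p != '0' && *p != '1') return 0;   /* symbol outside the alphabet */
            state = delta[state][*p - '0'];
        }
        return state == 0;               /* accept iff we end in the even state */
    }

    int main(void) {
        printf("%d %d %d\n", accepts("1010"), accepts("111"), accepts(""));  /* prints: 1 0 1 */
        return 0;
    }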

An External Memory Interface is a bus protocol for communication from an integrated circuit, such as a microprocessor, to an external memory device located on a circuit board. The memory is referred to as external because it is not contained within the internal circuitry of the integrated circuit and is thus located externally on the circuit board.

Some common External Memory Interfaces include:

DDR; DDR2; and GDDR.

COMPUTER ORGANISATION


2.4 FUNCTIONAL UNITS

In order to carry out the operations mentioned in the previous section, the computer allocates the tasks among its various functional units. The computer system is divided into three separate units for its operation: 1) the arithmetic logical unit, 2) the control unit, and 3) the central processing unit.

2.4.1 Arithmetic Logical Unit (ALU)

After you enter data through the input device, it is stored in the primary storage unit. The actual processing of the data and instructions is performed by the Arithmetic Logical Unit. The major operations performed by the ALU are addition, subtraction, multiplication, division, logic, and comparison. Data is transferred to the ALU from the storage unit when required. After processing, the output is returned to the storage unit for further processing or storage.

2.4.2 Control Unit (CU)

The next component of the computer is the Control Unit, which acts like a supervisor seeing that things are done in the proper fashion. The control unit determines the sequence in which computer programs and instructions are executed: it manages the processing of programs stored in the main memory, the interpretation of the instructions, and the issuing of signals for other units of the computer to execute them. It also acts as a switchboard operator when several users access the computer simultaneously, and it coordinates the activities of the computer's peripheral equipment as they perform input and output. It is therefore the manager of all the operations mentioned in the previous section.

2.4.3 Central Processing Unit (CPU)

The ALU and the CU of a computer system are jointly known as the central processing unit. You may call the CPU the brain of any computer system: like a brain, it takes all major decisions, makes all sorts of calculations, and directs different parts of the computer by activating and controlling their operations.

Random Access Memory (RAM): The primary storage is referred to as random access memory (RAM) because it is possible to randomly select any location of the memory and directly store and retrieve data there; access takes the same time for any address of the memory as for the first address. It is also called read/write memory. The storage of data and instructions inside the primary storage is temporary: it disappears from RAM as soon as the power to the computer is switched off. Memories which lose their content on failure of the power supply are known as volatile memories, so we can say that RAM is volatile memory.

Read Only Memory (ROM): There is another memory in the computer, which is called Read Only Memory (ROM). Again, it is ICs inside the PC that form the ROM. The storage of programs and data in the ROM is permanent. The ROM stores some standard processing programs supplied by the manufacturers to operate the personal computer. The ROM can only be read by the CPU; it cannot be changed. The basic input/output program, which examines and initializes the various equipment attached to the PC when the power is switched on, is stored in the ROM. Memories which do not lose their content on failure of the power supply are known as non-volatile memories. ROM is non-volatile memory.

PROM: There is another type of primary memory in the computer, which is called Programmable Read Only Memory (PROM). You know that it is not possible to modify or erase programs stored in ROM, but it is possible for you to store your own program in a PROM chip. Once the programs are written they cannot be changed, and they remain intact even if the power is switched off. Therefore, programs or instructions written in PROM or ROM cannot be erased or changed.

EPROM: This stands for Erasable Programmable Read Only Memory, which overcomes the limitation of PROM and ROM. An EPROM chip can be programmed time and again by erasing the information stored earlier in it. Information stored in an EPROM is erased by exposing the chip to ultraviolet light for some time, after which the chip can be reprogrammed using a special programming facility. While the EPROM is in use, information can only be read.

Cache Memory: The speed of the CPU is extremely high compared to the access time of main memory, so the performance of the CPU suffers when it must wait on slow main memory. To reduce this speed mismatch, a small memory chip is attached between the CPU and main memory whose access time is very close to the processing speed of the CPU. It is called cache memory. Cache memories are accessed much faster than conventional RAM. They are used to store programs or data currently being executed, or temporary data frequently used by the CPU. In effect, cache memory makes main memory appear faster and larger than it really is. Because it is very expensive to provide a large cache, its size is normally kept small.
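
As a hedged illustration of the principle (a toy direct-mapped cache in C, not real hardware), each address maps to exactly one cache line, and a hit avoids the slow main-memory access:

    #include <stdio.h>

    #define NLINES 16                    /* toy cache: 16 one-word lines */

    struct line { int valid; unsigned tag; int data; };
    static struct line cache[NLINES];
    static int main_memory[1024];        /* stand-in for slow RAM */

    static int cached_read(unsigned addr) {
        unsigned index = addr % NLINES;
        unsigned tag   = addr / NLINES;
        struct line *l = &cache[index];
        if (l->valid && l->tag == tag)
            return l->data;              /* fast path: cache hit */
        /* slow path: miss - fetch from main memory and fill the line */
        l->valid = 1;
        l->tag   = tag;
        l->data  = main_memory[addr];
        return l->data;
    }

    int main(void) {
        main_memory[100] = 7;
        printf("%d (miss)\n", cached_read(100));
        printf("%d (hit)\n",  cached_read(100));
        return 0;
    }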

Registers: The CPU processes data and instructions at high speed, and there is also movement of data between the various units of the computer, so it is necessary to transfer the processed data at high speed. For this, the computer uses a number of special memory units called registers. They are not part of the main memory; they store data or information temporarily and pass it on as directed by the control unit.

SECONDARY STORAGE

Magnetic Tape: Magnetic tapes are used with large computers, such as mainframe computers, where large volumes of data are stored for a long time. On a PC you can also use tapes in the form of cassettes. The cost of storing data on tape is low. Tapes consist of magnetic material that stores data permanently. The tape can be 12.5 mm to 25 mm wide plastic film and 500 metres to 1200 metres long, coated with magnetic material. The tape deck is connected to the central processor, and information is fed into or read from the tape through the processor. It is similar to a cassette tape recorder.

Magnetic Disk: You might have seen a gramophone record, which is circular like a disk and coated with magnetic material. Magnetic disks used in computers are made on the same principle. A disk rotates at very high speed inside the computer drive, and data is stored on both surfaces of the disk. Magnetic disks are the most popular direct access storage device. Each disk consists of a number of invisible concentric circles called tracks. Information is recorded on the tracks of a disk surface in the form of tiny magnetic spots: the presence of a magnetic spot represents a one bit and its absence represents a zero bit. The information stored on a disk can be read many times without affecting the stored data, so the reading operation is non-destructive. But if you want to write new data, the existing data is erased from the disk and the new data is recorded.

Floppy Disk: It is similar to the magnetic disk discussed above. Floppies are 5.25 inch or 3.5 inch in diameter. They come in single or double density and are recorded on one or both surfaces of the diskette. The capacity of a 5.25-inch floppy is 1.2 megabytes, whereas for a 3.5-inch floppy it is 1.44 megabytes. It is cheaper than any other storage device and is portable. The floppy is a low-cost device particularly suitable for personal computer systems.

Optical Disk: The main types of optical disk are Compact Disk Read Only Memory (CD-ROM); Write Once, Read Many (WORM); and Erasable Optical Disk.

An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).

The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than issuing a new instruction only after every step of the previous one has finished. The term pipeline refers to the fact that each step is carrying data at once (like water), and each step is connected to the next (like the links of a pipe).

The origin of pipelining is thought to be either the ILLIAC II project or the IBM Stretch project, though a simple version was used earlier in the Z1 in 1939 and the Z3 in 1941.[1]

The IBM Stretch project proposed the terms "Fetch, Decode, and Execute", which became common usage.

Most modern CPUs are driven by a clock. The CPU consists internally of logic and memory (flip flops). When the clock signal arrives, the flip flops take their new value and the logic then requires a period of time to decode the new values. Then the next clock pulse arrives and the flip flops again take their new values, and so on. By breaking the logic into smaller pieces and inserting flip flops between the pieces of logic, the delay before the logic gives valid outputs is reduced. In this way the clock period can be reduced. For example, the classic RISC pipeline is broken into five stages with a set of flip flops between each stage.

1. Instruction fetch
2. Instruction decode and register fetch
3. Execute
4. Memory access
5. Register write back
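
A small worked example (assuming, purely for illustration, one clock cycle per stage and no stalls) shows why pipelining raises throughput: n instructions through a k-stage pipeline take about k + (n - 1) cycles instead of n * k.

    #include <stdio.h>

    int main(void) {
        const int k = 5;                 /* stages in the classic RISC pipeline */
        const int n = 100;               /* instructions to execute */

        int sequential = n * k;          /* no overlap: each instruction runs alone */
        int pipelined  = k + (n - 1);    /* fill the pipe once, then one per cycle */

        printf("sequential: %d cycles\n", sequential);                  /* 500 */
        printf("pipelined : %d cycles\n", pipelined);                   /* 104 */
        printf("speedup   : %.2fx\n", (double)sequential / pipelined);  /* ~4.81x */
        return 0;
    }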

Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.

Advantages of Pipelining:

1. The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.

2. Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.

Disadvantages of Pipelining:

1. A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.

2. The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip flops must be added to the data path of a pipelined processor.

3. A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

Point-to-point Communication

The first computer communication systems had each communication channel, e.g. a leased circuit, connecting exactly two computers. This is known as point-to-point communication and has three useful properties.

1. Each connection is independent of the others and can use appropriate hardware.
2. The two end points have exclusive access and can decide how to send data across the connection.
3. Since only two computers have access to the channel, it is easy to enforce security and privacy.

However, point-to-point communication also has disadvantages. The main disadvantage is the proliferation of connections: every pair of computers needs its own channel, so fully interconnecting n computers requires n(n-1)/2 connections, which quickly becomes impractical as the network grows.

LAN Topologies


In the late 1960s and the early 1970s researchers developed a form of computer communication known as Local Area Networks (LANs). These are different from long-distance communications because they rely on sharing the network. Each LAN consists of a single shared medium, usually a cable, to which many computers are attached. The computers co-ordinate and take turns using the medium to send packets.

Unfortunately, this mechanism does not scale. Co-ordination requires communication, and the time to communicate depends on distance - large geographic separation between computers introduces longer delays. Therefore, shared networks with long delays are inefficient. In addition, providing high bandwidth communication channels over long distances is very expensive.

There are a number of different LAN technologies. Each technology is classified into a category according to its topology, or general shape. The first of these is a star topology, as illustrated in Figure 3.

Figure 3: Star topology

The hub accepts data from a sender and delivers it to the receiver. In practice, a star network seldom has a symmetric shape; the hub often resides in a separate location from the computers attached to it.

A network using a ring topology arranges the computers in a circle - the first computer is cabled to the second. Another cable connects the second computer to the third, and so on, until a cable connects the final computer back to the first. This is illustrated in Figure 4.

Figure 4: Ring topology


Once again, the ring, like the star topology, refers to logical connections, not physical orientation.

A network that uses a bus topology consists of a number of computers all connected to a single, long cable. Any computer attached to the bus can send a signal down the cable, and all computers receive the signal. This is illustrated in Figure 5.

Figure 5: Bus topology

The computers attached to a bus network must co-ordinate to ensure that only one computer sends a signal at any time. In addition, the ends of a bus network must be terminated to prevent electrical signals from reflecting back along the bus.

Ethernet

Ethernet is a widely used technology employing a bus topology. The original standard was published by Digital Equipment Corporation, Intel Corporation, and Xerox Corporation in 1982. IEEE currently controls Ethernet standards, e.g. IEEE 802.3 was published in 1985.

In its original form, an Ethernet LAN consists of a single coaxial cable called the ether, but often referred to as a segment. A segment is limited to 500 m in length, with a minimum separation of 3 m between each pair of connections. It operates at 10 Mbps; a later version, Fast Ethernet, operates at 100 Mbps; the latest version, Gigabit Ethernet, operates at 1,000 Mbps or 1 Gbps.

Manchester Encoding

The standard specifies that Ethernet frames are transmitted using Manchester Encoding, which exploits the fact that hardware can detect a change in voltage more reliably than an absolute voltage level (such as the fixed levels used by RS-232). Technically, the hardware is edge triggered, with the changes known as rising or falling edges. The sender transmits a falling edge to encode a 0 and a rising edge to encode a 1, as illustrated in Figure 6.


Figure 6: Manchester encoding

The voltage change that encodes a bit occurs exactly half-way through the time slot. Exactly half-way through the first time slot, the voltage becomes positive (+0.85 V) to encode a 1. Similarly, exactly half-way through the second time slot, the voltage becomes negative (-0.85 V) to encode a 0. If two contiguous bits have the same value, an additional change in voltage occurs at the edge of the time slot.

Manchester Encoding uses a preamble to allow for synchronisation. The preamble consists of 64 alternating 1s and 0s sent before the frame. These produce a square wave with transitions exactly in the middle of each slot. Receiving hardware uses the preamble to synchronise with the time slots. The last two bits of the preamble are both 1s to signal the end of the preamble.
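
A minimal sketch of the encoding rule described above (illustrative voltage levels, with the mandatory mid-slot transition; transceiver details omitted): each bit becomes two half-slot levels, low-to-high for a 1 and high-to-low for a 0.

    #include <stdio.h>

    /* Manchester-encode bits into half-slot levels: -1 for -0.85 V, +1 for +0.85 V.
       A 1 is sent as a rising mid-slot edge (low then high); a 0 as a falling edge. */
    static void manchester_encode(const int *bits, int nbits, int *levels) {
        for (int i = 0; i < nbits; i++) {
            levels[2 * i]     = bits[i] ? -1 : +1;   /* first half of the time slot */
            levels[2 * i + 1] = bits[i] ? +1 : -1;   /* second half: mid-slot edge  */
        }
    }

    int main(void) {
        int bits[] = { 1, 0, 1, 1 };
        int levels[8];
        manchester_encode(bits, 4, levels);
        for (int i = 0; i < 8; i++)
            printf("%+d ", levels[i]);   /* prints: -1 +1 +1 -1 -1 +1 -1 +1 */
        printf("\n");
        return 0;
    }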

Nodes, hubs and switches

A network is a collection of computers or other devices, commonly called nodes, that are able to communicate with each other. This communication takes place on different network levels. A network may use the Internet Protocol (IP) at one level and Ethernet at the level directly below it. This distinction is important because some parts of the network operate at the IP level and others at the Ethernet level.

The most common type of network (especially in the home) is the Ethernet network shown in figure 1, where all nodes are connected to a central device. In its simplest form this central node is called a hub.

Figure 1: a basic network architecture


Basically, a hub is a box with lots of connections (sockets) for Ethernet cables. The hub repeats all messages it receives to all connected nodes, and these nodes filter out only the messages that are intended for them. This filtering takes place at the Ethernet level: incoming messages carry the Ethernet network address of the intended recipient.

A problem with this approach is that hubs generate a lot of traffic, especially on larger networks. Most of this traffic is wasted, since it is intended for only one node but it is sent to all nodes on the network.

Figure 2: a basic network with a hub and a switch

A commonly used solution today is a switch. A switch still connects all nodes to each other, like a hub, but is more intelligent in which messages are passed on to which node. A switch examines incoming Ethernet messages to see which node is the intended recipient, and then directly (and only) passes the messages to that node. This way other nodes do not unnecessarily receive all traffic.
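
A hedged sketch of that idea (a toy "learning switch" in C, not a real product): the switch records which port each Ethernet address was last seen on, then forwards frames only to that port, flooding only when the destination is unknown.

    #include <stdio.h>
    #include <string.h>

    #define TABLE_SIZE 32
    #define FLOOD      -1                /* unknown destination: send to all ports */

    struct entry { char mac[18]; int port; };
    static struct entry table[TABLE_SIZE];
    static int nentries;

    /* Learn: remember the port a frame's source address arrived on. */
    static void learn(const char *mac, int port) {
        for (int i = 0; i < nentries; i++)
            if (strcmp(table[i].mac, mac) == 0) { table[i].port = port; return; }
        if (nentries < TABLE_SIZE) {
            strcpy(table[nentries].mac, mac);
            table[nentries].port = port;
            nentries++;
        }
    }

    /* Forward: look up the destination address; flood if it is not yet known. */
    static int forward_port(const char *mac) {
        for (int i = 0; i < nentries; i++)
            if (strcmp(table[i].mac, mac) == 0) return table[i].port;
        return FLOOD;
    }

    int main(void) {
        learn("aa:bb:cc:00:00:01", 3);   /* hypothetical address seen on port 3 */
        printf("port: %d\n", forward_port("aa:bb:cc:00:00:01"));  /* 3 */
        printf("port: %d\n", forward_port("aa:bb:cc:00:00:02"));  /* -1: flood */
        return 0;
    }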

Since switches are more expensive than hubs, a low-traffic part of the network can be set up using a hub, with the higher-traffic nodes connected directly to the switch. The hub segment is then connected to the switch as well, as shown in figure 2.

Segments and bridges

A large network can be divided into multiple parts which are called segments. Each segment can use its own network protocol, security rules, firewalls and so on. Nodes on different segments cannot directly communicate with each other. To make this possible, a bridge is added between the segments, as shown in figure 3.


Figure 3: two network segments connected via a bridge

The bridge lets packets pass that are destined for a host on the other side. This seems to turn the two segments into one big network again, but there is an important difference: data packets generated on one segment and intended for that same segment are not passed to the other segment. This saves on data transmission on the network as a whole.

Routers and routing

Figure 4: two networks connected via a router

The above examples all presented a single network at the Internet Protocol level. Even when the network is segmented, all nodes are still able to communicate with each other. To connect networks, a router or gateway is used.

Routers and gateways

A router is connected to two different networks and passes packets between them, as shown in figure 4 to the right. In a typical home network, the router provides the connection between the network and the Internet.


A gateway is the same as a router, except that it also translates between one network system or protocol and another. NAT (Network Address Translation), for example, uses a NAT gateway to connect a private network to the Internet.

TCP and UDP Socket Interface

TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are both transport layer protocols. TCP is used when a reliable, stream-oriented, transport is required for data flowing between two hosts on a network. UDP is a record-oriented protocol which is used when lower overhead is more important than reliability. The acronym UDP is sometimes expanded as "unreliable datagram protocol" although, in practice, UDP is quite reliable especially over a local Ethernet LAN segment.

The Dynamic C TCP/IP libraries implement TCP and UDP over IP (Internet Protocol). IP is a network layer protocol that in turn uses lower levels known as "link layer" protocols, such as Ethernet and PPP (Point-to-Point Protocol). The link-layer protocols depend on a physical layer, such as 10BaseT for Ethernet, or asynchronous RS-232 for PPP over serial.

In the other direction, various protocols use TCP, including the familiar protocols HTTP, SMTP (mail) and FTP. Other protocols use UDP: DNS and SNMP, to name a couple. TCP handles a lot of messy details which are necessary to ensure reliable data flow in spite of possible deficiencies in the network, such as lost or re-ordered packets. For example, TCP will automatically retransmit data that was not acknowledged by the peer within a reasonable time. TCP also paces data transmission so that it does not overflow the peer's receive buffers (which are always finite) and does not overload intermediate nodes (routers) in the network. UDP leaves all of these details to the application; however, UDP has some benefits that TCP cannot provide: one is that UDP can "broadcast" to more than one peer, and another is that UDP preserves the concept of "record boundaries", which can be useful for some applications.

TCP is a connection-oriented protocol. Two peers establish a TCP connection, which persists for the exclusive use of the two parties until it is mutually closed (in the usual case). UDP is connectionless. There is no special start-up or tear-down required for UDP communications. You can send a UDP packet at any time to any destination. Of course, the destination may not be ready to receive UDP packets, so the application has to handle this possibility. (In spite of being "connectionless," we still sometimes refer to UDP "connections" or "sessions" with the understanding that the connection is a figment of your application's imagination.)
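
To illustrate the connectionless style (using standard BSD sockets here, not the Dynamic C API; the address and port are arbitrary examples), a UDP sender needs no handshake at all:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void) {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);   /* UDP: no connection setup */
        if (sock < 0) { perror("socket"); return 1; }

        struct sockaddr_in dest = { 0 };
        dest.sin_family = AF_INET;
        dest.sin_port   = htons(9999);               /* arbitrary example port */
        inet_pton(AF_INET, "127.0.0.1", &dest.sin_addr);

        const char *msg = "one self-contained datagram";
        /* sendto() names the destination on every packet; nothing was "opened". */
        sendto(sock, msg, strlen(msg), 0, (struct sockaddr *)&dest, sizeof(dest));

        close(sock);
        return 0;
    }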

This chapter describes how to implement your own application level protocols on top of TCP or UDP. The Dynamic C TCP/IP libraries can also be examined for further hints as to how to code your application. For example, HTTP.LIB contains the source for an HTTP web server.


Network Layers

The layered concept of networking was developed to accommodate changes in technology. Each layer of a specific network model may be responsible for a different function of the network. Each layer will pass information up and down to the next subsequent layer as data is processed.

The OSI Network Model Standard

The OSI network model layers are arranged here from the lower levels starting with the physical (hardware) to the higher levels.

1. Physical Layer - The actual hardware.
2. Data Link Layer - Data transfer method (802.x Ethernet). Puts data in frames and ensures error-free transmission. Also controls the timing of the network transmission. Adds frame type, address, and error control information. IEEE divided this layer into the two following sublayers:
   1. Logical Link Control (LLC) - Maintains the link between two computers by establishing Service Access Points (SAPs), which are a series of interface points. IEEE 802.2.
   2. Media Access Control (MAC) - Used to coordinate the sending of data between computers. The 802.3, 4, 5, and 12 standards apply to this layer. If you hear someone talking about the MAC address of a network card, they are referring to the hardware address of the card.
3. Network Layer - IP network protocol. Routes messages using the best path available.
4. Transport Layer - TCP, UDP. Ensures properly sequenced and error-free transmission.
5. Session Layer - The user's interface to the network. Determines when the session is begun or opened, how long it is used, and when it is closed. Controls the transmission of data during the session. Supports security and name lookup, enabling computers to locate each other.
6. Presentation Layer - ASCII or EBCDIC data syntax. Makes the type of data transparent to the layers around it. Used to translate data to a computer-specific format, such as byte ordering. It may include compression. It prepares the data, either for the network or the application, depending on the direction it is going.
7. Application Layer - Provides services software applications need. Provides the ability for user applications to interact with the network.

Many protocol stacks overlap the borders of the seven-layer model by operating at multiple layers of the model. The File Transfer Protocol (FTP) and telnet both work at the application, presentation, and session layers.


APPLICATION LAYER PROTOCOLS

Simple Mail Transfer Protocol (SMTP) is an Internet standard for electronic mail (e-mail) transmission across Internet Protocol (IP) networks. SMTP was first defined by RFC 821 (STD 10) in 1982,[1] and last updated by RFC 5321 (2008),[2] which includes the extended SMTP (ESMTP) additions and describes the protocol in widespread use today. SMTP is specified for outgoing mail transport and uses TCP port 25. The protocol for new mail submissions is effectively the same as SMTP, but it uses port 587 instead.

While electronic mail servers and other mail transfer agents use SMTP to send and receive mail messages, user-level client mail applications typically only use SMTP for sending messages to a mail server for relaying. For receiving messages, client applications usually use either the Post Office Protocol (POP) or the Internet Message Access Protocol (IMAP) or a proprietary system (such as Microsoft Exchange or Lotus Notes/Domino) to access their mail box accounts on a mail server.
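
For illustration, a minimal SMTP session looks roughly like this (S: is the server, C: the client; the host names and addresses are made up):

    S: 220 mail.example.com ESMTP ready
    C: HELO client.example.com
    S: 250 mail.example.com
    C: MAIL FROM:<alice@example.com>
    S: 250 OK
    C: RCPT TO:<bob@example.com>
    S: 250 OK
    C: DATA
    S: 354 End data with <CRLF>.<CRLF>
    C: Subject: test
    C:
    C: Hello Bob.
    C: .
    S: 250 OK, message accepted
    C: QUIT
    S: 221 Bye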


File Transfer Protocol (FTP) is a standard network protocol used to copy a file from one host to another over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and utilizes separate control and data connections between the client and server.[1] FTP users may authenticate themselves using a clear-text sign-in protocol but can connect anonymously if the server is configured to allow it.

The first FTP client applications were interactive command-line tools, implementing standard commands and syntax. Graphical user interface clients have since been developed for many of the popular desktop operating systems in use today.

The Hypertext Transfer Protocol (HTTP) is a networking protocol for distributed, collaborative, hypermedia information systems.[1] HTTP is the foundation of data communication for the World Wide Web.

The standards development of HTTP has been coordinated by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium, culminating in the publication of a series of Requests for Comments (RFCs), most notably RFC 2616 (June 1999), which defines HTTP/1.1, the version of HTTP in common use.
