25
Data Structures Introduction

Data Structure Terminology 2

Embed Size (px)

DESCRIPTION

Data Structure Terminology and difinitions

Citation preview

  • Data Structures Introduction

  • BASIC TERMINOLOGY Definition: A data structure is a specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. Any data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked with in appropriate ways. 1.1 Elementary Data Organization

    1.1.1 Data and Data Item

    Data are simply collection of facts and figures. Data are values or set of values. A data item refers to a single unit of values. Data items that are divided into sub items are group items; those that are not are called elementary items. For example, a students name may be divided into three sub items [first name, middle name and last name] but the ID of a student would normally be treated as a single item.

    In the above example ( ID, Age, Gender, First, Middle, Last, Street, Area ) are elementary data items, whereas (Name, Address ) are group data items.

  • 1.1.2 Data Type Data type is a classification identifying one of various types of data, such as

    floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; and the way values of that type can be stored. It is of two types: Primitive and non-primitive data type.

    Primitive data type is the basic data type that is provided by the programming

    language with built-in support. This data type is native to the language and is supported by machine directly while non-primitive data type is derived from primitive data type. For example- array, structure etc. 1.1.3 Variable

    It is a symbolic name given to some known or unknown quantity or information, for the purpose of allowing the name to be used independently of the information it represents. A variable name in computer source code is usually associated with a data storage location and thus also its contents and these may change during the course of program execution. 1.1.4 Record

    Collection of related data items is known as record. The elements of records are usually called fields or members. Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and that each field may have a different type. 1.1.5 Program

    A sequence of instructions that a computer can interpret and execute is termed as program. 1.1.6 Entity

    An entity is something that has certain attributes or properties which may be assigned some values. The values themselves may be either numeric or non-numeric.

  • Example:

    1.1.7 Entity Set

    An entity set is a group of or set of similar entities. For example, employees of an organization, students of a class etc. Each attribute of an entity set has a range of values, the set of all possible values that could be assigned to the particular attribute. The term information is sometimes used for data with given attributes, of, in other words meaningful or processed data. 1.1.8 Field

    A field is a single elementary unit of information representing an attribute of an entity, a record is the collection of field values of a given entity and a file is the collection of records of the entities in a given entity set. 1.1.9 File

    File is a collection of records of the entities in a given entity set. For example, file containing records of students of a particular class. 1.1.10 Key

    A key is one or more field(s) in a record that take(s) unique values and can be used to distinguish one record from the others.

    jrfTypewritten text

    Need of data structure

    jrfTypewritten text

    jrfTypewritten textNeed of data structure

    jrfTypewritten text It give different level of organization data.

    It tells how data can be stored and accessed in its elementary level.

    Provide operation on group of data, such as adding an item, looking up highest priority item.

    Provide a means to manage huge amount of data efficiently.

    Provide fast searching and sorting of data.

    jrfTypewritten text

  • Digital Data

    gatcttttta tttaaacgat ctctttatta gatctcttat taggatcatg atcctctgtggataagtgat tattcacatg gcagatcata taattaagga ggatcgtttg ttgtgagtgaccggtgatcg tattgcgtat aagctgggat ctaaatggca tgttatgcac agtcactcggcagaatcaag gttgttatgt ggatatctac tggttttacc ctgcttttaa gcatagttatacacattcgt tcgcgcgatc tttgagctaa ttagagtaaa ttaatccaat ctttgaccca

    Movies

    Music Photos

    Maps

    ProteinShapes

    DNA

    0010101001010101010100100100101010000010010010100....

  • RAM = Symbols + Pointers

    01001010011110110110111010100000001000010010001110000100010100010000000000000100100101011000101010000001110000011111111111111111

    16-bit words02468

    101214

    ASCII table: agreement for the meaning of bits

    We may agree to interpret bits as an address (pointer)

    Physically, RAM is a random accessible array of bits.

    => We can store and manipulate arbitrary symbols (like letters) and associations between them.

    (for our purposes)

  • Digital Data Must Be ...

    Encoded (e.g. 01001001 )

    Arranged- Stored in an orderly way in memory / disk

    Accessed- Insert new data- Remove old data- Find data matching some condition

    Processed Algorithms: shortest path, minimum cut, FFT, ...

    }The focus of this class

  • Data Structures -> Data StructurINGHow do we organize information so that we can find, update, add, and delete portions of it efficiently?

  • Data Structure Example Applications

    1. How does Google quickly find web pages that contain a search term?

    2. Whats the fastest way to broadcast a message to a network of computers?

    3. How can a subsequence of DNA be quickly found within the genome?

    4. How does your operating system track which memory (disk or RAM) is free?

    5. In the game Half-Life, how can the computer determine which parts of the scene are visible?

  • What is a Data Structure Anyway?

    Its an agreement about: how to store a collection of objects in memory, what operations we can perform on that data, the algorithms for those operations, and how time and space efficient those algorithms are.

    Ex. vector in C++: Stores objects sequentially in memory Can access, change, insert or delete objects Algorithms for insert & delete will shift items as needed Space: O(n), Access/change = O(1), Insert/delete = O(n)

  • Abstract Data Types (ADT)

    Data storage & operations encapsulated by an ADT. ADT specifies permitted operations as well as time and space

    guarantees. User unconcerned with how its implemented

    (but we are concerned with implementation in this class).

    ADT is a concept or convention: - not something that directly appears in your code- programming language may provide support for communicating ADT to users

    (e.g. classes in Java & C++)

    insert()

    delete()

    find_min()

    find()

    int main() { D = new Dictionary() D.insert(3,10); cout

  • Dictionary ADT Most basic and most useful ADT:

    insert(key, value) delete(key, value) value = find(key)

    Many languages have it built in:

    Insert, delete, find each either O(log n) [C++] or expected constant [perl, python]

    Any guesses how dictionaries are implemented?

    awk: D[AAPL] = 130 # associative arrayperl: my %D; $D[AAPL] = 130; # hashpython: D = {}; D[AAPL] = 130 # dictionaryC++: map D = new map();

    D[AAPL] = 130; // map

  • C++ STL

    Data structures = containers Interface specifies both operations & time guarantees

    Container Element Access Insert / Delete Iterator Patternsvector const O(n) Random

    list O(n) const Bidirectionalstack const (limited) O(n) Frontqueue const (limited) O(n) Front, Backdeque const O(n), const @ ends Random

    map O(log n) O(log n) Bidirectionalset O(log n) O(log n) Bidirectional

    string const O(n) Bidirectionalarray const O(n) Random

    valarray const O(n) Randombitset const O(n) Random

  • Some STL Operations

    Select operations to be orthogonal: they dont significantly duplicate each others functionality.

    Choose operations to be useful building blocks.

    push_back find insert erase size begin, end (iterators) operator[] front back

    for_each find_if count copy reverse sort set_union min max

    E.g. Data StructureOperations

    E.g. Algorithms

    Itera

    tors,

    Sequ

    ence

    s

  • Suppose Youre Google Maps...You want to store data about cities (location, elevation, population)...

    What kind of operations should your data structure(s) support?

  • Operations to support the following scenarios...

    Finding addresses on map?- Lookup city by name...

    Mobile iPhone user?- Find nearest point to me...

    Car GPS system?- Calculate shortest-path between

    cities...- Show cities within a given

    window...

    Political revolution?- Insert, delete, rename cities

    X

  • Data Organizing Principles

    Ordering: Put keys into some order so that we know something about where each key

    is are relative to the other keys. Phone books are easier to search because they are alphabetized.

    Linking: Add pointers to each record so that we can find related records quickly. E.g. The index in the back of book provides links from words to the pages

    on which they appear.

    Partitioning: Divide the records into 2 or more groups, each group sharing a particular

    property. E.g. Multi-volume encyclopedias (Aa-Be, W-Z) E.g. Folders on your hard drive

  • Ordering

    Pheasant, 10Grouse, 89Quail, 55Pelican, 3Partridge, 32Duck, 18Woodpecker,50Robin, 89Cardinal, 102Eagle, 43Chicken, 7Pigeon, 201Swan, 57Loon, 213Turkey, 99Albatross, 0Ptarmigan, 22Finch, 38Bluejay, 24Heron, 70Egret, 88Goose, 67

    Albatross, 0Bluejay, 24Cardinal, 102Chicken, 7Duck, 18Eagle, 43Egret, 88Finch, 38Goose, 67Grouse, 89Heron, 70Loon, 213Partridge, 32Pelican, 3Pheasant, 10Pigeon, 201Ptarmigan, 22Quail, 55Robin, 89Swan, 57Turkey, 99Woodpecker,50

    Sequential Search O(n)

    (1)

    (2)(3)(4)

    Search for Goose

    Every step discards half the remaining entries:

    n/2k = 1 2k = n

    k = log n

    Binary Search O(log n)

  • Linking

    Records located any where in memory Green pointers give next element Red pointers give previous element Insertion & deletion easy if you have a pointer to the middle of the list

    Dont have to know size of data at start Pointers let us express relationships between pieces of information.

    2 2443

    78

    97

  • Partitioning Ordering implicitly gives a partitioning based on the
  • Wheres the FBI?J. Edgar Hoover Building935 Pennsylvania Avenue, NWWashington, DC, 20535-0001

    NW NE

    SESW

  • Why is the DC partitioning bad? Everything interesting is in the northwest quadrant. Want a balanced partition! Another example: an unbalanced binary search tree:

    (becomes sequential search)

    Much of the first part of this class will betechniques for guaranteeing balance of someform.

    Binary search guarantees balanceby always picking the median.

    When using a linked structure,not as easy to find the median.

    35

    58

    35

    9

    19

    18

    98

  • Implementing Data Structures in C++

    Remember: for templates to work, you should put all the code into the .h file. Templates arent likely to be required for the coding project, but theyre a good

    mechanism for creating reusable data structures.

    template struct Node { Node * next; K key;};

    template class List { public: List(); K front() { if(root) return root->key; throw EmptyException; } // ... protected: Node *root;};

    Structure holds the user data and some data structure bookkeeping info.

    Main data structure class implements the ADT.

    Will work for any type K that supports the required operations (like

  • Any Data Type Can Be Compared:

    By overloading the < operator, we can define an order on any type (e.g. MyType)

    We can sort a vector of MyTypes via:

    Thus, we can assume we can compare any types.

    struct MyType { string name; // ...};

    bool operator

  • So, Much of programming (and thinking about programming) involves deciding

    how to arrange information in memory. [Aka data structures.]

    Choice of data structures can make a big speed difference.- Sequential search vs. Binary Search means O(n) vs. O(log n).- [log (1 billion) < 21].

    Abstract Data Types are a way to encapsulate and hide the implementation of a data structure, while presenting a clean interface to other programmers.

    Data structuring principles:- Ordering- Linking- (Balanced) partitioning

    Review Big-O notation, if youre fuzzy on it.