D@TA Normalisation ERD Modelling Data Analysis And more! November 2008 PLUS : Crosswords, Puzzles, Spot The difference And more! NERD


  • D@TA NERD

    Normalisation, ERD Modelling, Data Analysis and more!

    November 2008

    PLUS: Crosswords, Puzzles, Spot the Difference and more!

  • Got a funny picture?

    Send us a copy! The funniest one each month will win a prize!

    Cartoon of the month

    WINNER!

  • 1

    CONTENTS

    DATA-NERD Issue 1 November 2008

    COVER 1,2,3,4 By the Artist Eimear Duffy

    Introduction to Normalisation: Kevin Mallon. Here we are introduced to the normal. A FIRST FOR MANY OF US, AND NORMAL IS SUBJECTIVE. Page 22

    IS NORMAL BETTER? SAM Senior: Mr SQL visits Dr Database to find out if he's normal and looks at the Yin and Yang of life as a database. Page 25

    ERD and Distributed Databases: Tanya Polianinova. What's the point in having a good ERD if you don't spread it around? Page 28

    Against all odds, A Puzzle: Padraic Lavin. Wow, now I am confused. Page 33

    Speed Test: Patrick Crowe sends the theory around the lap a couple of times. Page 36

    Contents & Desk Top Publishing By Patrick Crowe

  • 2

    Page 3: DATA-NERD, A Cradle for our Creativity. Ian Reston. In this article Ian provides a practical solution to help the publishing staff understand what the hell is going on with the creative department!

    Page 7: HOW DATA ANALYSIS CAN HELP YOU TO IMPROVE YOUR SEXUAL LIFE. Alfredo del Campo, our in-house Latin Lover, solves all your problems... Read it if you need IT!

    Page 10: Database Design: Only for fellows with Mercs? Gary Gallagher, from the building site to the gas guzzler, or Carroll to Chen. Gary stresses that design leads to function.

    Page 15: Let's get Physical. Denis Farrell examines the logical and the physical, the Brain or the Brawn!

    Page 18: INTRODUCTION TO ERD MODELLING. Fatih Degirmenci: no better place to start than with the model man.

    Page 20: Mrs. Peacock In The Library... By Gene Kelly. In this study, Detective Kelly TERD will lay the Cluedo plot.

  • 3 DATA-NERD Issue 1 November 2008

    A Cradle for our Creativity

    By Ian Retson

    Article

    In this first issue of our Data-Attack Magazine, we thought: what better way to introduce our readers to the subject than to describe in outline our very own in-house, bespoke Cradle database? This is the key part of our information system; it allows us to focus on bringing you interesting, creative articles like this one and to spend less time worrying about the mechanics required to produce them.

    "Genius is one percent inspiration and ninety-nine percent perspiration" [1]

    The Cradle is at the core of our steady-state organization, driving our business in the creation, collection and communication of information aimed at you, the Database NERD, and the wannabe NERD community.

    There are separate specialist Publishing and Distribution systems that were purchased as off-the-shelf packages. This allowed us to concentrate on our key information system.

    "The hands that rock the cradle" [2], or stakeholders, were identified initially within the Inception Phase; this provides us with a top-down, external view of the system and helps us establish boundaries:

    Our NERD customer (YOU) demands informative, varied format and fun articles that also communicate the latest trends within the world of databases.

    A free cut-down on-line version of each Magazine Issue is also made available and is used as a vehicle for registration of extra keen SUPER-NERDS.

    The ACCOUNTANTS (NON-NERD) require that we are cost effective.

    The EDITORIAL (UBER-NERD) staff requires that articles are available for review, to meet editorial and final production deadlines.

    The JOURNALIST (NERD-SYMPATHISER) requires a repository where they can lodge their articles and have access to a library of previous contributions from internal and external sources.

    The NERD in turn is encouraged to provide feedback including contributions (NERD-SYMPATHISER-NERD).

    During the Elaboration Phase the following details were established, providing a bottom-up

    view of the system; note the nouns and verbs:

  • 4 DATA-NERD Issue 1 November 2008

    A Magazine is issued on a regular basis, made up of Articles approved by the Editor. Sub-Editors are responsible for individual departments, e.g. News, Puzzles, Feedback, etc. An issue may be categorized as regular or special re-issue or on-line version. An Article is created from one or more Items contributed by our in-house and external Agency Journalists.

    An Item is designated a media type, which currently distinguishes between photograph, illustration and text, but there may be more in the future.

    At the moment only one magazine is produced but, market conditions permitting, we hope to expand into the OO Modeling world and on to infinity. Our Subscribers are both individuals and retail shops. Subscribers are encouraged to contribute articles.
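    The nouns in the description above can be turned into a first-cut set of tables. What follows is only a minimal sketch, assuming SQLite through Python's standard `sqlite3` module; every table and column name is a hypothetical guess from the prose, not the actual Cradle ERD, and the `article_item` link table assumes that an Item may appear in more than one Article.

```python
import sqlite3

# Hypothetical schema: names are illustrative guesses taken from the nouns
# in the business description, not the real Cradle design.
SCHEMA = """
CREATE TABLE magazine   (magazine_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE issue      (issue_id INTEGER PRIMARY KEY,
                         magazine_id INTEGER NOT NULL REFERENCES magazine(magazine_id),
                         category TEXT NOT NULL);
CREATE TABLE article    (article_id INTEGER PRIMARY KEY,
                         issue_id INTEGER REFERENCES issue(issue_id),
                         headline TEXT NOT NULL);
CREATE TABLE media_type (media_type TEXT PRIMARY KEY);
CREATE TABLE journalist (journalist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item       (item_id INTEGER PRIMARY KEY,
                         journalist_id INTEGER NOT NULL REFERENCES journalist(journalist_id),
                         media_type TEXT NOT NULL REFERENCES media_type(media_type));
-- Assumption: an Article uses one or more Items, and an Item may be re-used.
CREATE TABLE article_item (article_id INTEGER NOT NULL REFERENCES article(article_id),
                           item_id INTEGER NOT NULL REFERENCES item(item_id),
                           PRIMARY KEY (article_id, item_id));
"""


def build_cradle_sketch():
    """Create the sketch schema in an in-memory database and seed media types."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    # Media types live in their own table so new ones can be added later.
    conn.executemany("INSERT INTO media_type VALUES (?)",
                     [("photograph",), ("illustration",), ("text",)])
    conn.commit()
    return conn


def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [r[0] for r in rows]
```

    Keeping the media type in a small lookup table, rather than hard-coding the three values, is one way to leave room for "more in the future".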

    Note that we didn't leave our data experts perspiring in the basement; we embraced them as an integral part of the ongoing analysis and design, and so we avoided the mistake where "The database team often works on its own without open doors of communication." [3]

    "The foundation of modern database technology is without question the relational model; it is that foundation that makes the field a science." [4]

  • 5 DATA-NERD Issue 1 November 2008

    "Design Engineering should always begin with a consideration of data; the foundation for all other elements of the design." [5]

  • 6 DATA-NERD Issue 1 November 2008

    Some interesting nerdy points from the Cradle ERD:

    The description of the stakeholders provides us with insight into the boundaries and scope of the system. The Publishing, Distribution and Accountancy packages are outside the scope of the Cradle System; however, the entities Article, Subscriber and Staff respectively indicate the genesis of data interfaces between the systems.

    Note the correlation between the nouns in the business description and the entity names in the ERD. The verbs would normally provide us with the associations or relationships between the entities, but they can be spotted as Foreign Key attributes. Can you add the association roles to the ERD?

    The main high-volume transactional tables are Item followed by Article, which act as the main system repository; both have numeric primary key constituents for efficient processing. Thereafter the tables are more Master control tables, concerned with categorizing and grouping the transactions.

    An Article may consist of one or more Items. This promotes parallel activity, allowing items to be contributed outside of Issue and Article deadlines; the concept also supports the efficient re-use of items across multiple articles over time.

    The cancelled attribute in Article provides us with the capability of stopping an article being added to an Issue after it has been approved by the Editor. This allows us to resurrect the article for future issues and avoids a messy deletion option. How would a deletion option work? What would be its consequences?
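    The cancelled attribute can be sketched as a simple flag that queries filter on. This is a minimal illustration, assuming SQLite via Python's `sqlite3` module; the `article` columns shown are hypothetical and are not taken from the Cradle ERD.

```python
import sqlite3


def open_articles_db():
    """Create a hypothetical article table with a cancelled flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article ("
        " article_id INTEGER PRIMARY KEY,"
        " headline TEXT NOT NULL,"
        " issue_id INTEGER,"
        " cancelled INTEGER NOT NULL DEFAULT 0)"  # 0 = live, 1 = cancelled
    )
    return conn


def set_cancelled(conn, article_id, cancelled):
    """Cancel or resurrect an article without deleting its row."""
    conn.execute("UPDATE article SET cancelled = ? WHERE article_id = ?",
                 (1 if cancelled else 0, article_id))


def printable_headlines(conn, issue_id):
    """Headlines that would go into the issue: cancelled articles are skipped."""
    rows = conn.execute(
        "SELECT headline FROM article"
        " WHERE issue_id = ? AND cancelled = 0 ORDER BY article_id",
        (issue_id,))
    return [r[0] for r in rows]
```

    Because the row is never removed, anything that refers to the article stays intact, and resurrecting it is a single update rather than a re-entry of data.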

    Editor and Journalist are shown as separate entities since their roles are quite distinct within the system, i.e. an Editor controls Articles and Issues whereas Journalists contribute items, but both are subtypes of the Staff entity. Note that a Journalist may be external and therefore is not a {complete} subtype [This is a discussion for our sister Magazine Object-Attack!!!!].

    Further normalization can be achieved, as you may have spotted; Address information is present in the Agency, Staff and Subscriber entities. How would you rationalize this into the Diagram?

    "He who asks a question is a fool for five minutes; he who does not ask a question is a fool forever." [6]

    Answers in next Issue when again more

  • 7

    HOW DATA ANALYSIS CAN HELP YOU TO IMPROVE YOUR SEXUAL LIFE

    Greetings, my dear reader! Now that I've got your attention we can move on to the fascinating world of Data Analysis. Right now, you must be wondering: "And what on earth does Data Analysis have to do with my sexual life?" Fair enough; keep reading this article and you will find out for yourself.

    First of all, let's give a definition of Data Analysis: "is a process of gathering, modelling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains." (Wikipedia)

    In this article, we will cover what role Data Analysis plays in the design of a project; the next step will be to talk about how we can collect data and various techniques to do so. Following that point, if you are still with me, we will have an overview of both quantitative and qualitative data and, most important of all, we will discover the links between your sexual life and Data Analysis.

    Project phases design

    Data Analysis is one of multiple steps, and no less important than the others, that belong to the complex process of Engineering Methodology. These are the different phases we follow that comprise the design of a project/product: Analysis, Design, Standards and Support.

    Data Analysis is in the first phase; its input will be the results of Data Gathering and its output will be the input for the Conceptual Model and Usability Requirements.

    Having said that, everybody agrees that Data Analysis is a useful activity, but in the real world we can find a surprisingly common case where collected data is stored but is never analysed.

    Data Gathering/Data Collection Techniques

    This is the very initial phase of the design. Following are the most common techniques that are adopted to gather data: User Interviews, Contextual Enquiry, Personas / Scenarios, Direct Interview, Indirect Interviews.

  • 8

    Quantitative data analysis

    Here I recommend some useful software programs for analysing quantitative data:

    Epi-info: Covers most of the statistical analyses.

    Minitab: Covers all the basic statistical analyses.

    SPSS: Statistical package.

    A brief definition of quantitative research can be: a measure of how many actors (these can be humans, or anything that interacts with the system under study) act in a particular way. The collection of data tends to include a large amount of information, i.e. the minimum number of interviews should be 50. Questionnaires are the most common tool used for this purpose, normally with closed questions.

    Quantitative data analysis strategy:

    For describing the participants, we can use the typical descriptive statistics, such as: Frequency counts, Proportions, Measures of central tendency (mean, median, mode), and Measures of dispersion (standard deviation, inter-quartile range, etc).

    Talking of relationship or association, we can count on Association and Correlation.

    If what we are treating is comparative studies, we have several techniques to work with, e.g.: Student's t-test statistic, Mann-Whitney U test, paired t-test, Analysis of Variance.
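    The descriptive statistics named above can be computed with nothing more than a standard library. Here is a minimal sketch in Python, assuming numeric answers from a closed-question questionnaire; the function name `describe` and the sample answers are hypothetical, and the inter-quartile range uses the default method of `statistics.quantiles`.

```python
from collections import Counter
import statistics


def describe(values):
    """Summarise a list of numeric answers with basic descriptive statistics."""
    counts = Counter(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "frequency": counts,                                   # frequency counts
        "proportions": {v: c / len(values) for v, c in counts.items()},
        "mean": statistics.mean(values),                       # central tendency
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),                     # sample dispersion
        "iqr": q3 - q1,                                        # inter-quartile range
    }


# Hypothetical answers on a 1-to-5 scale from eight respondents.
summary = describe([2, 3, 3, 4, 5, 3, 1, 4])
print(summary["mean"], summary["median"], summary["mode"])
```

    With only eight made-up answers this is far below the minimum of 50 suggested earlier; the point is just to show what each measure looks like in practice.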

    Analysing qualitative research

    Some useful software that can help:

    NVIVO: Accumulates data, assigns codes to data and analyses this encoded data numerically.

    Ethnograph: A similar program to NVIVO.

  • 9

    So, what is qualitative research? In market research, it is used to help the observer to understand the motives of people, how they feel and why. For this purpose, the researcher asks questions such as "why do you..?" to collect detailed information. Compared to quantitative methods, where accumulated data is much larger, samples tend to be smaller. Here we have the most common methods of Data Analysis in Qualitative Research, compiled by Donald Ratcliff:

    Typology, Taxonomy, Constant Comparison / Grounded Theory, Analytic Induction, Logical Analysis/Matrix Analysis, Quasi-statistics, Event Analysis / Microanalysis, Metaphorical Analysis, Domain Analysis, Hermeneutical Analysis, Discourse Analysis, Semiotics, Content Analysis, Phenomenology / Heuristic Analysis, Narrative Analysis.

    DATA PRESENTATION

    Information processed from a sample can be presented in many ways. Rather than just giving plain numbers about the central tendency and dispersion, we should look for friendlier ways of presenting data, such as graphs or charts (e.g. frequency polygon, histogram, bar/pie), so that people can better see the result of the research.

    Note for the reader: honestly, did you really believe that the Analysis of Data could improve the sexual life of anybody?.... Got ya!

  • 10 DATA-NERD Issue 1 November 2008

    Database Design:

    Only for fellows with Mercs?

    GARY GALLAGHER

    Liam Carroll is currently one of Ireland's leading property developers. His company, Zoe Developers, has built more apartments in Dublin's inner city than all other builders combined. Carroll is heavily involved in the high-profile Dublin docklands re-development project and is responsible for the Cherrywood Innovation and Technology Business Park site in Loughlinstown, which houses corporate giants such as Dell and Friends First.

    Carroll's current standing, however, is a far cry from his initial forays into property development. Examples of early efforts in 1989 include Fisherman's Wharf, a humdrum scheme of townhouses and apartment blocks, and Portobello Harbour, described as having no design or functional integrity (narrow, lego-like constructions with one room on each floor). All of these early developments share one startling characteristic: Carroll did not employ architects for their design. Architects, Carroll claimed, were only interested in designing penthouses for fellows with Mercs. It was only with the introduction of Government apartment design guidelines in 1995, coupled with the prospect of more complex development schemes, that Carroll finally decided to engage architects to plan and design properties correctly. This move paid off, catapulting Carroll from his status as the shoebox apartment king to a respected and successful developer responsible for some of the country's largest residential and commercial developments.

    As you worryingly re-examine the title of this magazine, possibly thinking that you may have picked up the wrong one in the shop, let me reassure you that the above anecdote does hold some connection to Database Design. It is based on a widely accepted saying among database workers that building a database without a design is akin to building a house without an architect's blueprint. Before elaborating, let us first examine what we are talking about when discussing Database Design.

    Database design is also referred to as database modelling; however, it has nothing to do with women, catwalks or lingerie (sorry lads). Fear not though, as some similarities do exist for the more imaginative of us. Data modelling is essentially a method of organising data so that it can be used effectively by databases. It is concerned with structuring data in a way that is presentable and placed in nice neat packages for processing by the database. It is the first, and some would argue the most important, step in creating a database.

    Webopedia defines data modelling as the analysis of data entities and their relationships to other data entities. An entity, in this case, is any object about which we wish to store information in the database.

  • 11 DATA-NERD Issue 1 November 2008

    They are items in the real world that are capable of existing independently. To illustrate this in simple terms, think about a computer vendor's database. Here you would need to store information about the vendor's clients (cust. ID, name, address, tel. #) and about the products that it sells (model number, spec, price, availability). The entities here are therefore client and product.

    Now that you have some idea what it is, you may ask why it's important enough for us to waste our time and your money publishing a whole magazine about Database Design. Fair question; let's try to demonstrate why it is so useful (obviously the sight of the Portobello Harbour shanties isn't enough for you) by looking at another example, this time loosely based around the current scramble to become GWB's successor at the White House. In the aftermath of such an election, in-depth analysis would be carried out on various aspects of the election. For example, knowing the number of people that have voted for the various political parties would be invaluable. This could be achieved by including a column in the database, from the very beginning, recording which party each person voted for. If, however, this column was omitted at the beginning, it would be very time consuming to collate the relevant data to get the same result. It is at the database design stage that the decision to include such a column would be made.
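    To make the election example concrete: once a party column exists, the count per party is a single query. The sketch below assumes SQLite via Python's `sqlite3` module; the `vote` table, its columns and the helper name are hypothetical illustrations, not a real electoral schema.

```python
import sqlite3


def count_votes_by_party(ballots):
    """Tally votes per party from (voter_id, party) pairs.

    The party column is the design decision in question: because it was
    included in the table from the start, the tally is one GROUP BY query.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vote (voter_id INTEGER PRIMARY KEY, party TEXT NOT NULL)")
    conn.executemany("INSERT INTO vote (voter_id, party) VALUES (?, ?)", ballots)
    rows = conn.execute(
        "SELECT party, COUNT(*) FROM vote GROUP BY party ORDER BY party")
    return dict(rows.fetchall())


# Made-up ballots for illustration only.
print(count_votes_by_party([(1, "Democrat"), (2, "Republican"), (3, "Democrat")]))
```

    Without that column, the same answer would have to be reassembled from whatever other records happened to exist, which is the time-consuming collation the article warns about.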

    The importance of the design stage is equally apparent with even the most basic of databases. You could say that if a Formula 1 racing car doesn't have smooth aerodynamics, it will drag and go slower. Equally, if a database doesn't adhere to best practices, it won't perform as efficiently as possible. There are several methodologies used for creating the perfect database. In this edition we focus on what are widely regarded as the two most effective techniques: the use of Entity Relationship Diagrams (ERDs) to assist in matching the business needs of the database to the physical design; and a process of safeguarding the database from structural problems, known as Database Normalisation.

    An ERD is essentially a graphic representation of the entities, and the relationships between the entities, within a database. Although initially introduced in the 1960s by a General Electric engineer, the development of ERDs is credited to the American scientist Professor Peter Chen. Chen's original ERD paper was selected as one of the 38 most influential papers in Computer Science, and his ERD approach has been ranked as one of the top methodologies in systems development by several surveys of FORTUNE 500 companies. Yes folks, it works. While an ERD is mainly concerned with the relationships between the entities of a database, the goal of database normalisation is to eliminate unnecessary duplication of data, which reduces the amount of space a database consumes and guards against inconsistent copies of the same fact. Although often previously overlooked as a complicated process for academic geniuses, it is now accepted that a grasp of the principles of normalisation can drastically improve the way a database behaves.
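    As a small illustration of what "eliminating unnecessary duplication" means, the sketch below splits a flat list of sales rows, in which each client's name and address is repeated on every row, into one record per client plus a list of orders. It is written in plain Python; the field names reuse the computer vendor example and are otherwise hypothetical.

```python
def normalise(flat_rows):
    """Split flat sales rows into a client lookup and a list of orders.

    Each flat row repeats the client's name and address; after the split,
    those details are stored once per client and orders refer to cust_id.
    """
    clients = {}
    orders = []
    for row in flat_rows:
        # Store client details once, keyed by the client identifier.
        clients[row["cust_id"]] = {"name": row["name"], "address": row["address"]}
        # The order keeps only the key that points back to the client.
        orders.append({"cust_id": row["cust_id"], "model": row["model"]})
    return clients, orders


# Made-up rows: client 1 appears twice, so the address is duplicated.
flat = [
    {"cust_id": 1, "name": "Ann", "address": "1 Main St", "model": "X100"},
    {"cust_id": 1, "name": "Ann", "address": "1 Main St", "model": "X200"},
    {"cust_id": 2, "name": "Bob", "address": "2 High St", "model": "X100"},
]
clients, orders = normalise(flat)
print(len(clients), len(orders))
```

    After the split, a change of address is made in one place only, which is the structural safeguard normalisation is after.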

    These methodologies will be explained in more detail as you read on, where their importance will hopefully become even more apparent. Should their relevance escape you, however, you may want to consider again the following. From the shoebox king to one of the world's most influential computer scientists, the basic principle used in creating a database remains: effective planning and design are essential parts of any project. Without them, the roof might fall in.

  • 12 DATA-NERD Issue 1 November 2008

    Keeping IT Real: How to use Logical ERD Modelling in Effective Database Design

    by Aine Daly

    The logical data model is primarily focused on the representation of REALITY: tangible objects, actual characteristics, bona fide relationships. These are the fundamentals of logical modelling. It is analytically structured to reflect the core requirements of a business. The model is independent of technology and is not created with a physical data store in mind; that will come into play in the next phase of design, the Physical Model.

    Systems have both technological components (program, database management system, screen components) and technology-independent components (logical data model, business rules).

    The logical model concentrates on the needs of the business; no details are included about the physical hardware and database technology. It reveals the business processes and data that exist and reflects the relationships between the two. The goals of the Logical ERD model are:

  • 13 DATA-NERD Issue 1 November 2008

    a. ESTABLISH INFORMATION/BUSINESS REQUIREMENTS...

    b. GRAPHICALLY REPRESENT THESE REQUIREMENTS (DATA ENTITIES, RELATIONSHIPS, ATTRIBUTES, CARDINALITY) SO THAT THEY MAY BE UNDERSTOOD.

    COMMUNICATION between the Business/organization and the Database designer is critical in order to achieve the above objectives. Both may have different ideas about what the requirements and structure of the database should be, and collaboration ensures that the system developed will fit the business needs. The Logical ERD can be used as a tool of communication, as it can be easily explained to non-technical clients.

    Logical Entity Relationship Diagram Models convey a great deal of information using a very apt and succinct notation. The components used in Logical ERD development are: Entities, Relationships and Attributes. Using these components, the logical model identifies entities and the correct relationships among them. The term unique identifier is used to describe the data element that differentiates between one entity and another. It replaces the term Primary Key because, once again, it is technology independent, whereas a Primary Key represents a unique identification of a row in a table that can be used as a foreign key in a related table.

    Normalization is used to remove redundant data and optimize the overall data structure by grouping the data elements correctly, ensuring that entities are properly formed and each attribute is assigned to the correct entity. This systematic process produces a solid database structure which will allow data to be stored and retrieved in the most efficient manner. If the correct data is not captured, problems are sure to follow. If the relevant entities or relationships are not represented correctly in a data model, then end-user queries about these entities and relationships cannot be answered.

  • 14 DATA-NERD Issue 1 November 2008

    Regardless of the application that is used in implementation, if you take the time to carefully build a logical model, your result will be a solid foundation for your database. It is this framework which will dictate the relevance, speed and efficiency of the final database and an organization's success when using it to conduct business. It should also have a positive impact on the cost of the system development, as it resolves problems at an early stage and does not incorporate redundant data. Figuring out these issues at the design and database development phase is significantly cheaper than trying to fix a problem in an implemented system. The next step is the Physical model, summarised below:

    The implementation of the logical model in the chosen database structure.

    The physical diagram is platform-specific and more detailed.

    A mapping of the logical model to the physical hardware and database technology.

  • 15 DATA-NERD Issue 1 November 2008

    Let's get Physical

    By Denis Farrell

    To understand Physical ERD Modelling fully, we have to look at the complete ERD Modelling picture.

    In the design phase of databases, data is represented using a certain data model. These data models are collections of concepts or notations for describing data, data relationships and data constraints. Data models fall into three groups:

    1. Conceptual models

    A collection of entities.

    Flexible data structuring capabilities.

    Examples of this model are the object-oriented model, the semantic data model and the entity-relationship model.

    2. Record-based logical models

    Data is considered as a collection of fixed-size records.

    These models are closer to the physical level or file structure, so they are easier to implement.

    The three best-known models of this kind are the relational data model, the network data model and the hierarchical data model.

    3. Physical models

    Provide concepts that describe the details of how data is stored in the computer's memory.

    It is important to understand how logical and physical models relate to each other and the differences between them.

    Logical

    The first stage is to gather all the business requirements for the planned database and convert these requirements into a model. The logical model does not start from the needs of the database; rather, the business requirements are used to determine the needs of the database.

    After all the business requirements and information are collected, reports and diagrams are produced, together with entity relationship diagrams, business process diagrams, and eventually process flow diagrams. The diagrams created should demonstrate the processes and data that exist. They should also demonstrate the relationship between the data and the business processes.

  • 16 DATA-NERD Issue 1 November 2008

    Logical modelling should give a clear visual illustration of the activities and data relevant to a particular business. It shapes the direction of the database design, and it also indirectly affects the performance and administration of the implemented database. If time is taken to perform logical modelling, more opportunities arise for planning the design of the physical database.

    Logical modelling produces diagrams and documentation that determine whether or not the business requirements have been completely gathered. This information is then reviewed by developers, management and end users to decide whether more research and work is required before physical modelling begins.

    From Logical Modelling we expect to get the following deliverables.

    Entity relationship diagrams

    This gives the development team an initial picture of what the database needs to deliver. It shows the different categories of data for the business and how they relate to each other.

    Business process diagrams

    The process model illustrates all the parent and child processes that are performed by individuals within a company. This shows the development team how data moves within the business.

    User feedback documentation

    Physical Modelling

    Physical modelling relates to the actual design of a database. It is a cost-effective and practical tool for problem solving and design optimisation. The requirements recognised in the logical model form the basis for the design of the database. The physical model deals with converting the requirements gathered in the logical model into a relational database model.

    Throughout physical modelling objects such as tables and columns are created. This is based on the entities and attributes defined in the logical model. Also at this stage constraints are defined, including the primary keys, foreign keys and other unique keys. From database tables views can be created to summarise data. All the pieces are brought together in the physical model and this defines the database for the business.
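
    The objects named above (tables, columns, primary keys, foreign keys, unique keys and a summarising view) can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the customer and customer_order tables and all of their columns are invented for the example and do not come from any particular logical model.

```python
import sqlite3

# In-memory SQLite database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,      -- primary key
    email       TEXT NOT NULL UNIQUE,     -- another unique key
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    total       REAL NOT NULL
);
-- A view that summarises data held in the base tables.
CREATE VIEW customer_totals AS
    SELECT c.name AS name, COUNT(o.order_id) AS orders, SUM(o.total) AS spent
    FROM customer c JOIN customer_order o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id;
""")

conn.execute("INSERT INTO customer VALUES (1, 'ann@example.com', 'Ann')")
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)",
                 [(10, 1, 20.0), (11, 1, 5.5)])

summary = conn.execute("SELECT name, orders, spent FROM customer_totals").fetchall()

# The foreign key constraint rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO customer_order VALUES (12, 99, 1.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

    Other database products would express the same constraints with slightly different data types and syntax.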

    One restriction of physical modelling is that it is software specific. This means that the objects defined in the physical model can vary depending on the relational database software being used. Variations exist in the way the data types are represented and stored. Conceptually, the basic types of data are the same, but their implementations differ. Database systems also differ in their objects: those available in one may not be available in another, and as a result physical models are hardware and software dependent. Oracle is an example of software that will work with many operating systems, such as Windows NT and UNIX. Java-based products can be used on virtually all operating platforms, hence their popularity. So when choosing database software, hardware and operating system platforms, these need to be looked at in conjunction with one another.

    From physical modelling we expect to get the following deliverables.

    Server model diagrams

    This diagram shows the tables and columns of a database and the relationships between them.

    User feedback documentation

    Database design documentation


    INTRODUCTION TO ERD MODELLING

    By Fatih Degirmenci

    One of the most painful problems of database design is the differing views of designers, programmers and users, which can lead to useless databases or databases that do not reflect their actual purpose. Data modelling is the first step of the database design process and sits between real-world objects and the database model. To keep everyone involved and aware of the design, it is necessary to use a method that simplifies the design process. Entity Relationship Diagram (ERD) modelling is such a method: it removes potential roadblocks and simplifies the database design process.

    DATABASE DESIGN AND ERD MODELLING

    Database design is a software engineering activity that falls within the design phase of the generic software engineering process.

    The database design process consists of a number of steps, including identifying the data to be stored, determining the relationships between the stored data, and structuring the data. [1]

    Modelling is an intermediate step between requirements gathering and construction, and ERD modelling is a widely used modelling scheme for this purpose. It provides an abstract, notional representation of structured data, using a conceptual schema to design the database. It is a general type of data modelling for relational databases and helps to simplify the design process. [2]

    Some of the key terms of ERD modelling are described by Peter Chen as follows:

    "An entity is a thing which can be distinctly identified. A specific person, company, or event is an example of an entity. A relationship is an association among entities." [3]

    There are several types of ERD modelling, and a widely used one was developed by Peter Chen. In Chen's ERD modelling, entities are represented by rectangles, and the entity name is written inside the rectangle in singular form. [4]

    Entity attributes are not shown on the ERD itself in the original Chen model, but the model has been extended to include attributes. An attribute preceded by an asterisk is the identifier of the entity. [4]

    [Figure: student entity with attributes *sId, name, address, telephone]

    Relationships show how two or more entities are related to each other and are expressed as verbs, for example "student submits assignment". In this example, student and assignment are entities and submits is the relationship.

    There are several other notations which can be used to draw ERDs, and one of the most widely used is Crow's foot notation. [1] If we redraw the above example using this notation, we have the diagram below.

    [Figure: student submits assignment, drawn in Crow's foot notation]

    Relationships can take several forms: one-to-one, one-to-many, and many-to-many. In a one-to-one relationship, one entity is related to only one other entity. In the previous example, a student is related to one assignment, which shows a one-to-one relationship. In the real world, a student may submit more than one assignment, and this is a good opportunity to show a one-to-many relationship. In this case, the relationship can be redrawn as below.

    [Figure: student submits assignment, redrawn as a one-to-many relationship]

    The completed ERD shows the overall plan of the database and is called the logical ERD. Database designers need to be aware of the logical ERD. Its realisation in a particular DBMS is described by the physical ERD schema.
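
    As a rough illustration of how the logical "student submits assignment" relationship might be realised physically, the sketch below uses Python's built-in sqlite3 module. The student attributes follow the example entity in this article; the assignment columns aId and title are assumptions made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    sId       INTEGER PRIMARY KEY,   -- the identifier marked with * in the ERD
    name      TEXT NOT NULL,
    address   TEXT,
    telephone TEXT
);
-- "student submits assignment" as one-to-many: each assignment row points
-- at exactly one student, while one student may appear in many rows.
CREATE TABLE assignment (
    aId   INTEGER PRIMARY KEY,       -- hypothetical identifier
    title TEXT NOT NULL,             -- hypothetical attribute
    sId   INTEGER NOT NULL REFERENCES student(sId)
);
""")
conn.execute("INSERT INTO student VALUES (1, 'Ann', '1 Main St', '555-0100')")
conn.execute("INSERT INTO student VALUES (2, 'Bob', '2 High St', '555-0101')")
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?)",
                 [(1, 'ERD essay', 1), (2, 'Normalisation quiz', 1), (3, 'ERD essay', 2)])

# Count the assignments submitted by each student.
submitted = conn.execute("""
    SELECT s.name, COUNT(a.aId)
    FROM student s LEFT JOIN assignment a ON a.sId = s.sId
    GROUP BY s.sId ORDER BY s.sId
""").fetchall()
```

    The "many" side of the relationship carries the foreign key, which is the usual way a one-to-many line in an ERD ends up in a relational schema.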

    In database design, communication with end users is an important step for gathering the requirements of the database and reaching a common view of real-world entities. When data modelling starts, the differences between the end users' views and the developers' views become the main problem. It falls to the developers to solve it, which they can do by creating a data model that the end user can understand. ERD modelling is useful when users need to know more about the design and developers need to explain aspects of the design to users. This type of schema gives users and developers the chance to share a common view of the data and an understanding of how database design issues can be handled.

    REFERENCES

    [1] Entity-relationship model - Wikipedia, the free encyclopedia; http://en.wikipedia.org/wiki/Entity-relationship_model.

    [2] S. Bagui and R. Earp, Database Design Using Entity-Relationship Diagrams, Auerbach Publications, 2003.

    [3] P.P.S. Chen, "The entity-relationship model - toward a unified view of data", ACM Transactions on Database Systems (TODS), vol. 1, 1976, pp. 9-36.

    [4] J.L. Harrington, Relational Database Design Clearly Explained, Morgan Kaufmann Publishers, 2002.


    Mrs. Peacock in the Library By: Gene Kelly

    Mrs. Peacock

    In The Library

    With The Candle Stick?

    Dr. Black Murdered!

    Dr. John Black (48), self-made millionaire, hosted a weekend celebration at his country mansion to mark the 30th anniversary of his company, DBD Inc. Suspicions first arose when Dr. Black was nowhere to be seen in the drawing room for pre-dinner drinks on Saturday night. By the time desserts were being served there was still no sign of Dr. Black, and Mrs. White, his maid of 25 years, now feeling a little worried, went to Dr. Black's room to look for him. Just as she was about to knock on his door, she heard a scream echo from what appeared to be the kitchen. This was abruptly followed by another scream coming from the entrance hall. Mrs. White went to investigate...

    Black's Tudor Mansion, built in 1586

    When Mrs. White reached the bottom of the stairs she was met by Mrs. Peacock, who was being comforted by Reverend Green. They were both standing beside a pool of blood which had been smeared across the carpet. Mrs. White felt a strange feeling in her stomach; she wasn't sure if it was worry or hope. She continued to the kitchen to find the source of the first scream. In the kitchen she was met by Miss Scarlet, who was standing by the cold room with her hand on the door. Mrs. White was closely followed by Professor Plum, who had also come to find the source of the scream. They both looked into the open cold room to find the body of Dr. Black.

    Mrs. White ran to the nearest telephone, which was in the Lounge. She called the local police station and informed them of the news; they would send someone over right away. As Mrs. White made her way back to the others, she passed through the Billiards Room, where she met Colonel Mustard sitting in a leather armchair, swirling his snifter of cognac with one hand while holding his wooden pipe with the other, apparently oblivious to the happenings in the rest of the house. Mrs. White told Colonel Mustard about the body and led him through the conservatory into the ballroom, where the rest of the guests had gathered. When Mrs. White arrived in the ballroom, she noticed that one of the bronze candlesticks that stood by the fireplace was missing. Just as she was about to point this out... a knock!

    Mrs. White went to answer the front door, where she was met by Mr. Parker, the local police officer, and another man whom she did not recognise. Mr. Parker introduced the other man as Dr. Peter Chen, who was visiting from Louisiana State University to help update the methods used to collect police data. With that Dr. Chen proclaimed, "Don't worry Mrs. White, I'm on the CASE!"


    Introduction to Normalisation By Kevin Mallon

    Normalisation is the process of organising data in a database. The goal of data normalisation is to reduce and, if possible, eliminate data redundancy. This is an important consideration for application developers because it is incredibly difficult to store objects in a relational database that maintains the same information in several places. Redundant data also wastes disk space and creates maintenance problems. The main reason for normalising is the possible corruption of databases due to three main factors: insertion anomalies, deletion anomalies and update anomalies.

    Normalisation can also be referred to as canonical synthesis, as this is the process of designing a database model without redundant data items. Well-normalised data makes the task of programming a lot easier and works very well in multi-platform, enterprise-wide environments. Data normalisation is sometimes known as the cure for Spreadsheet Syndrome: the lumping of every possible piece of information into as few tables as possible, sometimes into a single table.

    [Diagram: Concepts (Normalisation, Spreadsheet Syndrome); Why Normalise? (Update Anomalies, Deletion Anomalies, Insertion Anomalies)]

    The original concept of database normalisation was introduced by Edgar Frank Codd in 1970 in his paper "A Relational Model of Data for Large Shared Data Banks". In this paper, Codd states "there is, in fact, a very simple elimination procedure which we shall call normalization." Through decomposition, non-simple domains are replaced by "domains whose elements are atomic (non-decomposable) values."

    There are a few rules for database normalisation. Each rule is called a "normal form". If the first rule is observed, the database is said to be in "first normal form". 1NF is often referred to as the atomic rule: each column should be designed to hold one and only one piece of information. If the first three rules are observed, the database is considered to be in "third normal form". Although other levels of normalisation are possible, third normal form is considered the highest level necessary for most applications.

    The concept of functional dependencies is the basis for the first three normal forms. A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which is the same as stating "B is functionally dependent upon A". The table below shows the three most common forms of normalisation.

    Level                        Rule

    First Normal Form (1NF)      An entity type is in 1NF when it contains no repeating groups of data.

    Second Normal Form (2NF)     An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.

    Third Normal Form (3NF)      An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.
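
    The idea of a functional dependency A -> B can be checked mechanically against sample rows. The helper below is only a sketch: the function name holds and the enrolment data are invented for illustration, and a dependency that holds in a handful of sample rows is of course not proof that it holds for the business in general.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later row with the
        # same key but a different value breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical table whose primary key is (student_id, module).
enrolment = [
    {"student_id": 1, "module": "DB", "student_name": "Ann", "grade": "A"},
    {"student_id": 1, "module": "OS", "student_name": "Ann", "grade": "B"},
    {"student_id": 2, "module": "DB", "student_name": "Bob", "grade": "B"},
]

# grade depends on the whole key.
full_key = holds(enrolment, ["student_id", "module"], ["grade"])
# student_name depends on only part of the key, which is what 2NF forbids.
partial = holds(enrolment, ["student_id"], ["student_name"])
# grade is not determined by student_id alone (student 1 has both A and B).
not_fd = holds(enrolment, ["student_id"], ["grade"])
```

    In this sample, moving student_name into a separate student table keyed on student_id would remove the partial dependency.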


    SOLUTION TO: Puzzle Page 34

    Across: 2 RELATIONSHIPS, 5 DIAMOND, 8 VERB, 10 DATABASE, 12 ORDER, 13 COMMERCE

    Down: 1 ATTRIBUTE, 3 OVAL, 4 NOUN, 5 DIAGRAM, 6 MODEL, 7 ENTITY, 9 WEBSITE, 11 CATEGORY

    Is Normal Better?

    By: Sam Senior

    Mr SQL Visits Dr Database to Find Out if He's Normal...

    Mr SQL: Dr Database, I am not sure if I am Normal or not. Can you help me?

    Dr Database: Well, Mr SQL, do you feel atomic?

    Mr SQL: Not sure what you mean?

    Dr Database: Well, a Normalised database has atomic data. Think of an atom. In other words, the data can't be broken down any more. For example, first name can't be broken down any more.

    Mr SQL: I'm just a raw, Unnormalised database.

    Dr Database: Do you feel any anomalies?

    Mr SQL: Oh, yes, plenty Doc. I have inconsistent data and my CPU's very hot and overloaded. Also, I feel so bloated and large... must be all the redundant data I have.

    Dr Database: Sounds like you have an acute case of Spreadsheet Syndrome. Well, I guess you need to be Normalised. I will outline the basic plan...

    Three Normal Forms later...

    Mr SQL: Wow! I followed the plan of decomposing tables into more tables and can feel the redundant data just slipping away.

    Dr Database: As I predicted, you now have no duplicated data due to decreased redundancy.

    Mr SQL: My CPU is a lot cooler but when people query me it takes me longer to respond because of the table JOINs.

    Dr Database: Well, we could Denormalise you a bit.

    Mr SQL: Denormalise? But I spent ages trying to Normalise! Why would I want to do that?

    Dr Database: Well, it's not all black and white. Hear me out...

    What are the advantages of Normalisation?

    Since there is no duplication in a Normalised database there will be few or no anomalies. This means little to no administration to ensure that the redundant data is accurate and up-to-date. In addition, little or no redundant data means lower storage requirements. A simpler, more efficient structure also means the database is more scalable. Also, write actions such as INSERT, UPDATE and APPEND, i.e. writing to the database, will run better.

    However, it's not all good...

    As the table count increases during the Normalisation process, so too does the JOIN count. If the database is large then JOIN jungles can be created, which can eventually affect response times.

    What can be done to improve performance?

    Improve the Normalisation design so that it reflects the data usage; create indexes for frequently queried attributes; use clustering; or just accept poor performance. However, if the users still complain... Denormalise! Denormalisation is part of the physical design phase and can only be done after the data has been Normalised.

    ANOMALY WARNING: DO NOT DENORMALISE UNNORMALISED/RAW DATABASES!

    Question: don't read any further. What do you think

    Denormalisation means and why would a SQL

    administrator do it?

    Denormalisation is the design process of taking

    normalised data and producing a physical design in

    which normalised data is rearranged so that optimal

    access and manipulation of data can be achieved.

    [Inmon]

    Normalised Database Example

    CUSTOMER

    CustomerNum, CustomerName...

    CUST_PHONE

    CustomerNum, Phone

    Denormalised Database Example

    CUSTOMER

    CustomerNum, CustomerName, Phone1, Phone2, Phone3...
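
    The two layouts above can be compared side by side. The sketch below uses Python's built-in sqlite3 module; the table CUSTOMER_DENORM is a name invented here so that both versions can live in one database, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised: one row per phone number.
CREATE TABLE CUSTOMER (CustomerNum INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE CUST_PHONE (
    CustomerNum INTEGER REFERENCES CUSTOMER(CustomerNum),
    Phone       TEXT
);
-- Denormalised: a fixed number of phone columns on the customer row.
CREATE TABLE CUSTOMER_DENORM (
    CustomerNum INTEGER PRIMARY KEY, CustomerName TEXT,
    Phone1 TEXT, Phone2 TEXT, Phone3 TEXT
);
INSERT INTO CUSTOMER VALUES (1, 'Ann');
INSERT INTO CUST_PHONE VALUES (1, '555-0100'), (1, '555-0101');
INSERT INTO CUSTOMER_DENORM VALUES (1, 'Ann', '555-0100', '555-0101', NULL);
""")

# Normalised read: needs a JOIN, but copes with any number of phones.
joined = conn.execute("""
    SELECT c.CustomerName, p.Phone
    FROM CUSTOMER c JOIN CUST_PHONE p ON p.CustomerNum = c.CustomerNum
    ORDER BY p.Phone
""").fetchall()

# Denormalised read: a single-table lookup, limited to three phones.
flat = conn.execute(
    "SELECT CustomerName, Phone1, Phone2, Phone3 FROM CUSTOMER_DENORM"
).fetchall()
```

    The denormalised read avoids the JOIN, at the cost of empty columns and a hard limit on the number of phone numbers per customer.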

    Here are some reasons why a database administrator would contemplate using Denormalisation.

    No calculated values in a Normalised design. For example, an online shopping cart may have a field called total_price (price * quantity), which is forbidden by Third Normal Form. Information Warehouses use large numbers of pre-calculated summary tables known as Materialised Views. This improves response times for summary data, i.e. no complex calculations are required because a pre-calculated result in a summary table is queried.

    The key reason: performance. To avoid JOIN jungles. A Normalised database must locate the relevant tables and then JOIN the data to either get the information or process the data. Thus a Normalised database uses a higher amount of I/O and CPU. In addition, Relational DBMSs are optimised to perform three-way joins, so the database loses efficiency when more complex joins are required. The outcome of Denormalisation is better response times, i.e. reduced I/O and CPU. For systems that depend on real-time information, Denormalisation may be required.

    To maintain historical data. For example, a salesperson's surname may change. If the name is stored only once in a Normalised database, an invoice report will list only the current surname. However, if the surname is also stored in the invoices table as redundant data, then both the old and the new surnames will appear in the report.

    For specific application requirements. Application coding could be simpler because the data is spread across fewer tables and is easier to locate.

    What tools can be used to Denormalise?

    To reduce the number of tables/joins it is

    important to analyse which entities are accessed

    by applications and how they relate to each

    other. This can be achieved by using Entity

    Relationship Diagrams, Data Flow Diagrams

    and Cross-Reference Matrices to identify

    database usage.

    Disadvantages...

    The key risk of Denormalisation is anomalies

    caused by redundant data. Tracking the

    redundant data will require extra

    administrative effort.
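
    The kind of anomaly meant here can be reproduced with the total_price field mentioned earlier. The sketch below uses Python's built-in sqlite3 module; the cart_item table and its values are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cart_item (
    item_id     INTEGER PRIMARY KEY,
    price       REAL NOT NULL,
    quantity    INTEGER NOT NULL,
    total_price REAL NOT NULL    -- redundant: always derivable as price * quantity
);
INSERT INTO cart_item VALUES (1, 2.50, 4, 10.00);
""")

# The quantity changes, but nobody remembers to update the redundant column.
conn.execute("UPDATE cart_item SET quantity = 6 WHERE item_id = 1")

stored, derived = conn.execute(
    "SELECT total_price, price * quantity FROM cart_item WHERE item_id = 1"
).fetchone()
is_inconsistent = stored != derived

# The extra administrative effort: recompute the redundant value after every change.
conn.execute("UPDATE cart_item SET total_price = price * quantity")
repaired = conn.execute("SELECT total_price FROM cart_item").fetchone()[0]
```

    In practice this bookkeeping is often delegated to triggers or application code, which is the extra administration the paragraph above refers to.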

    Like everything in life, there's a balance, Yin and Yang, et cetera...

    There's a happy medium between Normalisation and Denormalisation, but both require a complete understanding of the data and the specific business requirements.


    ERD and Distributed Databases By: Tanya Polianinova

    Distributed databases are widely used by many companies for data storage and manipulation. The next few paragraphs explain the concepts of distributed databases and describe the principles behind the Entity Relationship Diagram. Both are described, and the advantages and disadvantages of each are discussed.

    History

    Databases have been in use since the early days of electronic computing. Around the 1970s the distributed database concept was introduced, and since then a variety of organisations worldwide have used distributed databases for data storage. Around the same time the Entity Relationship Diagram was introduced by Peter Chen (1976), following the earlier data structure diagrams of Charles Bachman. ERDs are used for many database designs and can serve as a foundation for database development and planning.

    Distributed Databases

    A database is a collection of data stored on a computerised system. Data is created, stored, organised, sorted, manipulated and retrieved using software programs or a Database Management System (DBMS) and a variety of query languages, such as SQL.

    A distributed database stores data at different locations on a network, which may be in different geographical locations. It is controlled by a DBMS and allows multiple users to access and manipulate data without interfering with each other. In other words, although the data is spread across several sites, the user sees the database as a centralised system with data stored in one place.

    Data is spread across sites using fragments, which allow the same data to be re-created in multiple places. Different forms of data distribution can be used.

    Data can be replicated, where copies of the same data are kept in many different locations. Data can also be horizontally or vertically fragmented. With horizontal fragmentation, the rows of a table are distributed across different sites, whereas with vertical fragmentation the data is split by columns across multiple systems. Sometimes data is reorganised, in other words manipulated in some way (for example summarised) and then stored. The last method of data distribution is known as Separate Schema, in which the data is kept in different databases so that different systems can access and use the data with the help of different programs and interfaces.
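
    Horizontal and vertical fragmentation can be pictured with a small in-memory table. This is a plain Python sketch rather than anything a real DBMS does internally; the staff rows, the site names and the helper functions horizontal and vertical are all invented for the example.

```python
rows = [
    {"id": 1, "name": "Ann", "city": "Dublin", "salary": 50000},
    {"id": 2, "name": "Bob", "city": "Cork", "salary": 42000},
    {"id": 3, "name": "Cara", "city": "Dublin", "salary": 61000},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the whole rows that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical(rows, columns, key="id"):
    """Vertical fragment: selected columns of every row, plus the key for rejoining."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


# Horizontal fragmentation: each site holds the rows for its own city.
dublin_site = horizontal(rows, lambda r: r["city"] == "Dublin")
cork_site = horizontal(rows, lambda r: r["city"] == "Cork")

# Vertical fragmentation: each site holds different columns of all rows.
hr_site = vertical(rows, ["salary"])
directory_site = vertical(rows, ["name", "city"])

# The original table is the union of the horizontal fragments...
rebuilt_h = sorted(dublin_site + cork_site, key=lambda r: r["id"])
# ...or the join of the vertical fragments on the shared key.
salary_by_id = {r["id"]: r for r in hr_site}
rebuilt_v = [{**d, **salary_by_id[d["id"]]} for d in directory_site]
```

    Repeating the key in every vertical fragment is what makes the rejoin possible.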

    Data in a distributed database is regularly synchronised to ensure that all of the data is up to date. Synchronisation is done using timestamps. Every time data in the database is created or updated, a timestamp is recorded with the date and time of that update. The system then compares timestamps to see whether the data has been modified since the previous synchronisation, and updates the data if required.
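
    A very simplified version of timestamp comparison is sketched below in Python. It assumes that every site stores a (value, timestamp) pair per record, that the clocks of the sites agree, and that the most recent write should win; real products use more elaborate schemes, and the function name synchronise and the sample data are invented for the illustration.

```python
def synchronise(site_a, site_b):
    """Make two replicas agree, keeping the copy with the later timestamp."""
    for key in set(site_a) | set(site_b):
        a = site_a.get(key)
        b = site_b.get(key)
        # Keep a's copy if b has none, or if a's timestamp is at least as recent.
        newest = a if b is None or (a is not None and a[1] >= b[1]) else b
        site_a[key] = newest
        site_b[key] = newest


# Each replica maps a record key to (value, timestamp of last update).
dublin = {"cust:1": ("Ann Smith", 100), "cust:2": ("Bob", 90)}
cork = {"cust:1": ("Ann Jones", 140), "cust:3": ("Cara", 120)}

synchronise(dublin, cork)
```

    After the call both sites hold the newer surname for cust:1 and each has gained the record it was missing.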

    A distributed database is designed so that the user sees the database as a centralised system rather than a system with data spread across multiple locations. Although a distributed database has a very complex design, can be costly to create and has very high security requirements, it has many benefits. These include reduced network traffic (as the server or network is not used for most database activities), improved data manipulation time, reliability and availability.

    ERDs for Distributed Databases

    An Entity Relationship Diagram (ERD) is used to graphically represent the entities (tables or objects) of a database and the relationships between these entities. An ERD shows data flows and interactions between different objects, which are linked together using unique identifiers or primary keys.

    Each entity in an ERD represents an object of some kind, e.g. a student or person, and is accompanied by its attributes, for example ID, Name, Date of Birth, Address, etc. The entities interact with each other through relationships, e.g. a student is assigned to a group. Sometimes the relationship defines the number of entities with which the object interacts, e.g. many students can be assigned to one group.

    ERDs are easy to create and use, and are a good communication tool. An ERD can be used as the foundation for the database design and structure. It is important because it represents the structure and behaviour of the system or the user requirements, and it can be used as an element of the planning and development processes. Although an ERD can be a weak tool for representing specifications and data descriptions, and can even cause a loss of information, it has an advantage over other methods of representing database structure: it comes in a graphical form. This allows people without specific technical skills to understand how the database works, which is very useful because database design can be very complex and difficult to understand.


    Giammarco Schisani

    19th of October 2008

    ERD Puzzle Fill in the blanks By: Giammarco Schisani

    Instructions Given the following description of an Entity Relationship Diagram, fill in the blanks in the Puzzle below.

    Entity Relationship Diagrams

    A relational (10) ______ can be modelled using (7) ______ Relationship Diagrams (or ER Diagrams). Such diagrams are capable of describing the main components of an Entity Relationship (6) ______: entities and (2) ______.

    An entity describes something that can be uniquely identified, such as:

    An (12) ______ in an e-(13) ______ website;

    A customer in an e-commerce (9) ______;

    A product in an e-commerce website;

    A (11) ______ of products in an e-commerce website (e.g. Monitors, Printers, etc.).

    Entities can often be described by a (4) ______ (e.g. order, customer, etc.).

    In an ER (5 down) ______, an entity is described with a box:

    [Figure: a box labelled Order]

    A relationship describes how two or more entities relate to each other. Relationships can often be described by a (8) ______. For example:

    Places: A customer places an order;

    In an ER diagram, a relationship is described by a (5 across) ______:

    [Figure: Customer and Order boxes linked by a Places relationship]


    Both entities and relations can have attributes. An attribute represents information about the entity or relationship. For

    example:

    An order entity might have an ID 1, that uniquely identifies the order;

    A Customer entity might have Name and Surname attributes;

    A Places relationship between a Customer and an Order entity might have a Date attribute indicating when the order has been placed.

    In an ER Diagram, an attribute is represented by an (3) ______:

    [Figure: Customer (Firstname, Surname), Places (Date) and Order (ID) shown with their attributes]

    [Crossword grid with numbered clues 1 to 13]

    See Page 26 for Solution


    Puzzle 1: Against all odds By Paraic Lavin

    You work in a small company as a database administrator earning lots of money. The tables below (A, B & C) have been designed by three different colleagues who work in another division. Their boss has asked you to check them in order to prevent future problems with efficiency, etc. Can you spot the odd table out?

    Table A

    Figure 1.

    Did you know? #1: Data should be presented in table format.

    Table B

    Figure 2.

    Did you know? #2: Data should be accessible without ambiguity.

    Table C

    Figure 3.

    Did you know? #3: INSERT, DELETE and UPDATE operations must be supported by use of a single command.

    Puzzle 2: Deleting for good not for evil

    Puzzle 2A The Adventures of Dataman

    You are Dataman, a superhero with a penchant for whiskey who recognises bad design in database tables as evil. Can you remove one column from the following table in Figure 4 so that the table is converted into first normal form (1NF), and save the world from evil yet again?

    Table D

    Figure 5.

    Did you know? #5: Physical changes to the data store should not affect the logical database structure.

    Puzzle 2B - Dataman Returns

    Al-primary-key-da have attacked western financial markets by introducing bad design into one critical

    database table. Governments across the world have said they will guarantee all affected tables but the

    public fears that it is not enough. Can you delete one column and save the world yet again from

    financial ruin?

    Table E

    Figure 6.

    Table F

    Figure 7.

    Did you know? #6: Constraints must exist to preserve data integrity.

    Did you know? #7: Codd's 12 rules are really 13 rules because they are numbered 0 to 12.

    Answers:


    Puzzle 1: Against all odds

    The answer is Table A. Although none of the tables is fully normalised, Table A is clearly not normalised at all, as it has repeating information, i.e. Class_1, Class_2, Class_3. Should two of these columns be deleted in favour of one Class column, the table would be in 1NF (First Normal Form).

    Puzzle 2: Deleting for good

    Puzzle 2A: Delete column FavColour or FavColour2. Either answer is correct.

    Puzzle 2B: Delete column CustomerName from Table E, as this information is duplicated in Table F.


    The Need for Speed - War of The fields

    By Patrick Crowe

    In this edition of DATA-Nerd we take the chance to get out of the classroom and take a couple of laps under the clock. In this practical I examine whether the theory regarding the correct definition of database fields really matters for performance and, if it does, whether it makes a real difference out in the real world.

    Objective

    To examine the difference in performance between two databases identical in all respects except the field type of one column, which was declared as INT in one database and NCHAR in the second. The column in question was used to contain numbers only.

    The Test

    All operations were executed using queries in MS SQL Server Management Express. The results were obtained using the Client Statistics functionality in the same application.
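    Client Statistics does the timing inside the management tool. Outside it, the same kind of measurement could be sketched with a small timing harness. What follows is an illustration only and not the method used for this article: it uses Python with SQLite standing in for SQL Server, and the table names t_int and t_text, the helper time_query and the 10000-row size are all made up for the example. The explicit CAST is also an assumption, added to mimic a character column being converted before it is compared with a number.

```python
import sqlite3
import time


def time_query(conn, sql, runs=4):
    """Run sql `runs` times; return (row count, list of elapsed milliseconds)."""
    timings = []
    count = 0
    for _ in range(runs):
        start = time.perf_counter()
        count = len(conn.execute(sql).fetchall())
        timings.append((time.perf_counter() - start) * 1000.0)
    return count, timings


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_int (number_int INTEGER)")   # hypothetical table
conn.execute("CREATE TABLE t_text (number_text TEXT)")    # hypothetical table

numbers = range(1, 10001)
conn.executemany("INSERT INTO t_int VALUES (?)", ((n,) for n in numbers))
conn.executemany("INSERT INTO t_text VALUES (?)", ((str(n),) for n in numbers))

int_count, int_ms = time_query(
    conn, "SELECT * FROM t_int WHERE number_int > 0"
)
text_count, text_ms = time_query(
    conn, "SELECT * FROM t_text WHERE CAST(number_text AS INTEGER) > 0"
)

print("int column :", int_count, "rows, average ms", sum(int_ms) / len(int_ms))
print("text column:", text_count, "rows, average ms", sum(text_ms) / len(text_ms))
```

    The timings printed will vary from machine to machine and say nothing about SQL Server itself; the point is only the shape of the test, which is the same query run four times against each column type and then averaged.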

    The DATABASES

    DATABASE Speed_Test

    Column Name     Data Type     Allow Nulls
    NUMBER_INT      int           Checked
    Letter          nchar(10)     Checked
    WOTW            text          Checked

    DATABASE Speed_Test2

    Column Name     Data Type     Allow Nulls
    NUMBER_nchar    nchar(100)    Checked
    Letter          nchar(10)     Checked
    WOTW            text          Checked

    Each database contained 535294 rows after population.

    TEST 1 BULK INSERT

    To test the bulk import speed, the data was imported from a comma-separated (CSV) text file using the following:

    BULK INSERT Test_Table FROM 'c:\test2.csv' WITH (FIELDTERMINATOR = ',')
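    A file in the layout this import expects (a number, the letter A, then a paragraph of text) could be produced along the following lines. This is a sketch under stated assumptions rather than the script used for the article: the helper name write_test_rows and the 5-row size are made up, and a short placeholder sentence without commas stands in for the full paragraph so that every row splits cleanly on the ',' field terminator.

```python
import csv
import io


def write_test_rows(out, row_count, paragraph):
    """Write row_count lines of: number, letter 'A', paragraph."""
    writer = csv.writer(out, lineterminator="\n")
    for number in range(1, row_count + 1):
        writer.writerow([number, "A", paragraph])


# Placeholder text (an assumption): it contains no commas or quotes,
# so the csv module writes it without any quoting.
PLACEHOLDER = "No one would have believed in the last years of the nineteenth century"

buffer = io.StringIO()
write_test_rows(buffer, 5, PLACEHOLDER)
lines = buffer.getvalue().splitlines()
print(lines[0])
```

    To write a real file, an open file object can be passed in place of the in-memory buffer, with row_count raised to the size required.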

    RESULTS

    Speed_Test (INT)    Speed_Test2 (nchar)    Difference
    242875              436437                 193562

    Contents of Database

    Column Name                CONTENT
    NUMBER_INT/NUMBER_nchar    Number from 1 to 535294
    Letter                     A
    WOTW                       The first paragraph from War of the Worlds by H.G. Wells, 1898 (source: http://www.bartleby.com/1002/101.html): 230 words, 1331 characters.

  • 37

    TEST 2 Simple select

    The following select was used to return every row of the table.

    For Database: Speed_Test

    Select * from [Test_Table] Where Number_INT > 0

    For Database: Speed_Test2

    Select * from [Test_Table] Where Number_nchar > 0

    RESULTS

    The test was run 4 times for each database and the results are in milliseconds.

    TOTAL Execution Time (ms)

    Database       Test 1    Test 2    Test 3    Test 4     Average
    Speed_Test      15734     15062     14156     14750     14925.5
    Speed_Test2    194406    213265    209062    244390    215280.8
    Difference                                             200355.3
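    The averages and differences quoted in the results can be checked with a few lines of arithmetic. The sketch below uses only the figures printed in this article; the variable names are made up for the example, and the ratio at the end is a derived figure that the article itself does not quote.

```python
# Simple select: four runs per database, in milliseconds.
speed_test = [15734, 15062, 14156, 14750]        # INT column
speed_test2 = [194406, 213265, 209062, 244390]   # nchar column

avg_int = sum(speed_test) / len(speed_test)
avg_nchar = sum(speed_test2) / len(speed_test2)
difference = avg_nchar - avg_int
ratio = avg_nchar / avg_int  # derived here, not quoted in the article

# The pair of single totals reported for Speed_Test (INT) and Speed_Test2 (nchar).
total_int = 242875
total_nchar = 436437
total_difference = total_nchar - total_int

print("average INT   :", avg_int)
print("average nchar :", avg_nchar)
print("difference    :", difference)
print("nchar / INT   :", ratio)
print("totals differ :", total_difference)
```

    The printed averages agree with the table once rounded to one decimal place, and the ratio works out at a little over 14, so on this machine the nchar select took roughly fourteen times as long as the INT select.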

  • 38

    Conclusions

    It is clear from the test results that, in this particular environment, the correct declaration of a numeric field has a significant effect on performance. As part of the overall design of a database, care should be taken over how numerals and characters are declared to help optimise performance.

    The Environment

    Hardware

    Lenovo                                 ThinkPad R61
    CPU                                    Core 2 Duo T8100 @ 2.10 GHz
    Memory                                 RAM 4GB
    Disk Space (at start of Speed Test)    142 GB, 84MB free

    Software

    Operating System                       Windows XP Professional 2002 Service Pack 2
    Database                               Microsoft SQL Server 2005 Standard Edition, Version 9.00.1399.06
    Database Management                    MS SQL Server Management Express, Version 9.00.2047.00
    Other Software                         MS Excel, Google Chrome
    (open but not in use during test)

    [Chart: execution time for runs 1 to 4, with series SpeedTest, SpeedTest2 and Difference; axis from 0 to 300000]

  • Psychic Meg is on hand to analyse the cosmos!

    HOROSCOPE

    What the stars have in store for you!

    ARIES

    The stars have aligned just for you. Now is the time to sell your collection on eBay. The recession hasn't hit your star sign just yet! Sell, sell, sell!

    TAURUS

    This will be a deeply depressing week when you realise your database has way more friends than you do. Maybe now is a good time to step into the real world.

    GEMINI

    Be careful what you wish for; it just might happen. Think BIG and BIG is what you will get. Hopefully this won't apply to your waistline, but it could be very advantageous in your career!

    CANCER

    Fail to plan and you could be planning to fail! Make sure your recovery and failover plans do work. This month could be tricky. Be prepared!

    LEO

    This is your future self! Don't give up on your time-travel research. Take the time to include people around you in formulating a plan. Others will appreciate it and recognise you as a team player.

    VIRGO

    'My Precious' - Finishing your Germanic translation of the Lord of the Rings book will finally culminate 6 years' worth of Friday and Saturday nights. Time to party!

    LIBRA

    You are destined to meet the person of your dreams this week. Keep your distance, however. Time to kick on-line dating into cyberspace. Things are not always as they seem!

    SCORPIO

    There is no spoon! Keep this phrase in mind this month, as nothing is clear or set in stone just yet. Clarity will come next month. Swirling your cup will help mix the coffee, milk and sugar.

    SAGITTARIUS

    Feeling paranoid that your car might be an Autobot? Don't fret; you aren't losing your mind. It will need a service, so book it in soon.

    CAPRICORN

    You will arrive in a strange universe where you still live in your parents' house, Battlestar Galactica is no longer cool, and your mum still licks her thumb and uses it to wash dirt off your face. Do your best to survive until the next wormhole opens up, then jump as if your life depended on it!

    AQUARIUS

    Front page news - Your dreams of making Wonder Woman vs. Cat Woman into a movie will finally be realised. Keep the spandex-wearing stories to yourself though; your plan of world domination must remain a secret. The world is not ready just yet!

    PISCES

    Abandon ship. Your robots have become self-aware. All mayhem is about to break loose. You and your kind are the first to be integrated and soldered into the motherboard. Abort while you can!

  • Advertisement

    Want to Learn more?

    Check out www.comp.dit.ie for the full range of innovative, exciting and flexible industry-focused full-time and part-time undergraduate and postgraduate courses.
