Stastic Data Flows

  • 8/13/2019 Stastic Data Flows

    1/46

    Data flows from sources at many levels:

    Village Level

    Mandal Level

    District Level

    State Level

    Region Level

    Country Level

    Global Level

    A database is a collection of records (or data files) combined and treated as a unit for information retrieval.

    Data flows like flood water! A database is like a check dam!

    Statistical Databases!

    We have statistical databases on various aspects, such as:
      Food grains
      Blood banks
      Tax payers
      Agricultural output
      Share market
      and many more

    The data should be converted into information (reports) by applying data analysis tools.

    What is Data Analysis? Making figures speak (the truth!)

      Examining data for its relevance
      Preparation of tables
      Graphic display of information
      Estimating the unknown (example: agricultural output by crop-cutting experiments)
      Establishing functional relationships between cause and effect
      Computing growth rates
      Understanding trends and making forecasts, and many more
      Preparing a document stating the methodology and interpreting the results

    How to do?

    The Common and Old Method
      Physical counting of cases from data sheets
      Hand calculations
      Reference to statistical books for formulae
      Bypassing complex calculations and reporting only the easy-to-do things!

    The Contemporary Method
      Get the data into the computer
      Use statistical software
      Prepare the document using a word processor

    A survey on health insurance

    A new health insurance scheme is introduced by a company for its employees.
    The management wishes to know the reaction of its employees to the new scheme.
    Opinions were collected from 50 employees on several aspects: age, gender, marital status, education level, present arrangements for health check-up, monthly income and concept rating.

    Collection of data with suitable coding

    A questionnaire was designed and used for collecting data.
    Opinions were sought on a five-point scale (multiple choice, tick one only).
    Responses are coded as follows:
      Extremely interested     5
      Interested               4
      Indifferent              3
      Not interested           2
      Not at all interested    1

    Coding for personal factors

    Age (initially no coding)
      Actual years
    Gender
      Male      M
      Female    F
    Marital status
      Married   M
      Single    S
    Monthly income
      Less than Rs.1000     1
      Rs.1000 to Rs.2999    2
      Rs.3000 to Rs.4999    3
      Rs.5000 & above       4

    Coding for personal factors (continued)

    Education
      Below Higher Secondary            1
      Higher Secondary                  2
      Graduation                        3
      Post-graduation                   4
    Present arrangement
      Private doctor-own expenses       1
      Government/Corporate Hospitals    2
      Partial reimbursement             3
      Full reimbursement                4
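    The coding scheme above can be written down as simple lookup tables. The following is a minimal Python sketch, not part of the original slides: the code values come from the slides, while the function names (`income_code`, `encode`), the record field names and the sample respondent are assumptions made up for illustration.

```python
# Code tables copied from the slides.
RATING = {
    "Extremely interested": 5,
    "Interested": 4,
    "Indifferent": 3,
    "Not interested": 2,
    "Not at all interested": 1,
}
EDUCATION = {
    "Below Higher Secondary": 1,
    "Higher Secondary": 2,
    "Graduation": 3,
    "Post-graduation": 4,
}
ARRANGEMENT = {
    "Private doctor-own expenses": 1,
    "Government/Corporate Hospitals": 2,
    "Partial reimbursement": 3,
    "Full reimbursement": 4,
}


def income_code(rupees):
    """Map a monthly income in rupees to the four income classes on the slide."""
    if rupees < 1000:
        return 1
    if rupees < 3000:
        return 2
    if rupees < 5000:
        return 3
    return 4


def encode(sno, answers):
    """Turn one filled-in questionnaire (raw answers) into a coded record."""
    return {
        "Sno": sno,
        "Age": answers["age"],            # actual years, no coding
        "Gender": answers["gender"],      # M or F
        "Marital": answers["marital"],    # M or S
        "Education": EDUCATION[answers["education"]],
        "Arrangement": ARRANGEMENT[answers["arrangement"]],
        "Income": income_code(answers["income"]),
        "Rating": RATING[answers["rating"]],
    }


# Hypothetical respondent, invented for this example only.
sample = {
    "age": 34,
    "gender": "F",
    "marital": "M",
    "education": "Graduation",
    "arrangement": "Partial reimbursement",
    "income": 3500,
    "rating": "Interested",
}
record = encode(1, sample)
print(record)
```

    Keeping the codes in one place like this means every data entry operator applies the same mapping, which is the point of defining the coding before data entry starts.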

    Analyze the Data!

    Analysis is based on the questions that the data is expected to answer.

    Some questions
      Identify how many are interested in the new scheme and how many are either indifferent or not interested
      Cross-tabulate them by gender, age, education, marital status, etc.
      Is there any relationship between the income level and the type of response?
      Identify the factors influencing adoption of the new scheme
      What else does the data say?

    Data Entry: The First Step

    Analysis with Software: The Second Step

    The physical structure of data

      The data collected from the field consists of filled-in questionnaires or sheets.
      Each sheet must have a serial number.
      The sheets should be converted into a data file for use on a computer.
      The work can be divided into more than one file and assigned to data entry operators.
      The data entry design should be well planned and common to all operators.
      These data files can be pooled, if necessary, into a single project data file.

    TAKING DATA FROM BOOK TO COMPUTER

      Data should be arranged as separate records, one for each individual (entity).
      The data should be numeric for carrying out any analysis.
      Names and other labels do not go into the analysis but can be used for reporting.
      Suitable coding should be defined before entering data in the computer.

    Software for data entry and data analysis

    There are many packages for data entry, such as:
      FoxPro
      Lotus
      MS-Excel
      MS-Access
      Oracle
      On-line formats

    Packages for statistical analysis
      SPSS
      SAS
      MINITAB
      SYSTAT

    MAKING A DATA FILE

      Open Excel.
      On the title bar of the Excel window, the file name appears as Microsoft Excel - Book1.
      It usually contains three sheets named Sheet1, Sheet2 and Sheet3.
      In Sheet1, start entering the data from cell A1.
      Reserve the first row for column headings like Sno, Age, Gender, etc.
      Key in the data row-wise or column-wise (press the ENTER key after each entry).
      Save the file with a suitable name in a folder meant for this project.
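    The same layout (headings in the first row, then one record per row) can also be produced outside Excel. The sketch below is an illustration only, assuming Python's standard `csv` module; the three records and the helper name `write_data_file` are hypothetical and not from the slides. It writes to an in-memory buffer so it can be run without touching the disk.

```python
import csv
import io

# Column headings go in the first row, as on the slide.
HEADINGS = ["Sno", "Age", "Gender"]

# Hypothetical records, one row per individual.
ROWS = [
    [1, 34, "M"],
    [2, 28, "F"],
    [3, 45, "M"],
]


def write_data_file(handle, headings, records):
    """Write the heading row followed by one line per record."""
    writer = csv.writer(handle)
    writer.writerow(headings)
    writer.writerows(records)


buffer = io.StringIO()
write_data_file(buffer, HEADINGS, ROWS)
lines = buffer.getvalue().splitlines()
print(lines[0])  # the heading row
```

    To save to disk instead, the same function can be given a file opened with `open(path, "w", newline="")`; a CSV file written this way opens directly in Excel.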

    A SAMPLE DATA SHEET

    File Name: Food
    Folder: D:\Statman

    NOT THE CORRECT STYLE OF DATA ENTRY

    THE RIGHT WAY!

    DATA SHEET OF HEALTH INSURANCE

    ANALYTICAL FEATURES IN EXCEL

      Finding sums
      Data sorting and filtering
      Making one-dimension tables
      Cross tabulations
      Creating different types of graphs
      Making abstracts from worksheets
      Changing the styles of presenting data
      Linking an Excel report to a document

    SOME TIPS IN DATA HANDLING

      Selecting a part of data
      Sorting
      Filtering
      Column width
      Cut, Copy & Paste
      Auto Fill
      Paste Special
      Freeze Panes
      Exporting Excel data to Word

    DATA ANALYSIS PAK

    A free package of simple statistical tools is available in Excel.
    It is called the Data Analysis Pak (Excel's Analysis ToolPak add-in).
    It provides for analyses like:
      Summary statistics
      Comparison of groups
      Correlations
      Regression analysis
      Statistical tests of hypothesis
      ..and many more

    Data Prepared in a Word Table

    SNO  NAME           GENDER  CASTE  ENGLISH  MATHS  SCIENCE
    1    RAJA. M        B       SC     60       27     45
    2    ANITHA. R      G       SC     55       44     36
    3    NEELIMA. K     G       ST     46       54     65
    4    SIVARAJAN. A   B       OC     35       47     28
    5    MUTHU. B       G       OC     20       46     35
    6    GOPAL.R        B       OC     54       50     45
    7    BEENA. A       G       BC     63       46     64
    8    ACHUTAN. S     B       BC     54       52     65
    9    PRADEEP.M      B       BC     35       40     54
    10   PERUMAL. S     B       OC     25       36     45
    11   VARADAN. D     B       OC     28       40     38
    12   DIVYA. T       G       BC     64       56     37
    13   VASUMATHI. D   G       BC     37       45     54
    14   ANDAL. B       B       SC     63       44     36
    15   JAYA. L        G       ST     56       52     63
    16   RAMAN. N       B       BC     45       48     54
    17   MUREGESH. M    B       ST     50       46     68
    18   GANESH. L      B       ST     35       38     65
    19   SASIKALA. R    G       BC     52       50     54
    20   VALLI. M       G       SC     41       55     58

    It is enough to copy the Word table and paste it into Excel!

    We have got it in Excel!

    Soft Skill

    Can we make a table of counts (frequencies) from this data?

    WHY NOT? USE THE PIVOT TABLES OPTION

    Make Frequency Tables!

      You can make one-way and two-way frequency tables from an Excel sheet.
      Use the Data menu and select the Pivot Table and Chart submenu.
      Follow the Wizard steps.
      You will get the required tables.

    Frequency distribution of students by caste (one-way table)

    Count of SNO
    CASTE         Total
    BC            7
    OC            5
    SC            4
    ST            4
    Grand Total   20

    Frequency distribution of students by caste and gender (two-way table)

    Count of SNO  GENDER
    CASTE         B     G     Grand Total
    BC            3     4     7
    OC            4     1     5
    SC            2     2     4
    ST            2     2     4
    Grand Total   11    9     20

    Can we do this with hand calculations if there are thousands of cases?
    Not impossible, but difficult to do!
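    For readers without Excel at hand, the same one-way and two-way counts can be reproduced with a few lines of code. This is a minimal sketch using Python's standard `collections.Counter`, offered as an alternative to the Pivot Table wizard rather than as the slides' own method; the gender and caste values are copied from the 20-student table, and the variable names are my own.

```python
from collections import Counter

# (SNO, GENDER, CASTE) for the 20 students in the Word table.
STUDENTS = [
    (1, "B", "SC"), (2, "G", "SC"), (3, "G", "ST"), (4, "B", "OC"),
    (5, "G", "OC"), (6, "B", "OC"), (7, "G", "BC"), (8, "B", "BC"),
    (9, "B", "BC"), (10, "B", "OC"), (11, "B", "OC"), (12, "G", "BC"),
    (13, "G", "BC"), (14, "B", "SC"), (15, "G", "ST"), (16, "B", "BC"),
    (17, "B", "ST"), (18, "B", "ST"), (19, "G", "BC"), (20, "G", "SC"),
]

# One-way table: number of students per caste.
one_way = Counter(caste for _, _, caste in STUDENTS)

# Two-way table: number of students per (caste, gender) pair.
two_way = Counter((caste, gender) for _, gender, caste in STUDENTS)

castes = sorted(one_way)
genders = sorted({gender for _, gender, _ in STUDENTS})

# Print the two-way table with row totals, mirroring the pivot table layout.
print("CASTE", *genders, "Total")
for caste in castes:
    row = [two_way[(caste, gender)] for gender in genders]
    print(caste, *row, one_way[caste])
column_totals = [sum(two_way[(c, g)] for c in castes) for g in genders]
print("Total", *column_totals, len(STUDENTS))
```

    The counting step is a single pass over the records, so the approach scales to thousands of cases without any change.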

    Soft Skill

    Can we make a frequency table with given class intervals?

    CERTAINLY! USE STATISTICAL FUNCTIONS

    Built-in Functions in Excel: ENGINEERING FUNCTIONS

    Built-in Functions in Excel: STATISTICAL FUNCTIONS

    ACQUIRE SKILL BY DOING

    DEMO FOLLOWS..

    Making a Frequency Table

    Body length (cm) of 120 fish

    16.7 12.6 15.1 13.4 16.7 17.7 14.6 18.0 15.8 14.8
    16.9 13.7 16.0 14.4 15.3 16.4 12.8 11.5 13.4 16.0
    14.3 18.3 18.3 16.6 13.2 17.5 16.9 15.2 14.0 17.7
    13.8 13.2 13.7 18.4 17.1 13.9 20.5 13.2 14.9 17.4
    16.9 15.0 17.2 14.5 13.6 16.6 13.0 17.9 18.8 17.9
    15.3 18.9 14.8 16.0 18.5 13.3 19.2 16.2 14.4 17.8
    15.6 18.0 15.8 15.7 20.6 13.5 16.3 15.1 14.3 10.7
    15.6 15.4 12.6 15.4 17.2 15.1 14.1 13.1 15.4 13.5
    12.7 14.1 12.2 16.6 17.0 15.6 14.7 18.7 18.3 13.2
    19.5 14.3 16.2 15.9 16.8 15.3 17.3 13.1 12.3 17.0
    16.9 12.4 15.4 17.6 16.2 14.4 18.8 13.5 14.2 14.8
    12.9 13.5 15.1 14.2 15.3 14.8 15.2 14.4 16.1 18.2

    Prepare a frequency table using Excel

    We use the Paste Function FREQUENCY

    min        10.7
    max        20.6
    range      9.9
    interval   2

    lower limit   upper limit   upper bound (BIN)   freq
    10            12.0          11.9                2
    12            14.0          13.9                26
    14            16.0          15.9                43
    16            18.0          17.9                31
    18            20.0          19.9                16
    20            22.0          21.9                2

    Class      freq
    10 - 12    2
    12 - 14    26
    14 - 16    43
    16 - 18    31
    18 - 20    16
    20 - 22    2
    Total      120

    Learn more by doing it yourself
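    As a cross-check outside Excel, the grouping can be sketched in Python. This is an illustration under stated assumptions, not the slides' procedure: it does not call Excel's FREQUENCY but counts each value into half-open classes [10, 12), [12, 14), and so on, using the standard `bisect` module. The function name `class_frequencies` is my own; the 120 lengths are typed in as listed on the previous slide.

```python
from bisect import bisect_right

# Body length (cm) of 120 fish, as listed on the slide.
LENGTHS = [
    16.7, 12.6, 15.1, 13.4, 16.7, 17.7, 14.6, 18.0, 15.8, 14.8,
    16.9, 13.7, 16.0, 14.4, 15.3, 16.4, 12.8, 11.5, 13.4, 16.0,
    14.3, 18.3, 18.3, 16.6, 13.2, 17.5, 16.9, 15.2, 14.0, 17.7,
    13.8, 13.2, 13.7, 18.4, 17.1, 13.9, 20.5, 13.2, 14.9, 17.4,
    16.9, 15.0, 17.2, 14.5, 13.6, 16.6, 13.0, 17.9, 18.8, 17.9,
    15.3, 18.9, 14.8, 16.0, 18.5, 13.3, 19.2, 16.2, 14.4, 17.8,
    15.6, 18.0, 15.8, 15.7, 20.6, 13.5, 16.3, 15.1, 14.3, 10.7,
    15.6, 15.4, 12.6, 15.4, 17.2, 15.1, 14.1, 13.1, 15.4, 13.5,
    12.7, 14.1, 12.2, 16.6, 17.0, 15.6, 14.7, 18.7, 18.3, 13.2,
    19.5, 14.3, 16.2, 15.9, 16.8, 15.3, 17.3, 13.1, 12.3, 17.0,
    16.9, 12.4, 15.4, 17.6, 16.2, 14.4, 18.8, 13.5, 14.2, 14.8,
    12.9, 13.5, 15.1, 14.2, 15.3, 14.8, 15.2, 14.4, 16.1, 18.2,
]


def class_frequencies(values, lower, width, n_classes):
    """Count values in half-open classes [lower, lower + width), and so on."""
    # Upper limits of the classes: 12, 14, ..., 22 for the fish data.
    upper_limits = [lower + width * (i + 1) for i in range(n_classes)]
    counts = [0] * n_classes
    for value in values:
        if not lower <= value < upper_limits[-1]:
            raise ValueError(f"{value} lies outside the class range")
        # A value equal to an upper limit belongs to the next class up.
        counts[bisect_right(upper_limits, value)] += 1
    return counts


smallest, largest = min(LENGTHS), max(LENGTHS)
data_range = round(largest - smallest, 1)
freq = class_frequencies(LENGTHS, lower=10, width=2, n_classes=6)

print("min", smallest, "max", largest, "range", data_range)
for i, count in enumerate(freq):
    print(f"{10 + 2 * i} - {12 + 2 * i}", count)
print("Total", sum(freq))
```

    Run on the values as listed, this counting gives 2, 27, 42, 33, 14 and 2 (total 120). The slide's table shows 26, 43, 31 and 16 for the four middle classes, which is what one would see if the single 13.9 reading and the two 17.9 readings were counted in the next class up. The slides do not say why; a boundary effect in the computed upper bounds is one possible explanation, so it is worth checking values that sit exactly on a class boundary.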

    You can also construct a Bar Chart

    Class      freq
    10 - 12    2
    12 - 14    26
    14 - 16    43
    16 - 18    31
    18 - 20    16
    20 - 22    2
    TOTAL      120

    ADVANCED FEATURES

    Data Analysis Pak

    The t-test

    Body Mass Index of Tribal Groups
    Is the average BMI the same for the two groups?

    Sugali   Yanadi
    20.43    17.7
    22.51    21.4
    18.99    20.7
    20.49    19.3
    23.12    21
    25.63    17.9
    18.08    18.6
    20.63    18.5
    22.55    18.2
    22.43    20.3
    22.77
    23.23

    t-test output

                                   Sugali     Yanadi
    Mean                           21.73833   19.36
    Variance                       4.319215   1.898222
    Observations                   12         10
    Pooled Variance                3.229768
    Hypothesized Mean Difference   0
    df                             20
    t Stat                         3.090767
    P(T
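    The figures in this output (a pooled variance and df = 20) are consistent with a two-sample t-test that assumes equal variances. A minimal sketch of that calculation, using only Python's standard library, is given below; the function name `pooled_t_test` is my own, and the sketch stops at the t statistic because computing the p-value needs a t-distribution routine that the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

# Body Mass Index readings copied from the slide.
SUGALI = [20.43, 22.51, 18.99, 20.49, 23.12, 25.63,
          18.08, 20.63, 22.55, 22.43, 22.77, 23.23]
YANADI = [17.7, 21.4, 20.7, 19.3, 21.0, 17.9, 18.6, 18.5, 18.2, 20.3]


def pooled_t_test(a, b):
    """Return (t statistic, degrees of freedom, pooled variance).

    Assumes the two groups share a common variance.
    """
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)   # sample variances (n - 1 divisor)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    t_stat = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t_stat, df, pooled


t_stat, df, pooled = pooled_t_test(SUGALI, YANADI)
print("means", round(mean(SUGALI), 5), round(mean(YANADI), 5))
print("pooled variance", round(pooled, 6))
print("df", df)
print("t Stat", round(t_stat, 6))
```

    The means, variances, pooled variance, df and t statistic computed this way agree with the values shown in the output above.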

    p-p Plot

    WIDE RANGE OF APPLICATIONS

      Control charts
      Forecasting
      Curve fitting
      Solver for optimization
      College admissions
      Evaluation of test scores & ranking
      and many more!

    The best way of learning Excel is to work with Excel

    Statistics Made Simple - Do it yourself on PC
    By K.V.S. Sarma
    Prentice Hall India

    Thank you