Statistics and Fundamentals

CHAPTER 1 Statistics and Fundamentals

  • Statistics are used to describe our data and also to assess what reliance we can place on information based on samples. A variable is any concept that we can measure and that varies between individuals or cases. Variables should be identified as nominal (also known as category, categorical and qualitative) variables or score (also known as numerical) variables. Formal measurement theory holds that there are more types of variable: nominal, ordinal, interval and ratio. These distinctions are generally unimportant in the actual practice of doing statistical analyses, and in psychology it is difficult to distinguish ordinal, interval and ratio measurement in practice. Nominal variables consist of just named categories, whereas score variables are measured on a numerical scale that indicates the quantity of the variable.

  • Imagine a world in which everything is the same. People are identical in all respects. They wear identical clothes; they eat the same meals; they are all the same height from birth; they all go to the same school with identical teachers, identical lessons and identical facilities; they all go on holiday in the same month; they all do the same job; they all live in identical houses; and the sun shines every day.

    People are all of the same sex; their gardens have the same plants, and the soil is exactly the same no matter whose garden it is; they all die on their 75th birthdays and are all buried in the same wooden boxes in identical plots of land. They are all equally clever and they all have identical personalities. Their genetic make-up never varies. Mathematically speaking, all of these characteristics are constants.

  • If this world without variation seems less than realistic, then we need statistics! In a richly varying world, statistics is essential. If nothing varied, then everything that is to be known about people could be guessed from information obtained from a single person. No problems would arise in generalising, since what is true of lkay Baarr would be true of everyone else (they're all called lkay Baarr, after all). Fortunately, the world is not like that.

    Variability is an essential characteristic of life and the social world in which we exist. The sheer quantity of variability has to be controlled when trying to make statements about the real world.

    Statistics is largely about making this variability comprehensible.

  • Statistics can be defined in various ways. As a discipline:

    Statistics aims to support decisions about a subject by collecting, summarizing, organizing, analyzing and interpreting data.

    Statistics covers all subjects involving the numerical data that we encounter in daily life. It also aids the social and economic disciplines in investigating facts.

    Statistics makes problems easier to understand by summarizing them in an organized way and presenting them graphically. This makes statistics a standard communication science for all other sciences.

    Statistics is a branch of applied mathematics, so it can be applied in all sciences.

    Statistics is the extraction of information from data.

  • Statistics is a way to get information from data.

    Data → Statistics → Information

    Definitions: Oxford English Dictionary

    Statistics is a tool for creating new understanding from a set of numbers.

  • Statistics is the science of data. It involves techniques and methods for collecting, summarizing, organizing, analyzing and interpreting numerical information.

    There are two types of statistics in application:

    Descriptive statistics describes sets of data in order to summarize the information. It uses numerical and graphical methods.

    Inferential statistics uses sample data to make estimates, decisions, predictions and so on.
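    The difference between describing data and inferring from it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exam marks are hypothetical values, and using the sample mean as a point estimate is just one simple form of inference.

```python
import statistics

# Hypothetical sample of exam marks (illustrative values only).
sample_marks = [55, 62, 70, 70, 81, 90]

# Descriptive statistics: summarize the data we actually have.
sample_mean = statistics.mean(sample_marks)      # arithmetic average
sample_median = statistics.median(sample_marks)  # middle value
sample_mode = statistics.mode(sample_marks)      # most frequent value

# Inferential statistics: use the sample to say something about the
# wider population, here by taking the sample mean as a point estimate.
estimated_population_mean = sample_mean

print(sample_mean, sample_median, sample_mode)
```

    The descriptive values are facts about the six marks; the estimate is a statement about students who were never observed, which is why it carries uncertainty.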

  • In all sciences, the main method for analyzing data scientifically can be summarized as follows.

    1. Observe the event being analyzed, together with its objects, and describe them clearly.

    2. Derive principles or rules about the event after collecting, summarizing, organizing, analyzing and interpreting numerical information.

    3. Make inferences about the future.

    4. Control the parameters of the event and provide technical and methodological improvements.

  • Statistics applies these scientific steps to data and so supports all sciences.

    Statistics is applied in the natural sciences, for observing events and analyzing them in laboratories; in the social sciences, such as economics, psychology, sociology and demography; and in government and business life, in areas such as health services, education, production, sales, marketing, finance, the economy, advertising and sports.

  • We must not expect miracles or 100% consistency from statistics. Although statistics interprets events, draws inferences about them and points us in the right direction, unknown and uncontrollable parameters are an inevitable part of the events being analyzed.

  • Subject/Occurrence/Event

    Data, Types of Data and Data sources, Time Series

    Variable

    Population and Sample

    Statistical Survey & Sample Survey, Census and Sampling

    Parameter

    Measurement and Scales

    Summation (Summing) Notation


  • We can think of events as changes in objects or in the relations among objects. The event is the subject of the survey.

    There are two types of event.

    Typical events: these are events that are all alike, such as physical and chemical events. Observing one explains them all. For example, an object falling or water being heated.

    Collective events: these are not alike, although they may have parts in common, as in biology. They are most commonly related to living things. Events of this kind have to be studied one by one. For example, buying a bestselling book or watching the migration of a flock of a particular species of bird.

  • Sometimes it is difficult to separate the two groups, because the causes that affect events may differ. These causes are either general or occasional. For example, the quality of the soil and the climate of the land are general causes of the harvest in agriculture, whereas the quality of the seed and the farming methods are occasional causes.

    In general, statistics investigates collective events.

  • Data are the observed values of a variable. They are numerical information about the subject being studied.

    The singular form of data is datum.

    Some data cannot be described numerically, but they can be converted into numerical form by counting.

    There are two types of data.

    1. Numerical Data (Interval Data or Quantitative Data)

    2. Categorical Data (Nominal Data or Qualitative Data)

  • Classification of data by kind of grouping:

    Numerical Data (Interval or Quantitative Data): Discrete Data, Continuous Data, Ratio Data

    Categorical Data (Nominal or Qualitative Data): Grouped Data, Ordinal Data

  • We can classify data by kind of grouping.

    Interval Data (Numerical or Quantitative Data)

    They include two types of data.

    Discrete Data: They are produced by counting. Examples: number of children, defects per hour, number of shares of firm X, number of cars sold, number of articles for a material, heart beats per minute.

    Continuous Data: They can take any value within a given range of real numbers. Examples: weight, voltage, height, foot size, time, temperature, distance, velocity.

    Nominal Data (Categorical or Qualitative Data)

    They are produced by responses that belong to groups or categories. Examples: marital status, gender, voter registration, eye color, religion.

    Ordinal Data

    They are nominal data whose values are ordered with respect to given codes. Examples:

    Student evaluation rating by grade (1: poor; 2: good; 3: excellent)

    Product quality rating (1: poor; 2: average; 3: good)

    Size of T-shirts (S: small; M: medium; L: large; XL: extra large)

    Ratio Data

    Ratio data are continuous data where both differences and ratios are interpretable. Ratio data have a natural, meaningful zero point, which allows ratio comparisons to be interpreted.

    Example: Time is measured on a ratio scale. Not only can we say that the difference between three hours and five hours is the same as the difference between eight hours and ten hours (equal intervals), but we can also say that ten hours is twice as long as five hours (a ratio comparison).

  • A time series is a sequence of observations ordered in time. If observations are made on some phenomenon throughout time, it is most sensible to display the data in the order in which they arise, particularly since successive observations will probably be dependent.

    Examples of time series:

    1. Economics: weekly share prices; monthly profits

    2. Meteorology: daily rainfall; wind speed; temperature

    3. Sociology: employment figures; number of patients applying to a hospital per day

    Time series are best displayed in a scatter plot. The series value X is plotted on the vertical axis and time t on the horizontal axis. Time is called the independent variable.

    (Figure: horizontal axis, time; vertical axis, observations.)

  • A data source is a specific data set, metadata set, database or metadata repository from which data or metadata are available.

    Data sources can be classified in two ways.

    1. According to their survey purposes. If data are collected and prepared within the firm, they are interior (internal) data; if they come from outside, they are exterior (external) data.

    The list of patients treated in the Psychology Department of X Hospital is interior data for X Hospital.

    The number of hospital beds in public and private inpatient institutions (Ministry of Health) is exterior data for X Hospital.

    2. According to their source and how they are handled. If the data are collected from the population itself, they are primary data; otherwise they are secondary data.

    The list of patients treated in the Psychology Department of X Hospital is primary data for X Hospital.

    The health statistics of Turkey (Ministry of Health) are secondary data for X Hospital.

  • A variable is a characteristic of a population or a sample.

    The values of a variable are the possible observations of the variable. They can change from one observation to the next.

    For example:

    The mark on a statistics exam will vary from student to student, so it is a variable.

    The price of a stock will change from day to day in the stock market, so it is a variable.

  • Variables are denoted by letters such as X, Y, Z.

    They are used with indices (subscripts) to describe the state of a particular unit: X1, X2, X3, ..., Xi.

    For example, if X denotes the sales amount, then X3 is a subscripted variable that means the sales amount for the 3rd month of the year (March).
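    The subscript notation maps naturally onto indexing a list. A minimal Python sketch, assuming hypothetical monthly sales figures (the numbers and the helper name `x` are illustrative and not from the source). Python lists are zero-based, so X3 is stored at index 2:

```python
# Hypothetical monthly sales amounts X1..X6 (January to June).
monthly_sales = [120, 135, 150, 110, 160, 175]


def x(i):
    """Return the subscripted variable X_i using 1-based indexing."""
    return monthly_sales[i - 1]


march_sales = x(3)  # X3: sales amount for the 3rd month (March)
print(march_sales)
```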

  • Variables are classified according to the features of the data they are observed by:

    By attributes: quantitative variables, qualitative variables

    By scaling: discrete variables, continuous variables

    By observation: dependent variables, independent variables, controllable variables

  • Variables are classified according to the features of the data they are observed by.

    By attributes:

    Quantitative variables: variables that are measured in terms of numbers (age, weight, height, speed, shoe size, ...).

    Qualitative variables: variables that express a qualitative attribute (hair color, eye color, religion, favorite movie, gender, race, nationality, ...).

    By scaling:

    Discrete variables: variables whose possible scores are discrete points on the scale (counts, marital status, sex, ...). Hint: a household could have three children or six children, but not 4.53 children.

    Continuous variables: variables where the scale is continuous and not made up of discrete steps (age, weight, intelligence level, temperature, ...). Hint: the response time could be 1.64 seconds, or it could be 1.64237123922121 seconds.

  • By observation:

    Dependent variables: they answer the question "What do I observe in the experiment?" A dependent variable is one whose value depends on the value of the independent variable. The independent variable is manipulated by the experimenter and its effects on the dependent variable are measured.

    The time water takes to boil is a dependent variable; it depends on the heat applied and on the air pressure.

    The time a candle takes to burn out is a dependent variable; it depends on the height of the candle.

    Independent variables: they answer the question "What do I change in the experiment?" An independent variable is one that is manipulated by the experimenter.

    The heat and the pressure under which water boils are independent variables.

    The height of the candle is an independent variable that affects the time the candle takes to burn out.

    Controllable variables: they answer the question "What do I keep the same?" They are quantities that a scientist wants to remain constant. (In the candle-burning experiment, we can use the same type of candle and keep the room free of draughts.)

  • A population is a set of measurements of units (usually people, objects, transactions or events) that we are interested in studying.

    A single entity of a population is called a unit.

    All students who have been educated in Turkey form a population. Any student in this population is a unit (a person, a member, an entity) of that population.

    Units are countable and measurable; qualities such as colour and taste are not considered units.

    Sometimes a population is so large that it is impossible to count or measure it.

  • A sample is a subset of the units of a population. Given a population, we select a much smaller and more manageable set of units as a subset of it. This subset must represent the population. In this way we obtain understandable results for the population, but the results carry an error that depends on how the sample is selected.

    For example, all university students form a population, and 100 randomly selected students who own a particular brand of GSM phone are a sample of the population "all university students".

  • (Figure: a population made up of units, with Sample 1, Sample 2 and Sample 3 drawn from it.)

  • A parameter is a numerical descriptive measure of a population.

    The mode of the sex of the children in a nursery is female.

    The average height of the students enrolled in the course STAT is 170 cm.

    A sample statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.

    The mode of the sex of 15 students selected from the class is female.

    The average height of the students in STAT_Section-A is 170 cm.
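    The relation between a parameter and a sample statistic can be illustrated with a small simulation. This is a sketch under loud assumptions: the population of 1,000 heights is generated artificially (normal distribution, mean 170 cm, standard deviation 8 cm), and the sample size of 100 is chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 1,000 enrolled students.
population = [random.gauss(170, 8) for _ in range(1000)]

# Parameter: a numerical descriptive measure of the whole population.
population_mean = statistics.mean(population)

# Sample: a randomly selected subset of the population's units.
sample = random.sample(population, 100)

# Sample statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

# The gap between the two is the error that depends on the sample chosen.
sampling_error = sample_mean - population_mean
print(round(population_mean, 1), round(sample_mean, 1), round(sampling_error, 2))
```

    Drawing a different sample (for example, by changing the seed) gives a different sample mean, while the population mean stays fixed.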

  • Statistical survey: a means of collecting data from a sample of a population and estimating the population's characteristics through the systematic use of statistical methodology.

    Sample survey: a survey that includes only the elements of a sample.

    Census (population survey): a collection of data about every member of a population.

    Sampling (statistics): collecting data on only a sample of a population.

  • To measure is to assign numbers to the variables of individual population units. The values obtained by measuring are called measurements. Scaling is the process of measuring with respect to quantitative attributes. Scales are groups of techniques that make the assigned measurements meaningful.

  • Scales are techniques that make measurements understandable.

    1. Nominal (Categorical) Scale: Objects are scaled according to their definite attributes. Grouping company cars by the purpose they serve, grouping people by their jobs, and grouping sports teams by nationality are examples; gender, ethnicity and marital status are also scaled by this technique.

    Counts, frequencies, and the maximum or minimum count are permissible.

    2. Ordinal (Rankable) Scale: Objects are scaled according to some of their attributes. The scale defines more or less of an attribute, giving numerical scores that can be ordered from smallest to largest.

    Examples: ordering liquids by their densities, ordering the students in a class by their heights, ranking teams 1st, 2nd and 3rd in a sports race.

    Finding the median, computing percentiles, and judging whether one value is greater than another are permissible.

  • Types of Scales

    3. Interval Scale: A scale with a fixed and defined interval. Intervals between adjacent scale values are equal with respect to the attribute being measured.

    Examples: scaling devices developed for different systems, thermometers, some kinds of calendar, and statements that fix the positions of runners, such as "Ali finished the race 5 seconds behind Veli".

    Mean, standard deviation and correlation computations are permissible.

    4. Ratio Scale: Intervals describe ratios of magnitudes. There is a rational zero point for the scale, so comparisons based on ratios are possible.

    Examples: "Ali scored half of Veli's score"; "Team A scored twice as much as Team B"; metre, kilogram and degree scales.

    All statistical techniques are permissible.
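    The permissible computations for each scale can be sketched with Python's standard `statistics` module. All data values below are hypothetical and chosen only to illustrate which summary fits which scale; this is not a complete list of permissible techniques.

```python
import statistics

# Hypothetical observations, one list per scale type.
eye_color = ["brown", "blue", "brown", "green", "brown"]  # nominal
quality_rating = [1, 3, 2, 2, 3, 1, 2]                    # ordinal codes
temperature_c = [18.0, 21.5, 19.0, 22.5, 20.0]            # interval
race_time_s = [50.0, 55.0, 100.0]                         # ratio

# Nominal: only counting is meaningful, so report the most frequent category.
nominal_mode = statistics.mode(eye_color)

# Ordinal: order is meaningful, so the median is permissible.
ordinal_median = statistics.median(quality_rating)

# Interval: equal intervals make the mean and standard deviation meaningful.
interval_mean = statistics.mean(temperature_c)
interval_sd = statistics.stdev(temperature_c)

# Ratio: a true zero point makes ratio comparisons meaningful.
ratio = race_time_s[2] / race_time_s[0]  # 100 s is twice as long as 50 s
```

    Each scale also admits the computations of the scales before it, so the mode and median could equally be computed for the interval and ratio data.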

  • Very often in statistics an algebraic expression of the form X1 + X2 + X3 + ... + XN is used in a formula to compute a statistic. The three dots in the expression mean that the pattern continues: every indexed value of the variable X, up to XN, is added. Writing out an expression like this is tedious, so mathematicians have developed a shorthand for a sum of scores, called summation notation.

    $$\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + \dots + X_N$$

    $$2 + 6 + 9 + 7 + 4 = 28$$

  • The expression $\sum_{i=1}^{N} X_i$ is read "the sum of X sub i from i equals 1 to N". It means "add up all the numbers". In the example set of five numbers, where N = 5, the summation could be written:

    $$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$$

    The "i = 1" at the bottom of the summation sign tells where to begin the sequence of summation. If the expression were written with "i = 3", the summation would start with the third number in the set. For example:

    $$\sum_{i=3}^{5} X_i = X_3 + X_4 + X_5$$
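    The lower limit of the summation can be mirrored in code by choosing where a slice starts. A minimal Python sketch, assuming an illustrative set of five scores; the helper name `summation` and its parameters are hypothetical, introduced only for this example:

```python
def summation(values, start=1, stop=None):
    """Sum X_start .. X_stop, using 1-based indices as in the notation."""
    if stop is None:
        stop = len(values)  # default upper limit N
    return sum(values[start - 1:stop])


scores = [5, 7, 7, 6, 8]  # illustrative X_1..X_5, so N = 5

total = summation(scores)                # i = 1 to 5: all five scores
from_third = summation(scores, start=3)  # i = 3 to 5: X_3 + X_4 + X_5
print(total, from_third)
```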

  • The general rule: DO THE ALGEBRAIC OPERATION AND THEN SUM.

    Example: The following data set is given. Find the sum of the products of the two variables X and Y.

    X     Y     X * Y
    5     6     30
    7     7     49
    7     8     56
    6     7     42
    8     8     64
    33    36    241   (column sums)

    $\sum XY = 30 + 49 + 56 + 42 + 64 = 241$ is the true technique, with the true result.

    $\sum X \cdot \sum Y = 33 \times 36 = 1188$ is the wrong technique, with the wrong result.
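    The rule can be checked directly on the X and Y columns of the example. A short Python sketch (the variable names are chosen here for illustration):

```python
x = [5, 7, 7, 6, 8]
y = [6, 7, 8, 7, 8]

# Correct: multiply each pair first, then sum the products.
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))

# Wrong for this purpose: sum each variable first, then multiply the sums.
product_of_sums = sum(x) * sum(y)

print(sum_of_products, product_of_sums)
```

    The two results differ, which is why the order "operate, then sum" matters.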

  • 1. When the expression being summed contains a "+" or "-" at the highest level, the summation sign may be taken inside the parentheses.

    $$\sum (X + Y) = \sum X + \sum Y, \qquad \sum (X - Y) = \sum X - \sum Y$$

    2. The sum of a constant times a variable is equal to the constant times the sum of the variable.

    $$\sum cX = c \sum X$$

    3. The sum of a constant is equal to N times the constant.

    $$\sum_{i=1}^{N} c = Nc$$
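    These three properties can be verified numerically. The sketch below uses two small illustrative data sets and an arbitrary constant c = 2; it checks the properties for these values only and is not a proof.

```python
x = [3, 4, 5, 8, 9]
y = [1, 5, 6, 7, 8]
c = 2        # an arbitrary constant chosen for illustration
n = len(x)

# Rule 1: the summation sign distributes over "+" and "-".
rule1_plus = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)
rule1_minus = sum(xi - yi for xi, yi in zip(x, y)) == sum(x) - sum(y)

# Rule 2: a constant factor can be moved outside the sum.
rule2 = sum(c * xi for xi in x) == c * sum(x)

# Rule 3: summing a constant N times gives N times the constant.
rule3 = sum(c for _ in range(n)) == n * c

print(rule1_plus, rule1_minus, rule2, rule3)
```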

  • Problem: The survey data 3, 4, 5, 8, 9 are given. Compute the following sums: a) $\sum (X + 2)$, b) $\sum X^2$.

    Solution:

    a) $\sum (X + 2) = (3+2) + (4+2) + (5+2) + (8+2) + (9+2) = 5 + 6 + 7 + 10 + 11 = 39$

    b) $\sum X^2 = 3^2 + 4^2 + 5^2 + 8^2 + 9^2 = 9 + 16 + 25 + 64 + 81 = 195$

    Problem: Two data sets are given. For variable X: 3, 4, 5, 8, 9. For variable Y: 1, 5, 6, 7, 8. Compute the following sums: a) $\sum XY$, b) $\sum (X - Y)^2$.

    Solution:

    a) $\sum XY = (3 \times 1) + (4 \times 5) + (5 \times 6) + (8 \times 7) + (9 \times 8) = 3 + 20 + 30 + 56 + 72 = 181$

    b) $\sum (X - Y)^2 = (3-1)^2 + (4-5)^2 + (5-6)^2 + (8-7)^2 + (9-8)^2 = 4 + 1 + 1 + 1 + 1 = 8$
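    The four sums in these problems can be recomputed with a few lines of Python, which is a convenient way to check hand arithmetic (the variable names are illustrative). In each case the operation is applied to every value first and the results are then summed:

```python
x = [3, 4, 5, 8, 9]
y = [1, 5, 6, 7, 8]

sum_x_plus_2 = sum(xi + 2 for xi in x)                          # sum of (X + 2)
sum_x_squared = sum(xi ** 2 for xi in x)                        # sum of X^2
sum_xy = sum(xi * yi for xi, yi in zip(x, y))                   # sum of X * Y
sum_diff_squared = sum((xi - yi) ** 2 for xi, yi in zip(x, y))  # sum of (X - Y)^2

print(sum_x_plus_2, sum_x_squared, sum_xy, sum_diff_squared)
# prints: 39 195 181 8
```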