Statistics and Probability - Power Point Slides


  • 8/14/2019 Statistics and Probability - Power Point Slides

    1/64

    Virtual University of Pakistan

    Lecture No. 1: Statistics and Probability

    Miss Saleha Naghmi Habibullah


    Objective

    To inculcate in you an attitude of statistical and probabilistic thinking.

    To give you some very basic techniques in order to apply statistical analysis to real-world situations and problems.


    WHAT IS STATISTICS?

    That science which enables us to draw conclusions about various phenomena on the basis of real data collected on a sample basis.

    A tool for data-based research.
    Also known as Quantitative Analysis.
    For any scientific enquiry in which you would like to base your conclusions and decisions on real-life data, you need to employ statistical techniques!

    Nowadays, in the developed countries of the world, there is an active movement for statistical literacy.


    Application Areas

    A lot of applications in a wide variety of disciplines:

    Agriculture, Anthropology, Astronomy, Biology, Economics, Engineering, Environment, Geology, Genetics, Medicine, Physics, Psychology, Sociology, Zoology.

    Virtually every single subject from Anthropology to Zoology. A to Z!


    THE NATURE OF THE DISCIPLINE

    STATISTICS
        DESCRIPTIVE STATISTICS
        INFERENTIAL STATISTICS


    Text and Reference Material

    The primary textbook for the course is Introduction to Statistical Theory (Sixth Edition) by Sher Muhammad Chaudhry and Shahid Kamal, published by Ilmi Kitab Khana, Lahore.

    Reference books for the course are:
    1. by Afzal Beg & Miraj Din Mirza.
    2. by Mohammad Rauf Chaudhry (Polymer Publications, Urdu Bazar, Lahore).
    3. Statistics by James T. McClave & Frank H. Dietrich, II (Dellen Publishing Company, California, U.S.A.).
    4. Introducing Statistics by K.A. Yeomans (Penguin Books Ltd., England).
    5. Applied Statistics by K.A. Yeomans (Penguin Books Ltd., England).
    6. Business Statistics for Management & Economics by Wayne W. Daniel and James C. Terrell (Houghton Mifflin Company, U.S.A.).
    7. Basic Business Statistics by Berenson & Levine ( )


    ORGANIZATION OF THIS COURSE

    IN ACCORDANCE WITH THE ABOVE-MENTIONED STRUCTURE, THE ORGANIZATION OF THIS COURSE IS AS FOLLOWS:

    WEEKS      LECTURES   AREA TO BE COVERED       HOMEWORK ASSIGNMENTS   EXAMS
    1 TO 5     1 TO 15    DESCRIPTIVE STATISTICS   1 TO 5                 MID-TERM-I
    6 TO 10    16 TO 30   PROBABILITY              6 TO 10                MID-TERM-II
    11 TO 15   31 TO 45   INFERENTIAL STATISTICS   11 TO 15               FINAL EXAM


    Upon completion of the first segment, you will be able to:

    Appreciate the nature of statistical data.
    Understand various methods of collecting statistical data.
    Appreciate the importance of a proper sampling procedure.
    Utilize various methods of summarizing and describing collected data.
    Employ statistical techniques to understand the nature of the relationship between two quantitative variables.


    Upon completion of the second segment, you will be able to:

    Understand the basic concepts of probability theory (which is the foundation of statistical inference).
    Understand the concept of discrete probability distributions and their mathematical properties.
    Understand the concept of continuous probability distributions and their mathematical properties.
    Get acquainted with some of the most commonly encountered and important discrete and continuous probability distributions, such as the binomial and the normal distribution.


    Upon completion of the third segment, you will be able to:

    Understand and employ various techniques of estimation and hypothesis-testing in order to draw reliable conclusions necessary for decision-making in various fields of human activity.

    Through this segment, you will be able to appreciate the purpose and the goal of the subject of Statistics.


    GRADING

    There will be two term exams and one final exam. In addition, there will be 15 homework assignments. The final examination will be comprehensive in nature. (Approximately 25-30% of the final exam paper will be on the course covered up to the Mid-Term-II Exam.)

    These will contribute the following percentages to the final grade:

    Mid-Term-I: 20%
    Mid-Term-II: 20%
    Final Exam: 30%
    Homework Assignments: 30%


    Meaning of Statistics

    Statistics
        Meanings: STATUS -> Political State -> Information useful for the State


    EXAMPLES OF DATA

    Data are collected in many aspects of everyday life.
    Statements given to a police officer or physician or psychologist during an interview are data.
    So are the correct and incorrect answers given by a student on a final examination.
    Almost any athletic event produces data:
        The time required by a runner to complete a marathon,
        The number of errors committed by a baseball team in nine innings of play.


    EXAMPLES OF DATA (Cont.)

    And, of course, data are obtained in the course of scientific inquiry:
        The positions of artifacts and fossils in an archaeological site,
        The number of interactions between two members of an animal colony during a period of observation,
        The spectral composition of light emitted by a star.


    Types of Data

    Data
        Quantitative (Numeric)
        Qualitative (Non-Numeric)


    Variable

    A quantity that varies from individual to individual.

    Variable
        Quantitative (Numeric)
        Qualitative (Non-Numeric)


    OBSERVATIONS AND VARIABLES

    In statistics, an observation often means any sort of numerical recording of information, whether it is a physical measurement such as height or weight, a classification such as heads or tails, or an answer to a question such as yes or no.

    Variable:
    A characteristic that varies from one individual or object to another is called a variable. For example, age is a variable as it varies from person to person. A variable can assume a number of values. The given set of all possible values from which the variable takes on a value is called its domain. If, for a given problem, the domain of a variable contains only one value, then the variable is referred to as a constant.


    QUANTITATIVE & QUALITATIVE VARIABLES

    Variables may be classified into quantitative and qualitative according to the form of the characteristic of interest.

    A variable is called a quantitative variable when a characteristic can be expressed numerically, such as age, weight, income or number of children.

    On the other hand, if the characteristic is non-numerical, such as education, sex, eye-colour, quality, intelligence, poverty, satisfaction, etc., the variable is referred to as a qualitative variable. A qualitative characteristic is also called an attribute.

    An individual or an object with such a characteristic can be counted or enumerated after having been assigned to one of the several mutually exclusive classes or categories.


    Variable
        Quantitative (Numeric)
            Continuous
            Discrete
        Qualitative (Non-Numeric)


    Continuous Variable

    Measurement
    e.g. height, weight, etc.


    Discrete Variable

    Counting
    e.g. number of sisters
    Gaps, jumps


    DISCRETE AND CONTINUOUS VARIABLES:

    A quantitative variable may be classified as discrete or continuous. A discrete variable is one that can take only a discrete set of integers or whole numbers, that is, the values are taken by jumps or breaks. A discrete variable represents count data such as the number of persons in a family, the number of rooms in a house, the number of deaths in an accident, the income of an individual, etc.

    A variable is called a continuous variable if it can take on any value, fractional or integral, within a given interval, i.e. its domain is an interval with all possible values without gaps. A continuous variable represents measurement data such as the age of a person, the height of a plant, the weight of a commodity, the temperature at a place, etc.

    A variable, whether countable or measurable, is generally denoted by some symbol such as X or Y, and Xi or Xj represents the ith or jth value of the variable. The subscript i or j is replaced by a number such as 1, 2, 3 when referring to a particular value.
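The distinction between count data and measurement data, and the Xi notation, can be sketched in a few lines of Python. This is only an illustration: the helper `classify_quantitative` and the data values are hypothetical, and treating "all whole numbers" as a sign of count data is a simplifying assumption, since a measurement can also happen to be recorded as a whole number.

```python
from numbers import Real


def classify_quantitative(values):
    """Label observations as 'discrete' or 'continuous'.

    Assumption for illustration only: values that are all whole
    numbers are treated as counts (discrete); anything else is
    treated as a measurement (continuous).
    """
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        raise ValueError("a quantitative variable needs numeric values")
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"


# Hypothetical observations X1, X2, ... of two variables.
sisters = [0, 2, 1, 3, 1]                   # counting: values move in jumps
heights_cm = [162.5, 170.2, 158.0, 181.7]   # measuring: any value in an interval

# Xi is the i-th recorded value, with i starting at 1.
x_2 = sisters[2 - 1]

print(classify_quantitative(sisters))      # discrete
print(classify_quantitative(heights_cm))   # continuous
print(x_2)                                 # 2
```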


    Measurement Scales

    Nominal Scale
    Ordinal Scale
    Interval Scale
    Ratio Scale


    MEASUREMENT SCALES

    By measurement, we usually mean the assigning of numbers to observations or objects, and scaling is a process of measuring. The four scales of measurement are briefly described below:

    NOMINAL SCALE
    The classification or grouping of the observations into mutually exclusive qualitative categories or classes is said to constitute a nominal scale. For example, students are classified as male and female. The numbers 1 and 2 may also be used to identify these two categories. Similarly, rainfall may be classified as heavy, moderate and light. We may use the numbers 1, 2 and 3 to denote the three classes of rainfall. The numbers, when they are used only to identify the categories of the given scale, carry no numerical significance, and there is no particular order for the grouping.


    MEASUREMENT SCALES (Cont.)

    ORDINAL OR RANKING SCALE
    It includes the characteristic of a nominal scale and in addition has the property of ordering or ranking of measurements. For example, the performance of students (or players) is rated as excellent, good, fair or poor, etc. The numbers 1, 2, 3, 4, etc. are also used to indicate ranks. The only relation that holds between any pair of categories is that of greater than (or more preferred).


    MEASUREMENT SCALES (Cont.)

    INTERVAL SCALE
    A measurement scale possessing a constant interval size (distance) but not a true zero point is called an interval scale. Temperature measured on either the Celsius or the Fahrenheit scale is an outstanding example of an interval scale, because the same difference exists between 20°C (68°F) and 30°C (86°F) as between 5°C (41°F) and 15°C (59°F). It cannot be said that a temperature of 40 degrees is twice as hot as a temperature of 20 degrees, i.e. the ratio 40/20 has no meaning. The arithmetic operations of addition, subtraction, etc. are meaningful.

    RATIO SCALE
    It is a special kind of interval scale where the scale of measurement has a true zero point as its origin. The ratio scale is used to measure weight, volume, distance, money, etc. The key to differentiating the interval and ratio scales is that the zero point is meaningful for the ratio scale.
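The temperature example can be checked numerically. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the kilogram-to-pound comparison is an added illustration of a ratio scale (weight is named above as a ratio-scale quantity), and the function names are chosen here only for the example.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32


def kilograms_to_pounds(kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462


# Interval scale: equal differences stay equal after a change of unit.
diff_a = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)   # 86 - 68
diff_b = celsius_to_fahrenheit(15) - celsius_to_fahrenheit(5)    # 59 - 41
print(diff_a, diff_b)   # 18.0 18.0

# Ratios do not survive the change of unit, because zero degrees
# does not mean "no temperature".
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)  # 104 / 68
print(ratio_c, round(ratio_f, 3))   # 2.0 1.529

# Ratio scale: with a true zero, the ratio is the same in any unit.
ratio_kg = 40 / 20
ratio_lb = kilograms_to_pounds(40) / kilograms_to_pounds(20)
print(ratio_kg, round(ratio_lb, 3))   # 2.0 2.0
```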


    Example

    Chemical and manufacturing plants sometimes discharge toxic-waste materials such as DDT into nearby rivers and streams. These toxins can adversely affect the plants and animals inhabiting the river and the river bank.


    A study of fish was conducted in the Tennessee River in Alabama and its three tributary creeks: Flint creek, Limestone creek and Spring creek.

    A total of 144 fish were captured, and the following variables were measured for each one:


    1. River/creek from which the fish was captured
    2. Species of fish (Channel fish, Largemouth bass or Smallmouth buffalo fish)
    3. Length of fish (centimeters)
    4. Weight of fish (grams)
    5. DDT concentration in the bodily system of the fish (parts per million)


    Classify each of the five variables measured as quantitative or qualitative.

    Also, identify the type of measurement scale for each of the five variables.


    Solution

    The variables length, weight and DDT concentration are quantitative variables because each is measured on a numerical scale (length in centimeters, weight in grams and DDT in parts per million).

    All three of these variables are being measured on the ratio scale.


    Rationale

    Whenever we speak about the weight of an object, obviously, if our measuring instrument reads zero, this means that the object being measured has zero weight, and in this sense the zero would be a true zero.

    An exactly similar argument holds for the length of an object.


    As far as DDT concentration in the bodily system of the fish is concerned, obviously, if there is absolutely no DDT in the fish, then the DDT concentration reads zero, and this particular zero reading will be a true zero.


    As explained above, the three variables length of fish, weight of fish and DDT concentration in the bodily system of the fish are quantitative variables measured on the ratio scale.

    In contrast:


    Data on the river/creek from which the fish were captured and on the species of fish are qualitative data.

    Both of these variables are measured on the nominal scale.


    Rationale

    The river/creek from which the fish were captured and the species of fish are qualitative data because these cannot be measured quantitatively; they can only be classified into categories (i.e. Channel fish, Largemouth bass or Smallmouth buffalo fish for the species, and Tennessee River, Flint creek, Limestone creek and Spring creek for the location).
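The solution to the fish example can be written down as a small lookup table. The sketch below is illustrative only: the dictionary keys and the helper `can_average` are names invented here, and the rule that averages are meaningful only for quantitative variables is a general fact added for the example.

```python
# Each variable from the fish study, with the type and measurement
# scale given in the solution above. Key names are hypothetical.
fish_variables = {
    "river_or_creek": {"type": "qualitative", "scale": "nominal"},
    "species": {"type": "qualitative", "scale": "nominal"},
    "length_cm": {"type": "quantitative", "scale": "ratio"},
    "weight_g": {"type": "quantitative", "scale": "ratio"},
    "ddt_ppm": {"type": "quantitative", "scale": "ratio"},
}


def can_average(variable):
    """Averages are meaningful only for quantitative variables."""
    return fish_variables[variable]["type"] == "quantitative"


quantitative = [name for name in fish_variables if can_average(name)]
qualitative = [name for name in fish_variables if not can_average(name)]

print(quantitative)   # ['length_cm', 'weight_g', 'ddt_ppm']
print(qualitative)    # ['river_or_creek', 'species']
```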


    The statistical methods for describing, reporting and analyzing data depend on the type of data measured (i.e. whether the data are quantitative or qualitative).


    ERRORS OF MEASUREMENT

    Experience has shown that a continuous variable can never be measured with perfect fineness because of certain habits and practices, methods of measurement, instruments used, etc. The measurements are thus always recorded correct to the nearest unit and hence are of limited accuracy. The actual or true values are, however, assumed to exist. For example, if a student's weight is recorded as 60 kg (correct to the nearest kilogram), his true weight in fact lies between 59.5 kg and 60.5 kg, whereas a weight recorded as 60.00 kg means the true weight is known to lie between 59.995 kg and 60.005 kg. Thus there is a difference, however small it may be, between the measured value and the true value. This sort of departure from the true value is technically known as the error of measurement.

    In other words, if the observed value and the true value of a variable are denoted by $x$ and $x + \varepsilon$ respectively, then the difference $(x + \varepsilon) - x$, i.e. $\varepsilon$, is the error. This error involves the unit of measurement of $x$ and is therefore called an absolute error. An absolute error divided by the true value is called the relative error. Thus the relative error is $\varepsilon / (x + \varepsilon)$, which, when multiplied by 100, is the percentage error. These errors are independent of the units of measurement of $x$. It ought to be noted that an error has both magnitude and direction, and that the word error in statistics does not mean mistake, which is a chance inaccuracy.
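The recording limits and the three kinds of error can be worked through in a short Python sketch. The function names are invented for this illustration, and the true weight of 60.3 kg is a hypothetical value, since in practice the true value is not known.

```python
def recording_bounds(recorded, decimals=0):
    """Range in which the true value lies when a reading is
    recorded correct to the given number of decimal places."""
    half_unit = 0.5 * 10 ** (-decimals)
    return recorded - half_unit, recorded + half_unit


def measurement_errors(observed, true_value):
    """Absolute, relative and percentage error of one observation.

    Follows the definitions above: error = true value - observed
    value, and the relative error divides by the true value.
    """
    absolute = true_value - observed
    relative = absolute / true_value
    return absolute, relative, relative * 100


print(recording_bounds(60))              # (59.5, 60.5)
low, high = recording_bounds(60.00, decimals=2)
print(round(low, 3), round(high, 3))     # 59.995 60.005

# Hypothetical case: recorded as 60 kg, true value 60.3 kg.
absolute, relative, percent = measurement_errors(60, 60.3)
print(round(absolute, 3), round(relative, 5), round(percent, 3))   # 0.3 0.00498 0.498
```

The absolute error carries the unit (kg), while the relative and percentage errors are pure numbers.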


    Errors of Measurement

    Biased Errors (also called Cumulative Errors or Systematic Errors)
    Random Errors (also called Compensating Errors or Accidental Errors)


    BIASED AND RANDOM ERRORS

    An error is said to be biased when the observed value is consistently and constantly higher or lower than the true value. Biased errors arise from the personal limitations of the observer, the imperfection in the instruments used or some other conditions which control the measurements. These errors are not revealed by repeating the measurements. They are cumulative in nature, that is, the greater the number of measurements, the greater would be the magnitude of the error. They are thus more troublesome. These errors are also called cumulative or systematic errors.

    An error, on the other hand, is said to be unbiased when the deviations, i.e. the excesses and defects, from the true value tend to occur equally often. Unbiased errors are revealed when measurements are repeated, and they tend to cancel out in the long run. These errors are therefore compensating and are also known as random errors or accidental errors.
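A small simulation can make the contrast between cumulative and compensating errors concrete. Everything in this sketch is assumed for illustration: the true weight, the size of the bias, the uniform spread of the random errors and the fixed random seed are not taken from the lecture.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable
true_value = 60.0        # hypothetical true weight in kg
n = 1000                 # number of repeated measurements

# Biased (systematic) error: every reading is 0.2 kg too high.
biased = [true_value + 0.2 for _ in range(n)]

# Unbiased (random) error: excesses and defects are equally likely.
unbiased = [true_value + rng.uniform(-0.2, 0.2) for _ in range(n)]


def total_error(readings):
    """Sum of (reading - true value) over all readings."""
    return sum(r - true_value for r in readings)


def mean_error(readings):
    """Average of (reading - true value) over all readings."""
    return total_error(readings) / len(readings)


# The biased total grows with n; the unbiased total largely cancels.
print(round(total_error(biased), 1))                      # 200.0
print(abs(total_error(unbiased)) < total_error(biased))   # True
print(round(mean_error(biased), 3))                       # 0.2
print(abs(mean_error(unbiased)) < 0.05)                   # True
```

Averaging many readings shrinks the random error but leaves the bias untouched, which is why repetition alone does not reveal a biased error.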


    Statistical Inference

    A statistical inference is an estimate or prediction or some other generalization about a population based on information contained in a sample.

    That is, we use information contained in a sample to learn about the larger population.


    Population and Sample

    Population:
    The collection of all individuals, items or data under consideration in a statistical study.

    Sample:
    That part of the population from which information is collected.


    Five Elements of an Inferential Statistical Problem:

    A population
    One or more variables of interest
    A sample
    An inference
    A measure of reliability


    In order to understand the concept of reliability, a very important point to be understood is that making an inference about the population from the sample is only part of the story.

    We also need to know its reliability, that is, how good our inference is.


    Measure of Reliability

    A measure of reliability is a statement (usually quantified) about the degree of uncertainty associated with a statistical inference.


    The point to be noted is that the only way we can be certain that an inference about a population is correct is to include the entire population in our sample.

    However, because of resource constraints (i.e. insufficient time and/or money), we usually cannot work with the whole population, so we base our inference on just a portion of the population (i.e. a sample).


    Consequently, whenever possible, it is important to determine and report the reliability of each inference made.

    As such, reliability is the fifth element of inferential statistical problems.


    Example

    A large paint retailer has had numerous complaints from customers about under-filled paint cans.

    As a result, the retailer has begun inspecting incoming shipments of paint from suppliers.

    Shipments with under-filling problems will be sent back to the supplier.


    A recent shipment contained 2,440 gallon-size cans.

    The retailer sampled 50 cans and weighed each on a scale capable of measuring weight to four decimal places.

    Properly filled cans weigh 10 pounds.


    a) Describe the population.
    b) Describe the variable of interest.
    c) Describe the sample.
    d) Describe the inference.
    e) Describe a measure of the uncertainty of our inference.


    Solution

    a) The population is the set of units of interest to the retailer, which is the shipment of 2,440 cans of paint.

    b) The weight of the paint cans is the variable the retailer wishes to evaluate.


    c) The sample is a subset of the population. In this case, it is the 50 cans of paint selected by the retailer.


    d) The inference of interest involves the generalization of the information contained in the sample of paint cans to the population of paint cans.


    In particular, the retailer wants to learn about the extent of the under-filling problem (if any) in the population.

    This might be accomplished by finding the average weight of the cans in the sample, and using it to estimate the average weight of the cans in the population.


    e) As far as the measure of reliability of our inference is concerned, the point to be noted is that, using statistical methods, we can determine a bound on the estimation error.


    Bound on the Estimation Error

    This bound is simply a number that our estimation error (i.e. the difference between the average weight of the sample and the average weight of the population of cans) is not likely to exceed.


    When the weights of 50 paint cans are used to estimate the average weight of all the cans, the estimate will not exactly mirror the entire population.

    For example:


    If the sample of 50 cans yields a mean weight of 9 pounds, it does not follow (nor is it likely) that the mean weight of the population of cans is also exactly 9 pounds.


    Nevertheless, we can use sound statistical reasoning to ensure that our sampling procedure will generate an estimate that is almost certainly within a specified limit of the true mean weight of all the cans.


    For example, such reasoning might assure us that the estimate of the population mean from the sample is almost certainly within 1 pound of the actual population mean.

    The implication is that the actual mean weight of the entire population of cans is between $9 - 1 = 8$ pounds and $9 + 1 = 10$ pounds, that is, $(9 \pm 1)$ pounds.

    This interval represents a measure of reliability for the inference.
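The interval arithmetic above is simple enough to sketch in Python. The first function reproduces the lecture's numbers. The second shows one common way a bound might be obtained, twice the estimated standard error of the mean; that rule, the function names and the sample weights are all assumptions made for this illustration and are not stated in the lecture.

```python
import math
import statistics


def reliability_interval(estimate, bound):
    """Interval from (estimate - bound) to (estimate + bound)."""
    return estimate - bound, estimate + bound


# The numbers used above: sample mean 9 pounds, bound 1 pound.
print(reliability_interval(9, 1))   # (8, 10)


def two_standard_error_bound(sample):
    """Assumed rule of thumb, not taken from the lecture:
    bound = 2 * (sample standard deviation / sqrt(n))."""
    return 2 * statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical weights (pounds) of a few sampled cans.
weights = [9.8, 10.1, 9.9, 10.0, 9.7, 10.2, 9.9, 10.0]
mean_weight = statistics.mean(weights)
bound = two_standard_error_bound(weights)
low, high = reliability_interval(mean_weight, bound)
print(round(mean_weight, 2), round(low, 2), round(high, 2))
```

A narrower interval signals a more reliable estimate; the bound generally shrinks as the sample size grows.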



    IN TODAY'S LECTURE, YOU LEARNT:

    The nature of the science of Statistics
    The importance of Statistics in various fields
    Some technical concepts such as:
        The meaning of data
        Various types of variables
        Various types of measurement scales
        The concept of errors of measurement


    IN THE NEXT LECTURE, YOU WILL LEARN:

    Concept of sampling
    Random versus non-random sampling
    Simple random sampling
    A brief introduction to other types of random sampling
    Methods of data collection

    In other words, you will begin your journey in a subject with reference to which it has been said that "statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."