
Unit 1 (New)


MATHEM4 Basic Statistics

UNIT I. INTRODUCTION TO STATISTICS (4 hours)

1.1. Definition of Terms                           0.30
1.2. Descriptive and Inferential                   0.30
1.3. Types and Level of Measurements of Data       0.40
1.4. Population and Sample                         0.50
1.5. Random Sampling Techniques                    0.50
1.6. Determining Sample Size                       1.00
1.7. Methods of Collection of Data                 0.25
1.8. Frequency Distribution                        0.50
1.9. Methods of Presenting Data                    0.25

The word statistics is derived from the Latin word status (meaning “state”). Early uses of statistics involved compilations of data and graphs describing various aspects of a state or country.

Statistical investigations and analyses of data fall into two broad categories:

STATISTICS (Collection, Organization, Summary, Presentation, Analysis and Interpretation of Data)

DESCRIPTIVE – deals with processing data without attempting to draw any inferences or conclusions from them. It refers to the representation of data in the form of tables and graphs, and to the description of some characteristics of the data, such as averages and deviations.

INFERENTIAL (INDUCTIVE) – is a scientific discipline concerned with developing and using mathematical tools to make forecasts and inferences. Basic to the development and understanding of inferential/inductive statistics are the concepts of probability theory.

Statistics has two basic meanings:

(1) The word refers to specific numbers, such as this published result: “Twenty-three percent of people polled believed that there are too many polls.”

(2) The second meaning refers to statistics as a method of analysis.

Definition: STATISTICS is a collection of methods of planning experiments or observations, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data (Triola, 2001).

Definition: STATISTICS is a science that deals with the methods of collecting, organizing and summarizing the data in such a way that valid conclusions can be drawn from them (Khazanie, 1990).

Definition: STATISTICS is a branch of mathematics that deals with the analysis and interpretation of numerical data in terms of samples and populations (Microsoft® Encarta® 2008).


TYPES OF DATA

Qualitative (Categorical or Attribute) Data – can be separated into different categories that are distinguished by some nonnumeric characteristic.

Quantitative Data – consist of numbers representing counts or measurements.

Discrete Data – result when the number of possible values is either a finite number or a “countable” number (whole numbers).

Continuous Data – result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions or jumps (measurements, time and money).

Parameter and Statistic

A parameter – is a numerical measurement describing some characteristic of a population (N).

Example: When Lincoln was first elected to the presidency, he received 39.82% of the 1,865,908 votes cast. If we consider the collection of all those votes to be the population being considered, then the 39.82% is a parameter, not a statistic.

A statistic – is a numerical measurement describing some characteristic of a sample (n).

Example: Based on a sample of 877 surveyed executives, it was found that 45% of them would not hire anyone whose job application contained a typographical error. The figure of 45% is a statistic because it is based on a sample, not the entire population of all executives.

THE NATURE OF DATA

Variable – any characteristic that is being studied. A variable can take different values for different individuals. Heights of people, grades on a test, the time it takes for a bus to arrive at the bus stop, and hair color are some examples of variables.

Data – are observations that contain the values of variables (such as measurements, genders and survey responses) that have been collected.

Data – are information, often in the form of facts or figures obtained from experiments or surveys, used as a basis for making calculations or drawing conclusions (Microsoft Encarta 2006).


QUALITATIVE

(1) The “little guy” is back in the stock market in a big way and, as it turns out, is not so likely to be a guy. Women have rushed to buy stocks during the past two years, more than men. They constitute 57% of all new shareholders (Time Magazine).

(2) Religious affiliations of college students

QUANTITATIVE: DISCRETE

(1) The numbers of eggs that hens lay are discrete data because they represent counts.

(2) The number of children born in a family

QUANTITATIVE: CONTINUOUS

(1) The amounts of milk that cows produce are continuous data because they are measurements that can assume any value over a continuous span. During a given time interval, a cow might yield an amount of milk that can be any value between 0 gallons and 5 gallons. It would be possible to get 2.343115 gallons, because the cow is not restricted to the discrete amounts of 0, 1, 2, 3, 4, or 5 gallons.

(2) Daily dietary intake (mg) of selenium in wheat products

FOUR LEVELS OF MEASUREMENT

Another common way of classifying data is to use four levels of measurement: nominal, ordinal, interval, and ratio. In applying statistics to real problems, the level of measurement of the data is an important factor in determining which procedure to use. Never do computations or use statistical methods that are NOT appropriate for the data.

For example, it would not make sense to compute an average of social security numbers, because those numbers are data that are used for identification; they don’t represent measurements or counts of anything.


Levels of measurement, from lowest to highest: 1. NOMINAL, 2. ORDINAL, 3. INTERVAL, 4. RATIO.

(1) The NOMINAL LEVEL OF MEASUREMENT is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low or high).

Examples:

(a) Survey responses of yes, no, and undecided

(b) Blood types of people living in a certain community

Because nominal data lack any ordering or numerical significance, they cannot be used for calculations. Numbers are sometimes assigned to the different categories (especially when data are computerized), but these numbers have no real computational significance and any average calculated with them is meaningless.


(2) Data at the ORDINAL LEVEL OF MEASUREMENT can be arranged in some order, but differences between data values cannot be determined or are meaningless.

The following are examples of sample data at the ordinal level of measurement.

Course Grades: A college professor assigns grades of A, B, C, D or F. These grades can be arranged in order, but we can’t determine differences between the grades. Thus we know, for example, that A is higher than B (so there is an ordering), but we cannot subtract B from A (so the difference cannot be found).

Rankings: Based on several criteria, a magazine ranks cities according to their “livability.” Those rankings (first, second, third, and so on) determine an ordering. However, the differences between rankings are meaningless. For example, a difference of “second minus first” might suggest 2 – 1 = 1, but this difference of 1 is meaningless because it is not an exact quantity that can be compared to other such differences. The difference between the first city and the second city is not the same as the difference between the second city and the third city. Using the magazine rankings, the difference between Baguio City and Tagaytay City cannot be quantitatively compared to the difference between Metro Manila and Metro Cebu.

Ordinal data provide information about relative comparisons, but not the magnitudes of the differences.

(3) The INTERVAL LEVEL OF MEASUREMENT is like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present) and ratios of data values are not meaningful.

The following are examples of data at the interval level of measurement.

Temperatures: Body temperatures of 98.2°F and 98.6°F. Those values are ordered, and we can determine their difference (often called the distance between the two values). However, there is no natural starting point. The value of 0°F might seem like a starting point, but it is arbitrary and does not represent the total absence of heat. It is wrong to say that 50°F is twice as hot as 25°F. (Temperature readings on the Kelvin scale are at the ratio level of measurement; the scale has an absolute zero.)

Years: The years 1000, 2000, 1776, and 1492. (Time did not begin in the year 0, so the year 0 is arbitrary instead of being a natural zero starting point.)

(4) The RATIO LEVEL OF MEASUREMENT is the highest level. It is the interval level modified to include the natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are both meaningful.

The following are examples of data at the ratio level of measurement. Note the presence of the natural zero value, and the use of meaningful ratios of “twice” and “three times.”

Weights: Weights (in carats) of diamond engagement rings (0 does represent no weight, and 4 carats is twice as heavy as 2 carats).

Prices: Prices of college textbooks (Php 0.00 does represent no cost, and a Php 900.00 book is three times as costly as a Php 300.00 book).

Length: A tilapia 8 inches long is twice as long as a 4-inch tilapia.
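The claim that Fahrenheit readings support differences but not ratios can be checked with a little arithmetic. The sketch below is only an illustration: it takes the 50°F and 25°F readings from the temperature example and converts them to kelvins using the standard conversion formula; the function and variable names are made up for this example.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert a Fahrenheit reading to kelvins (a scale with a true zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


warm_f, cool_f = 50.0, 25.0

# Differences are meaningful at the interval level.
difference_f = warm_f - cool_f

# The naive ratio of the Fahrenheit readings suggests "twice as hot" ...
naive_ratio = warm_f / cool_f

# ... but the same two temperatures on the Kelvin scale differ by only about 5%.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"difference: {difference_f} degrees F")
print(f"ratio of Fahrenheit readings: {naive_ratio:.2f}")
print(f"ratio of Kelvin readings: {kelvin_ratio:.3f}")
```

The Fahrenheit ratio of 2 is an artifact of where the scale places its zero, which is why ratios are reserved for ratio-level data.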


Exercise: The following data describe the different data associated with a state senator. For each data entry, indicate the corresponding level of measurement.

(1) The senator’s name is Carah Bao.

(2) The senator is 58 years old.

(3) The years in which the senator was elected to the senate are 1963, 1969, 1981, and 1994.

(4) Her total taxable income last year was Php 278,317.19.

(5) The senator sponsored a bill to protect water rights. Out of 1100 voters in her district, 400 said they strongly favored the bill, 300 said they favored the bill, 200 said they were neutral, 150 said they did not favor the bill and 50 said they strongly did not favor the bill.

(6) The senator is married now.

(7) However, the senator was previously divorced in 1965 and again in 1982.

(8) A leading news magazine claims the senator is ranked seventh for her voting record on bills regarding public education.

Population (N), Sample (n), Data and Variables

One of the goals of a statistical investigation is to explore the characteristics of a large group of items on the basis of a few. Sometimes it is physically, economically, or for some other reason almost impossible to examine each item in a group under study. In such a situation the only recourse is to examine a sub-collection of items from this group. In statistics we commonly use the terms population and sample.

Example 1: Suppose an ornithologist is interested in investigating migration patterns of birds in the Northern Hemisphere. Then all the birds in the Northern Hemisphere will represent the population of interest to him. His choice of the population restricts him, for it does not include birds that are native to Australia and do not migrate to the Northern Hemisphere.

Example 2: Every ten years the Bureau of Census conducts a survey of the entire population of the Philippines, accounting for every person regarding sex, age, and other characteristics. The last such survey was carried out in 2000. In this case the entire population of the Philippines is the population in the statistical sense.

A population can be finite or infinite and is made up of study units.

Definition: A population (N) is the complete collection of all elements (scores, people, measurements, and so on) to be studied. The collection is complete in the sense that it includes all subjects to be studied.


Population and Study Unit

Target Population – the whole group of study units to which we are interested in applying our inferences or conclusions.

Study Population – the group of study units to which we can legitimately apply our inferences or conclusions.


The terms population and sample are relative. A collection that constitutes a population in one context may well be a sample in another context.

For instance, if we wish to learn how people in Greater Manila Area (GMA) feel about a certain national issue, then all the residents of Greater Manila Area (GMA) would constitute the population of interest. However, assuming that Greater Manila Area (GMA) represents a cross section of the Philippine population, if we use the response from these residents to understand the feelings about the issue among all the Filipino residents, then the residents of Greater Manila Area (GMA) would represent a sample.

RANDOM SAMPLING TECHNIQUES

When we have a specific objective, and we want to collect the data and do the analysis that will help us meet that objective, we typically get our data from two common sources: observational studies (such as polls) and experiments (such as using a treatment to improve hair growth).

In an observational study, we observe and measure specific characteristics, but we don’t attempt to modify the subjects being studied. In an experiment, we apply some treatment and then proceed to observe its effect on the subjects.

SAMPLE SIZE

An important consideration in conducting research is the size of your sample. It must be large enough so that the erratic behavior of very small samples will not produce misleading results. Repetition of a research study or an experiment is called replication.

A large sample is not necessarily a good sample. Although it is important to have a sample that is sufficiently large, it is more important to have a sample in which the elements have been chosen in an appropriate way, such as random selection.

Use a sample size large enough so that we can see the true nature of any effects or phenomena, and obtain the sample using an appropriate method, such as one based on randomness.

RANDOMIZATION

One of the worst mistakes is to collect data in a way that is inappropriate. We cannot overstress this very important point:

Data carelessly collected may be so completely useless that no amount of statistical torturing can salvage them.

COMMON METHODS OF SAMPLING

In a random sample, members of the population are selected in such a way that each has an equal chance of being selected. Sampling is a process or procedure which involves taking a part of a population, making observations on these representatives, and then generalizing the findings to the bigger population (Ary, Jacob and Razavieh, 1981).

Probability Sampling – is a random sampling technique in which each element in a population has an equal chance of being selected.

Non-probability Sampling – is a non-random sampling technique in which the elements in a population do not have an equal chance of being selected.


PROBABILITY SAMPLING

1.) Simple Random Sampling – entails giving all elements an equal chance of being included in the sample. No one from the population is excluded from the pool. This is implemented if the population is homogeneous.

2.) Systematic Sampling – This is sampling at every regular interval. It can be undertaken when the population has the features that would also suit simple random sampling. The key here is to select some starting point and then select every kth (such as every 50th) element in the population.

Determine the sample size (n).
Determine the interval (I):

I = N/n

3.) Stratified Sampling – entails subdividing the population according to a certain characteristic, then selecting the samples from every subgroup or stratum. This is resorted to when it is important to get response per subgroup or stratum. It is useful if there is a need to differentiate the characteristics of a heterogeneous population and the elements are geographically concentrated in a given area.

4.) Cluster Sampling – entails random selection of groups in a population who could serve as the respondents of the study. This is best applied if it deals with a population with homogeneous characteristics but geographically dispersed in different parts of the country. The population area is divided into sections (or clusters); some of those clusters are then randomly selected, and all the members or elements of the selected clusters are chosen.
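The four probability sampling techniques above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of 200 numbered elements, the strata and section names, and the proportional allocation rule used for the stratified sample are all assumptions made for the example.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n elements without replacement; every element is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every I-th element, where I = N / n."""
    interval = len(population) // n      # I = N / n, rounded down (needs n <= N)
    start = rng.randrange(interval)      # random starting point within the first interval
    return [population[start + i * interval] for i in range(n)]


def stratified_sample(strata, n, rng):
    """Sample every stratum in proportion to its size (assumed allocation rule)."""
    total = sum(len(members) for members in strata.values())
    # Rounding can make the shares miss n by one or two for awkward stratum sizes.
    return {
        name: rng.sample(members, round(n * len(members) / total))
        for name, members in strata.items()
    }


def cluster_sample(clusters, number_of_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(2024)                # fixed seed so the run is repeatable
population = list(range(1, 201))         # hypothetical population, N = 200

srs = simple_random_sample(population, 20, rng)
systematic = systematic_sample(population, 20, rng)

strata = {
    "first_year": population[:100],
    "second_year": population[100:160],
    "third_year": population[160:],
}
stratified = stratified_sample(strata, 20, rng)

clusters = {f"section_{i + 1}": population[i * 25:(i + 1) * 25] for i in range(8)}
chosen_clusters = cluster_sample(clusters, 2, rng)

print("simple random:", sorted(srs))
print("systematic:", systematic)
print("stratified sizes:", {name: len(s) for name, s in stratified.items()})
print("clusters chosen:", list(chosen_clusters))
```

With N = 200 and n = 20 the systematic interval is I = 10, so consecutive selected elements are always ten positions apart.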

SAMPLING TECHNIQUE

PROBABILITY SAMPLING
   1.) Simple Random Sampling (Fish-Bowl Technique, Lottery Technique, Table of Random Numbers)
   2.) Systematic Sampling
   3.) Stratified Sampling
   4.) Cluster Sampling

NON-PROBABILITY SAMPLING
   1.) Accidental / Convenience Sampling
   2.) Purposive Sampling
   3.) Quota Sampling
   4.) Snow-Ball Sampling


SAMPLING STRATEGIES APPROPRIATE TO PARTICULAR FEATURES OF THE POPULATION

Personal Attributes   Geographical Spread   Sampling Strategies
Homogeneous           Concentrated          Simple Random or Systematic
Homogeneous           Dispersed             1.) Cluster Sampling  2.) Simple Random or Systematic
Heterogeneous         Concentrated          1.) Stratified Sampling  2.) Simple Random or Systematic
Heterogeneous         Dispersed             1.) Stratified  2.) Cluster  3.) Simple Random or Systematic

Determination of Sample Size (n) Provided That the Population Size (N) Is Known

Slovin’s Formula

$$n \ge \frac{N}{1 + N e^{2}}$$

where
N = population size
n = sample size
e = margin of error (0.10, 0.05, or 0.01)

Lynch et al. Formula

$$n \ge \frac{N Z^{2} p(1-p)}{N d^{2} + Z^{2} p(1-p)}$$

where
Z = value of the normal variable for a reliability level
    Z = 1.645 (90% reliability in obtaining the sample size)
    Z = 1.96 (95% reliability in obtaining the sample size)
    Z = 2.575 (99% reliability in obtaining the sample size)
p = 0.50 (proportion of getting a good sample)
(1 – p) = 0.50 (proportion of getting a poor sample)
d = 0.01, 0.025, 0.05, or 0.10 (choice of sampling error)
N = population size
n = sample size
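As a worked illustration of the two formulas, the sketch below computes the minimum sample size for a hypothetical population of N = 1000 with e = d = 0.05. The function names are made up for this example, and rounding up to the next whole number is an assumption that follows from the "n ≥" in both formulas.

```python
import math


def slovin_sample_size(N, e):
    """Slovin's formula: n >= N / (1 + N * e^2), rounded up to a whole number."""
    return math.ceil(N / (1 + N * e ** 2))


def lynch_sample_size(N, d, Z=1.96, p=0.50):
    """Lynch et al. formula: n >= N Z^2 p(1-p) / (N d^2 + Z^2 p(1-p)), rounded up."""
    numerator = N * Z ** 2 * p * (1 - p)
    denominator = N * d ** 2 + Z ** 2 * p * (1 - p)
    return math.ceil(numerator / denominator)


N = 1000  # hypothetical population size

n_slovin = slovin_sample_size(N, e=0.05)
n_lynch = lynch_sample_size(N, d=0.05, Z=1.96, p=0.50)

print("Slovin:", n_slovin)   # 1000 / 3.5 = 285.71..., so n = 286
print("Lynch:", n_lynch)     # 960.4 / 3.4604 = 277.54..., so n = 278
```

For the same population and error level, the two formulas give similar but not identical sample sizes, because the Lynch et al. formula also depends on the chosen reliability level Z and the proportion p.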

NON-PROBABILITY SAMPLING

1.) Accidental/Convenience Sampling – Simply use results that are readily available or accessible. Usually the first person who comes along who typifies a unit of analysis serves as the respondent of the study.

2.) Purposive Sampling – Implemented with the researcher defining a criterion or set of criteria for determining the respondents of the study. It is the researcher’s judgment that becomes the basis for selecting an element or group that will serve as the unit of analysis. It is useful in qualitative or exploratory studies. The objective is not to have many respondents but to make sure that the person who would be interviewed will provide a wealth of information. The aim is not to quantify but to characterize an event being studied.

3.) Quota Sampling – Similar to stratified sampling, except that the selection of the elements per stratum is not done through the application of a random sampling strategy. Quota sampling entails grouping elements according to certain characteristics and ensuring that each group is represented. Quota sampling is helpful if the sampling frame is not available per group or stratum. It refines the application of convenience sampling since there is conscious intent on the part of the researcher to view the probable differences of every stratum or group with regard to the critical variables of the study.

4.) Snowball or Referral Sampling – Involves having a respondent refer other people who are in a position to answer some of the questions of the researcher. This is particularly helpful in the study of highly sensitive topics where the identity of respondents is difficult to divulge or may even be unknown to many. In other words, if the sampling frame cannot be provided and the topic has security implications, a researcher could obtain referrals from the first respondent to other respondents who may be willing to talk.


A sampling error is the difference between a sample result and the true population result; such an error results from chance sample fluctuations.

A nonsampling error occurs when the sample data are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective measurement instrument, or copying the data incorrectly).

METHODS OF PRESENTATION OF DATA

Statistical data collected should be arranged in a manner that allows a reader to distinguish their essential features. Depending on the type of information and the objectives of the person presenting the information, data may be presented using one or a combination of three forms: TEXTUAL, TABULAR, and GRAPHICAL.

TEXTUAL FORM – The textual or paragraph form is utilized when the data to be presented are purely qualitative or when very few numbers are involved. This method is, generally, not desirable when too many figures are involved as the reader may fail to grasp the significance of certain quantitative relationships, but it becomes an effective device when the objective is to call the reader’s attention to some data that require special emphasis.

Example: From a newspaper report, it was gathered that China has a population of 707 million, India has 505 million, the US has 207 million, the USSR (before the break-up) has 245 million, and Indonesia has 125 million. More than half of the world’s people, about 2.1 billion, live in Asia; 456 million live in Europe, 354 million in North America, 195 million in South America, and 20 million in Oceania. Shanghai has 10,820,000; Tokyo has 8,841,000; New York has 7,895,000; and Moscow has 7,050,000.

TABULAR FORM – A more effective device of presenting data because the data are presented in more concise and systematic manner. People who want to make some comparisons and draw relationships usually find tabular arrangement more convenient and understandable than the textual presentation. The data are presented through tables consisting of vertical columns and horizontal rows with headings describing these rows and columns.

Example:

Continent/Region   Population       Country     Population     City       Population
Asia               2,100,000,000    China       707,000,000    Shanghai   10,820,000
                                    India       505,000,000    Tokyo       8,841,000
                                    Indonesia   125,000,000
North America        354,000,000    USA         207,000,000    New York    7,895,000
Europe               465,000,000    USSR        245,000,000    Moscow      7,050,000
South America        195,000,000
Oceania               20,000,000

GRAPHICAL OR PICTORIAL FORM – Among the different methods of presenting data, the graph or chart is perhaps the most effective device for attracting people’s attention. Readers who look for comparisons and trends may skip statistical tables but may pause to examine graphs. Graphs have a great advantage over tables because they convey quantitative values and comparisons more readily than tables.

Line Graph – A line graph is an effective device used to portray changes in values with respect to time. The categories or time periods are chronologically arranged on the horizontal axis and the relevant values are indicated on the vertical axis. Variations in the data are indicated by a series of line segments formed by joining consecutive points plotted above the categories or time periods.

[Figure: line graph of Population (vertical axis, 1,000 to 5,000) by Year (horizontal axis, 1995 to 2006); the plotted values range from 1,500 to 5,000.]

[Figure: graph of Count (vertical axis, 0 to 25) by Average Number of Cigarettes Smoked Per Day (horizontal axis: 0, 2, 3, 4, 5, 7, 8, 10, 15, 22).]

Bar Graph – Categorical as well as chronological comparisons may be shown graphically by means of a bar graph. A bar graph essentially consists of bars or rectangles which are drawn either vertically or horizontally, depending on the type of data and the purpose of the comparison.

[Figure: vertical and horizontal bar graphs of Count (0 to 30) by Academic Performance: Below Average (mostly 75–80), 11; Average (mostly 81–86), 28; Good or Above Average (mostly 87–92), 7; Excellent (mostly 93–99), 4.]

[Figure: bar graph of Count (0 to 20) by Academic Performance, with separate bars for Male and Female (Gender/Sex).]

Pie Chart or Pie Graph – The pie chart, like the component bar graph, is particularly appropriate for portraying the relative magnitudes of the component parts of a whole. It is constructed by dividing a circle (pie) into sectors, each sector having a size proportional to the percentage it represents.

[Figure: pie charts of Academic Performance. Percentages: Below Average, 22%; Average, 56%; Good/Above Average, 14%; Excellent, 8%. Counts: Below Average (mostly 75–80), 11; Average (mostly 81–86), 28; Good or Above Average (mostly 87–92), 7; Excellent (mostly 93–99), 4.]
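Since each sector is proportional to the percentage it represents, the sector angles follow directly from the counts. The sketch below uses the Academic Performance counts shown in the pie chart (11, 28, 7, and 4) and is only an illustration of the arithmetic; the variable names are made up for this example.

```python
counts = {
    "Below Average": 11,
    "Average": 28,
    "Good or Above Average": 7,
    "Excellent": 4,
}

total = sum(counts.values())   # 50 students in all

sectors = {}
for category, count in counts.items():
    percent = 100 * count / total      # share of the whole, in percent
    angle = 360 * count / total        # central angle of the sector, in degrees
    sectors[category] = (percent, angle)
    print(f"{category:<22} {count:>3} {percent:6.1f}% {angle:7.1f} degrees")
```

The percentages add up to 100% and the angles to 360 degrees, which is a quick check that no category was left out.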

Component Bar Graph – A component bar graph is a graphic device used to show the relative sizes of the components that make up a total. Each bar, representing 100%, is subdivided so that the length of each part corresponds to the proportion or percentage of the total number of cases belonging to the category being represented by the bar.

[Figure: vertical and horizontal component bar graphs of Count (0 to 30) by Academic Performance (Below Average, Average, Good or Above Average, Excellent), each bar subdivided into Male and Female (Gender/Sex).]


Pictograph or Pictogram – A device which is often used to dramatize the differences among a few quantities. It is an effective tool for attracting attention since it employs pictures or symbols which are normally drawn at the same size and in rows. Large figures are generally shown by increasing the quantity of pictures and not the size of a picture.


Statistical Maps – Statistical maps are used to present quantitative data which describe or classify geographical areas.


Summarizing Data with Frequency Tables

Frequency Distribution – A tabular arrangement of data showing its classification or grouping according to magnitude or size.

Variations of Frequency Distribution Table

When working with large data sets, it is generally helpful to organize and summarize the data by constructing a frequency table.

Components of a Frequency Distribution

A frequency table lists classes (or categories) of values, along with frequencies (or counts) of the number of values that fall into each class. The frequency for a particular class is the number of original scores that fall into that class.

CLASS LIMITS – the end numbers of a class. They are the highest (upper limit) and the lowest (lower limit) values that can go into each class.

Lower Class Limits – are the smallest numbers that can belong to the different classes.

Upper Class Limits – are the largest numbers that can belong to the different classes.

CLASS BOUNDARIES (EXACT LIMITS) – “true” class limits defined by lower and upper boundaries. Class boundaries are numbers used to separate classes, but without the gaps created by the class limits.

CLASS MARK or CLASS MIDPOINT – the average of the lower and upper limits or boundaries of each class.

CLASS INTERVAL WIDTH/SIZE (i) – the range of values used in defining a class. Simply the length or width of a class. It is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
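The components defined above can be computed directly from the class limits. The sketch below is a minimal illustration using three example classes (21–25, 26–30, 31–35); it assumes equal-width classes and places each boundary halfway across the gap between one class's upper limit and the next class's lower limit.

```python
class_limits = [(21, 25), (26, 30), (31, 35)]   # (lower limit, upper limit)

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]

# Class width: difference between two consecutive lower class limits.
width = class_limits[1][0] - class_limits[0][0]

components = []
for lower, upper in class_limits:
    boundaries = (lower - gap / 2, upper + gap / 2)   # "true" limits, no gaps
    midpoint = (lower + upper) / 2                    # class mark
    components.append({"limits": (lower, upper),
                       "boundaries": boundaries,
                       "midpoint": midpoint})
    print(f"{lower}-{upper}: boundaries {boundaries[0]}-{boundaries[1]}, "
          f"midpoint {midpoint}")

print("class width:", width)
```

Note that the upper boundary of one class (25.5) equals the lower boundary of the next, so the boundaries leave no gaps between classes.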


Constructing Frequency Tables

The main reason for constructing a frequency table is to use it for constructing a graph that effectively shows the distribution of the data (for example, a histogram).

Guidelines: It is important to observe the following guidelines when constructing a frequency table.

(1) Be sure that the classes are mutually exclusive. In other words, each of the original values must belong to only one class.

(2) Include all classes, even if the frequency is zero.

(3) Try to use the same width for all classes. Sometimes open-ended intervals, such as “65 years or older,” are impossible to avoid.

(4) Select convenient numbers for class limits. Round up to use fewer decimal places or use numbers relevant to the situation.

(5) Use between 5 and 20 classes. Some instructors make use of the Sturges Rule in determining the number of classes (K = 1 + 3.322 log n).

Suggested Steps in Constructing a Frequency Distribution

1.) Array the given raw data in ascending or descending order if necessary.

2.) Examine the arrayed data and identify the highest value (HV) and the lowest value (LV)

3.) Determine the range using the formula:

Range = Highest Value – Lowest Value
R = HV – LV

4.) Determine the class interval width/size (i) using the formula:

$$i = \frac{R}{K}, \qquad \text{Class interval} = \frac{\text{Range}}{\text{Tentative number of classes}}$$

Sturges Rule:

$$K = 1 + 3.322 \log n$$

where
K = tentative number of classes to use
n = total number of cases in an observation
log = common logarithm (base 10)

5.) Sort the arrayed data into appropriate classes using convenient and easy to read class limits. Start the first class with a lower limit either equal to or a little bit less than the lowest observed value.

6.) Set up the class boundaries if necessary.

7.) Count or tally the number of observations into the appropriate class intervals.
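The suggested steps can be traced in code. The sketch below is one possible reading of them, under stated assumptions: the 20 scores are hypothetical, the data are whole numbers, the class width is R/K rounded up to the next whole number, and the first class starts at the lowest observed value.

```python
import math


def frequency_distribution(data):
    """Build (lower limit, upper limit, frequency) rows for whole-number data."""
    values = sorted(data)                              # step 1: array the data
    lowest, highest = values[0], values[-1]            # step 2: LV and HV
    value_range = highest - lowest                     # step 3: R = HV - LV
    k = 1 + 3.322 * math.log10(len(values))            # Sturges Rule (tentative K)
    width = max(1, math.ceil(value_range / k))         # step 4: i = R / K, rounded up

    table = []
    lower = lowest                                     # step 5: start at the lowest value
    while lower <= highest:
        upper = lower + width - 1
        frequency = sum(1 for v in values if lower <= v <= upper)   # step 7: tally
        table.append((lower, upper, frequency))
        lower += width
    return k, width, table


# Hypothetical test scores, n = 20.
scores = [52, 55, 58, 61, 63, 64, 66, 67, 68, 70,
          71, 72, 73, 75, 77, 79, 82, 85, 88, 91]

k, width, table = frequency_distribution(scores)

print(f"tentative K = {k:.2f}, class width i = {width}")
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

Here R = 91 – 52 = 39 and K is about 5.32, so i = 39 / 5.32 is about 7.33 and is rounded up to 8, giving five classes from 52–59 to 84–91.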


Example: “Ages of Oscar-winning Best Actors and Actresses” (Mathematics Teacher magazine), by Richard Brown and Gretchen Davis, presents the results for recent winners from each category.

Actors:
32 37 36 32 51 53 33 61 35 45 55 69
76 37 42 40 32 60 38 56 48 48 40 43
62 43 42 44 41 56 39 46 31 47 45 60

Actresses:
50 44 35 80 26 28 41 21 61 38 49 33
74 30 33 41 31 35 41 42 37 26 34 34
35 26 61 60 34 24 30 37 31 27 39 34

Age      Tally                Frequency
21–25    ||                    2
26–30    |||||                 5
31–35    |||||-|||||-|||||    15
36–40    |||||-|||||-||||     14
41–45    |||||-|||||-|||      13
46–50    |||||-||              7
51–55    |||                   3
56–60    |||                   3
61–65    |||||-||              7
66–70                          0
71–75    |                     1
76–80    ||                    2
                          n = 72

Relative Frequency Table

An important variation of the basic frequency table uses relative frequencies (rf), which are easily found by dividing each class frequency by the total of all frequencies. A relative frequency table includes the same class limits as a frequency table, but relative frequencies are used instead of actual frequencies. The relative frequencies are often expressed as percents.

$$rf = \frac{\text{class frequency}}{\sum \text{of all frequencies}} \times 100\%$$

Cumulative Frequency Table

Another variation of the standard frequency table is used when cumulative totals are desired. The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes.

Standard, Relative, and Cumulative Frequency Tables

Age       Frequency (f)   Relative Frequency   <cf   <cf%      >cf   >cf%
21 – 25        2               2.78%             2     2.78%    72   100.00%
26 – 30        5               6.94%             7     9.72%    70    97.22%
31 – 35       15              20.83%            22    30.56%    65    90.28%
36 – 40       14              19.44%            36    50.00%    50    69.44%
41 – 45       13              18.06%            49    68.06%    36    50.00%
46 – 50        7               9.72%            56    77.78%    23    31.94%
51 – 55        3               4.17%            59    81.94%    16    22.22%
56 – 60        3               4.17%            62    86.11%    13    18.06%
61 – 65        7               9.72%            69    95.83%    10    13.89%
66 – 70        0               0.00%            69    95.83%     3     4.17%
71 – 75        1               1.39%            70    97.22%     3     4.17%
76 – 80        2               2.78%            72   100.00%     2     2.78%
Total     N = 72             100.00%
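The relative and cumulative columns can be derived mechanically from the class frequencies. The sketch below starts from the frequencies in the table (2, 5, 15, 14, 13, 7, 3, 3, 7, 0, 1, 2) and recomputes the rf, <cf, and >cf columns; it is an illustration of the definitions, and the variable names are made up for this example.

```python
from itertools import accumulate

classes = ["21-25", "26-30", "31-35", "36-40", "41-45", "46-50",
           "51-55", "56-60", "61-65", "66-70", "71-75", "76-80"]
frequencies = [2, 5, 15, 14, 13, 7, 3, 3, 7, 0, 1, 2]

n = sum(frequencies)                                   # total of all frequencies

# Relative frequency: class frequency divided by the total, as a percent.
relative = [100 * f / n for f in frequencies]

# "Less than" cumulative frequency: running total from the first class down.
less_than_cf = list(accumulate(frequencies))

# "Greater than" cumulative frequency: running total from the last class up.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(f"{'Age':<7}{'f':>4}{'rf%':>8}{'<cf':>5}{'<cf%':>8}{'>cf':>5}{'>cf%':>8}")
for label, f, rf, lcf, gcf in zip(classes, frequencies, relative,
                                  less_than_cf, greater_than_cf):
    print(f"{label:<7}{f:>4}{rf:>8.2f}{lcf:>5}{100 * lcf / n:>8.2f}"
          f"{gcf:>5}{100 * gcf / n:>8.2f}")
```

The last <cf value and the first >cf value both equal n = 72, which is a quick check that the running totals are correct.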