Stat 2


Presentation of Data

Q.1. What are the different methods of presentation of data?
Ans. (i) Classification (ii) Tabulation (iii) Diagrams (iv) Graphs.

Q.2. What is classification?
Ans. Classification is the process of arranging data into relatively homogeneous groups or classes according to their resemblances and affinities.

Q.3. What is tabulation?
Ans. The systematic arrangement of data in rows and columns for the purpose of comparison and analysis is known as tabulation.

Q.4. What is an array?
Ans. An arrangement of data in ascending or descending order is called an array.

Q.5. What are the main parts of a table?
Ans. (i) Title (ii) Box-head (iii) Stub (iv) Body (v) Prefatory note (vi) Footnote (vii) Source note.

Q.6. What is a frequency distribution?
Ans. A frequency distribution is a tabular arrangement of data that shows the distribution of observations among different classes.

Q.7. What are class limits?
Ans. The class limits are the values of the variable that define the classes.

Q.8. What do you mean by an open-end class?
Ans. A class that has no lower class limit or no upper class limit is called an open-end class.

Q.9. What are class boundaries?
Ans. The class boundaries are the exact values that separate one class from the next.

Q.10. What do you mean by class marks (or mid points)?
Ans. A class mark is the average of the lower and upper class limits (or class boundaries) of a class.

Q.11. What is a class interval?
Ans. The difference between the upper and lower class boundaries is called the class interval or class width.

Q.12. What is class frequency?
Ans. The number of values falling in a specified class is called the class frequency or simply the frequency.

Q.13. What is relative frequency?
Ans. The frequency of a class divided by the total frequency is called the relative frequency.

Q.14. What is a histogram?
Ans. A histogram is a set of adjacent rectangles for a frequency distribution such that the area of each rectangle is proportional to the corresponding class frequency.

Q.15. What is a frequency polygon?
Ans. A frequency polygon is a many-sided closed figure that represents a frequency distribution.

Q.16. How is a frequency polygon constructed?
Ans. It is constructed by plotting the class frequencies against the mid points and then connecting the points by straight line segments.

Q.17. What is an ogive?
Ans. The cumulative frequency polygon is called an ogive.


Q.18. What is a chart?
Ans. A chart is a device used for representing simple statistical data in a simple, clear and effective manner.

Q.19. What is ungrouped data?
Ans. Raw data that have been collected for the first time and not yet arranged into classes are called ungrouped data.

Q.20. What do you mean by grouped data?
Ans. When ungrouped data are arranged into classes or groups with their respective frequencies, they are called grouped data.

Q.21. For which distribution is the graph of the frequency distribution bell-shaped?
Ans. For a symmetrical distribution the graph is bell-shaped.

Q.22. Name some graphs of a frequency distribution.
Ans. Histogram, frequency polygon, frequency curve, ogive.

Q.23. Name some charts / diagrams.
Ans. Simple bar diagram, sub-divided bar diagram, multiple bar diagram, pie chart, etc.

Q.24. What is the mid point of the class 20-24?
Ans. The mid point is (20 + 24)/2 = 22.

Q.25. Write the formula of the angle of a sector used in a pie chart.
Ans. $\text{Angle of sector} = \dfrac{\text{Component part}}{\text{Total}} \times 360^\circ$

************************


Example 1: The following data show the number of children in different families of a small locality:
1, 2, 4, 3, 0, 1, 2, 3, 1, 1, 0, 2, 1, 0, 2, 3, 0, 0, 1, 3.

Make a frequency distribution. Also find the relative frequencies.

Solution:

Range = Maximum value – Minimum value = 4 – 0 = 4

Number of children   Tally     Number of families (f)   r.f = f/Σf
0                    /////     5                        5/20 = 0.25
1                    ///// /   6                        6/20 = 0.30
2                    ////      4                        4/20 = 0.20
3                    ////      4                        4/20 = 0.20
4                    /         1                        1/20 = 0.05
Total                ----      20                       1.00
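The tally in Example 1 can be checked with a short Python sketch. The function name `frequency_table` is illustrative and not part of these notes; the data are the 20 family sizes listed in the example.

```python
from collections import Counter

# Number of children in the 20 families of Example 1.
children = [1, 2, 4, 3, 0, 1, 2, 3, 1, 1, 0, 2, 1, 0, 2, 3, 0, 0, 1, 3]


def frequency_table(values):
    """Return (value, frequency, relative frequency) rows in ascending order."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]


table = frequency_table(children)
for value, f, rf in table:
    print(f"{value:>3} {f:>3} {rf:.2f}")
```

The relative frequencies always add up to 1, which is a quick check on the tally.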

Example 2: The following data show the ages of 50 cancer patients admitted to Shaukat Khanum Memorial Cancer Hospital, Lahore:

48 29 39 32 54 33 44 36 38 31
46 30 20 44 47 39 42 35 33 47
31 35 34 42 41 42 43 35 32 35
43 36 37 45 46 41 25 27 26 40
38 41 44 47 45 45 52 43 44 43

Make a frequency distribution. Also find the class boundaries and mid points.

Solution:

The following steps are involved in constructing a frequency distribution.

i) Range = Maximum value – Minimum value = 54 – 20 = 34

ii) Approximate number of classes
No. of classes = 1 + 3.322 log n = 1 + 3.322 log(50) = 1 + 3.322 (1.6990) = 6.6441 ≈ 7 (approximately)

iii) Width of class interval
h = Range / No. of classes = 34/7 = 4.86 ≈ 5 (approx.)

iv) Group the entire data with an interval of 5 each and write down the classes in the first column under the heading "Ages". Count the values falling in each interval by putting a tally (/) in the proper interval for each value. Count the tallies for each interval and write the result in the next column; these counts are the frequencies, denoted by f.

Ages (class limits)   Tally               f    Class boundaries   Mid point (X)
20 – 24               /                   1    19.5 – 24.5        22
25 – 29               ////                4    24.5 – 29.5        27
30 – 34               ///// ///           8    29.5 – 34.5        32
35 – 39               ///// ///// /       11   34.5 – 39.5        37
40 – 44               ///// ///// /////   15   39.5 – 44.5        42
45 – 49               ///// ////          9    44.5 – 49.5        47
50 – 54               //                  2    49.5 – 54.5        52
Total                 ----                50   ----               ----
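The four steps of Example 2 can be sketched in Python as follows. The helper names `sturges_classes` and `grouped_frequencies` are illustrative, and rounding the class count and width upward is an assumption that matches the choices made in the example.

```python
import math

# Ages of the 50 patients in Example 2.
ages = [
    48, 29, 39, 32, 54, 33, 44, 36, 38, 31,
    46, 30, 20, 44, 47, 39, 42, 35, 33, 47,
    31, 35, 34, 42, 41, 42, 43, 35, 32, 35,
    43, 36, 37, 45, 46, 41, 25, 27, 26, 40,
    38, 41, 44, 47, 45, 45, 52, 43, 44, 43,
]


def sturges_classes(n):
    """Approximate number of classes, 1 + 3.322 log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


def grouped_frequencies(values, start, width, k):
    """Count integer values falling in k classes of equal width from `start`."""
    freqs = [0] * k
    for v in values:
        freqs[(v - start) // width] += 1
    return freqs


k = sturges_classes(len(ages))                   # step ii
width = math.ceil((max(ages) - min(ages)) / k)   # step iii
freqs = grouped_frequencies(ages, min(ages), width, k)

for i, f in enumerate(freqs):
    lower = min(ages) + i * width
    upper = lower + width - 1
    # Boundaries lie half a unit outside the limits; the mid point is their average.
    print(f"{lower}-{upper}  f={f}  C.B={lower - 0.5}-{upper + 0.5}  X={(lower + upper) / 2}")
```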

Example 3: The following data show the scores made by Pakistani cricketers against New Zealand in a one-day match. Draw a simple bar chart of the data:

Cricketers   Inzmam   Waseem   Shahid   Saeed   Imran   Razzaq
Scores       54       47       26       30      25      23

Solution:

[Figure: simple bar chart of the scores of the six cricketers.]

Example 4: The following data show the production of wheat in different localities of the Punjab for the years 1987 to 1989.

Production in Kg (thousands)

Year           1987   1988   1989
Locality I     500    600    800
Locality II    600    700    700
Locality III   200    400    500

i) Make a multiple bar chart
ii) Make a component bar chart
iii) Make a percentage bar chart.

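Solution:

i) Multiple bar chart

Production in Kg (thousands)

Year           1987   1988   1989
Locality I     500    600    800
Locality II    600    700    700
Locality III   200    400    500

[Figure: multiple bar chart of production by locality for each year.]

ii) Component bar chart

Production in Kg (thousands)

Year           1987   1988   1989
Locality I     500    600    800
Locality II    600    700    700
Locality III   200    400    500
Total          1300   1700   2000

[Figure: component bar chart, one bar per year sub-divided by locality.]

iii) Percentage bar chart

Each entry is the locality's production divided by the year's total, multiplied by 100.

Percentage production

Year           1987                      1988                      1989
Locality I     500/1300 × 100 = 38.46    600/1700 × 100 = 35.29    800/2000 × 100 = 40.00
Locality II    600/1300 × 100 = 46.15    700/1700 × 100 = 41.18    700/2000 × 100 = 35.00
Locality III   200/1300 × 100 = 15.38    400/1700 × 100 = 23.53    500/2000 × 100 = 25.00
Total          100                       100                       100

[Figure: percentage bar chart, one bar of height 100 per year sub-divided by locality.]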

Example 5: Data are available on the total production of urea fertilizer and its use on different crops. The total production of urea is 200 (thousand Kg) and its consumption for the crops wheat, sugarcane, maize and lentils is 75, 80, 30 and 15 (thousand Kg) respectively. Make an appropriate diagram to represent these data.

Solution: A pie chart is appropriate. The angle of each sector is (component / total) × 360°.

Crops       Fertilizer (thousand Kg)   Angle of sector
Wheat       75                         75/200 × 360 = 135
Sugarcane   80                         80/200 × 360 = 144
Maize       30                         30/200 × 360 = 54
Lentils     15                         15/200 × 360 = 27
Total       200                        360
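The sector angles of Example 5 follow from one line of arithmetic per crop. A minimal Python sketch, with the illustrative function name `sector_angles`, is:

```python
def sector_angles(components):
    """Map each component to its pie-chart angle: value / total * 360 degrees."""
    total = sum(components.values())
    return {name: value / total * 360 for name, value in components.items()}


# Urea consumption in thousand Kg, from Example 5.
urea = {"Wheat": 75, "Sugarcane": 80, "Maize": 30, "Lentils": 15}
angles = sector_angles(urea)

for crop, angle in angles.items():
    print(f"{crop:<10} {angle:6.1f} degrees")
```

The angles must add up to 360°, which is a useful check before drawing the chart.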

Example 6: Make a histogram from the following data:

Marks       f
86 – 90     6
91 – 95     4
96 – 100    10
101 – 105   6
106 – 110   3
111 – 115   1

Solution:

Marks       f    Class boundaries
86 – 90     6    85.5 – 90.5
91 – 95     4    90.5 – 95.5
96 – 100    10   95.5 – 100.5
101 – 105   6    100.5 – 105.5
106 – 110   3    105.5 – 110.5
111 – 115   1    110.5 – 115.5
Total       30   ----

[Figure: histogram with class boundaries on the horizontal axis and frequencies on the vertical axis.]


Measures of Central Tendency

Q.1. Define average. What are its important types?
Ans. Average
An average is a single value which represents the data. Averages are also called "measures of central tendency" or "measures of location."

Types of Averages
The important types of averages are given below:
(i) Arithmetic Mean or Mean
(ii) Median
(iii) Mode.

Q.2. Write down the properties of a good average.
Ans. The properties of a good average are given below:
(i) It should be clearly defined.
(ii) It should be easy to calculate.
(iii) It should be simple to understand.
(iv) It should be based on all the observations.
(v) It should not be affected by extreme values.
(vi) It should be capable of mathematical treatment.

Q.3. Define arithmetic mean.
Ans. Arithmetic Mean (A.M.)
The arithmetic mean, or simply the mean, is defined as the sum of all the values in the data divided by the number of values. Let X1, X2, X3, ..., Xn be n observations; then the arithmetic mean, denoted by $\bar{X}$, is given by:

$\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n} = \dfrac{\sum X}{n}$

Q.4. Write down the properties of the arithmetic mean.
Ans. The important properties of the arithmetic mean are given below:
(i) The sum of deviations of the observations from their mean is zero, i.e.,
$\sum (X - \bar{X}) = 0$ for ungrouped data
$\sum f(X - \bar{X}) = 0$ for grouped data
(ii) The mean of a constant is the constant itself, i.e., if X = a then $\bar{X} = a$.
(iii) The arithmetic mean is affected by a change of origin and scale. If we add or subtract a constant from all the values, or multiply or divide all the values by a constant, the mean changes in the same way, i.e.,
If $Y = X \pm a$, then $\bar{Y} = \bar{X} \pm a$
If $Y = a \pm bX$, then $\bar{Y} = a \pm b\bar{X}$
If $Y = \dfrac{X}{a}$, then $\bar{Y} = \dfrac{\bar{X}}{a}$, where $a \neq 0$
(iv) The sum of squares of the deviations of the observations from their mean is a minimum, i.e., $\sum (X - \bar{X})^2$ is minimum.
(v) If $n_1$ values have mean $\bar{X}_1$, $n_2$ values have mean $\bar{X}_2$, and so on up to $n_k$ values with mean $\bar{X}_k$, then the mean of all the values is called the combined mean. It is denoted by $\bar{X}_c$ and is given by:

$\bar{X}_c = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k}{n_1 + n_2 + \dots + n_k} = \dfrac{\sum n_i \bar{X}_i}{\sum n_i}$

Q.5. Define weighted arithmetic mean. In what circumstances is it preferred to the ordinary mean, and why?
Ans. Weighted Arithmetic Mean
Sometimes the observations are not all of equal importance. To show the importance of each observation we assign to it a value called its weight. If n observations X1, X2, ..., Xn have the respective weights W1, W2, ..., Wn, then the weighted arithmetic mean, denoted by $\bar{X}_w$, is obtained as

$\bar{X}_w = \dfrac{W_1X_1 + W_2X_2 + \dots + W_nX_n}{W_1 + W_2 + \dots + W_n} = \dfrac{\sum WX}{\sum W}$

When the values in the data are not all of equal importance, the weighted mean is preferred to the ordinary mean because it gives each value its relative importance.

Q.6. Define median.
Ans. Median
The median is defined as the central value of the arranged data. It is a positional average, denoted by $\tilde{X}$.

$\tilde{X} = \left(\dfrac{n+1}{2}\right)$th value, for ungrouped data and grouped data (discrete)

$\tilde{X} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$, for grouped data (continuous)

Median class or median group = the class containing the $\left(\dfrac{n}{2}\right)$th value
l = lower class boundary of the median class.
h = class interval of the median class.
f = frequency of the median class.
C = cumulative frequency of the class preceding the median class.

Q.7. Define mode. Write down its methods of calculation.
Ans. Mode
The mode is defined as the most frequent value of the data. It is denoted by $\hat{X}$.

Methods of Calculation of Mode
(i) Mode for ungrouped data:
In ungrouped data the mode is found by inspection. For example, the mode of 2, 8, 7, 3, 9, 3 is 3.
(ii) Mode for a frequency distribution (discrete):
The value corresponding to the maximum frequency.
(iii) Mode for a frequency distribution (continuous):

$\hat{X} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$

l = lower class boundary of the modal class.
fm = frequency of the modal class.
f1 = frequency of the class preceding the modal class.
f2 = frequency of the class following the modal class.
h = class interval or width of the modal class.

Q.8. Define harmonic mean.
Ans. The harmonic mean is defined as the reciprocal of the mean of the reciprocals of the observations.

Q.9. Define G.M.
Ans. The geometric mean is defined as the nth root of the product of n positive values.

Q.10. (a) What do you mean by unimodal, bimodal and multimodal distributions?
(b) When is it not possible to find the mode?
Ans. (a) Unimodal Distribution: A distribution having a single mode is called a unimodal distribution.
Bimodal Distribution: A distribution having two modes is called a bimodal distribution.
Multimodal Distribution: A distribution having more than two modes is called a multimodal distribution.
(b) If each value occurs the same number of times, then it is not possible to find the mode.

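TOPIC: UNGROUPED DATA / GROUPED DATA

Arithmetic Mean
Direct: $\bar{Y} = \dfrac{\sum Y}{n}$ / $\bar{Y} = \dfrac{\sum fY}{\sum f}$
Deviation: $\bar{Y} = A + \dfrac{\sum D}{n}$ / $\bar{Y} = A + \dfrac{\sum fD}{\sum f}$, where D = Y – A
Step deviation / Coding: $\bar{Y} = A + \dfrac{\sum u}{n} \times h$ / $\bar{Y} = A + \dfrac{\sum fu}{\sum f} \times h$, where $u = \dfrac{Y - A}{h}$

Geometric Mean
$G.M = (Y_1 \cdot Y_2 \cdots Y_n)^{1/n} = \text{antilog}\left(\dfrac{\sum \log Y}{n}\right)$ / $G.M = \left(Y_1^{f_1} \cdot Y_2^{f_2} \cdots Y_k^{f_k}\right)^{1/\sum f} = \text{antilog}\left(\dfrac{\sum f \log Y}{\sum f}\right)$

Harmonic Mean
$H.M = \dfrac{n}{\sum (1/Y)}$ / $H.M = \dfrac{\sum f}{\sum (f/Y)}$

Median
$\tilde{Y}$ = the value of the $\left(\dfrac{n+1}{2}\right)$th item / $\tilde{Y} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$

Quartiles
$Q_K$ = the value of the $\dfrac{K(n+1)}{4}$th item, K = 1, 2, 3 / $Q_K = l + \dfrac{h}{f}\left(\dfrac{Kn}{4} - C\right)$

Mode
Observation method / inspection method / $\hat{Y} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$

Note: The formulae of the median, quartiles, deciles and percentiles for a discrete frequency distribution are the same as those for ungrouped data.

Weighted Arithmetic Mean:
$\bar{Y}_w = \dfrac{\sum WY}{\sum W}$

For a symmetrical distribution:
Mean = Median = Mode

For a skewed distribution: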

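Mode = 3 Median – 2 Mean

Example 1: Find the arithmetic mean of the following data:
102, 104, 106, 108, 110.
(i) By the direct method (ii) By the short-cut method

Solution:

X     D = X – A (A = 100)
102   2
104   4
106   6
108   8
110   10
ΣX = 530   ΣD = 30

Arithmetic mean:
(i) Direct method: $\bar{X} = \dfrac{\sum X}{n} = \dfrac{530}{5} = 106$
(ii) Short-cut method: $\bar{X} = A + \dfrac{\sum D}{n} = 100 + \dfrac{30}{5} = 106$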

Example 2: Find the average age from the following frequency distribution of the ages of 50 patients.

Ages    No. of patients
20-24   1
25-29   4
30-34   8
35-39   11
40-44   15
45-49   9
50-54   2

Solution:

Ages    f    X    fX
20-24   1    22   22
25-29   4    27   108
30-34   8    32   256
35-39   11   37   407
40-44   15   42   630
45-49   9    47   423
50-54   2    52   104
Total   50   --   1950

Average age:

$\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{1950}{50} = 39$

Hence the average age of the patients is 39 years.
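As a check on Example 2, the grouped mean can be computed by both the direct and the step-deviation (coding) methods. The sketch below uses illustrative function names, and the provisional mean A = 37 is an arbitrary choice made for the illustration, not a value taken from the notes.

```python
def grouped_mean_direct(mids, freqs):
    """Direct method: sum(f * X) / sum(f)."""
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_mean_coded(mids, freqs, A, h):
    """Step-deviation method: A + h * sum(f * u) / sum(f), with u = (X - A) / h."""
    u = [(x - A) / h for x in mids]
    return A + h * sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)


limits = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
freqs = [1, 4, 8, 11, 15, 9, 2]
mids = [(lo + hi) / 2 for lo, hi in limits]   # class marks 22, 27, ..., 52

mean_direct = grouped_mean_direct(mids, freqs)
mean_coded = grouped_mean_coded(mids, freqs, A=37, h=5)
print(mean_direct, mean_coded)
```

Both methods must agree, because coding only shifts and rescales the values before averaging.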

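Example 3: Find the arithmetic mean from the given information:
(i) D = X – 39, ΣD = 240 and n = 10
(ii) $u = \dfrac{X - 57}{5}$, Σu = 23 and n = 20
(iii) X = 10 + 5u, Σfu = –46 and n = 125

Solution:

(i) ΣD = 240, n = 10
D = X – 39. Comparing with D = X – A, A = 39.

Arithmetic mean $= \bar{X} = A + \dfrac{\sum D}{n} = 39 + \dfrac{240}{10} = 39 + 24 = 63$

(ii) Σu = 23, n = 20
$u = \dfrac{X - 57}{5}$. Comparing with $u = \dfrac{X - A}{h}$, A = 57, h = 5.

Arithmetic mean $= \bar{X} = A + \dfrac{\sum u}{n} \times h = 57 + \dfrac{23}{20} \times 5 = 57 + 5.75 = 62.75$

(iii) X = 10 + 5u, Σfu = –46, n = 125
X – 10 = 5u, so $u = \dfrac{X - 10}{5}$. Comparing with $u = \dfrac{X - A}{h}$, A = 10, h = 5, n = Σf = 125.

Arithmetic mean $= \bar{X} = A + \dfrac{\sum fu}{\sum f} \times h = 10 + \dfrac{-46}{125} \times 5 = 10 + (-1.84) = 10 - 1.84 = 8.16$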

Example 4: Calculate the weighted mean of the following data:

Items           Expenditure   Weight
Food            290           7.5
Rent            54            2.0
Clothing        98            1.5
Fuel            75            1.0
Miscellaneous   75            0.5

Solution:

Items           X     W      WX
Food            290   7.5    2175
Rent            54    2.0    108
Clothing        98    1.5    147
Fuel            75    1.0    75
Miscellaneous   75    0.5    37.5
Total           ---   12.5   2542.5

Weighted mean:

$\bar{X}_w = \dfrac{\sum WX}{\sum W} = \dfrac{2542.5}{12.5} = 203.4$

Example 5: The ungrouped data are given below:

45, 30, 35, 40, 44, 32, 42, 37

Calculate the geometric mean using:
i) the basic definition ii) the log formula

Solution:

i) G.M using the basic definition

$G.M = (Y_1 \cdot Y_2 \cdots Y_n)^{1/n} = (45 \times 30 \times 35 \times 40 \times 44 \times 32 \times 42 \times 37)^{1/8} = 37.76$

ii) Using the log formula

Y       log Y
45      1.6532
30      1.4771
35      1.5441
40      1.6021
44      1.6434
32      1.5051
42      1.6232
37      1.5682
Total   12.6164

$G.M = \text{antilog}\left(\dfrac{\sum \log Y}{n}\right) = \text{antilog}\left(\dfrac{12.6164}{8}\right) = \text{antilog}(1.57705) = 37.76$

Example 6: The following data have been obtained from a frequency distribution using $u = \dfrac{Y - 136.5}{2}$. Show that G.M is less than A.M.

u   –4   –3   –2   –1   0    1    2   3
f   2    5    8    18   22   13   8   4

Solution:

u       f    fu    Y = 2u + 136.5   log Y    f log Y
–4      2    –8    128.5            2.1089   4.2178
–3      5    –15   130.5            2.1156   10.5780
–2      8    –16   132.5            2.1222   16.9776
–1      18   –18   134.5            2.1287   38.3166
0       22   0     136.5            2.1351   46.9722
1       13   13    138.5            2.1414   27.8382
2       8    16    140.5            2.1477   17.1816
3       4    12    142.5            2.1538   8.6152
Total   80   –16   -----            -----    170.6972

$u = \dfrac{Y - 136.5}{2}$, so 2u = Y – 136.5 and Y = 2u + 136.5, giving A = 136.5 and h = 2.

Arithmetic Mean

$\bar{Y} = A + \dfrac{\sum fu}{\sum f} \times h = 136.5 + \dfrac{-16}{80} \times 2 = 136.5 + (-0.4) = 136.5 - 0.4 = 136.1$

$G.M = \text{antilog}\left(\dfrac{\sum f \log Y}{\sum f}\right) = \text{antilog}\left(\dfrac{170.6972}{80}\right) = \text{antilog}(2.1337) = 136.05$

A.M = 136.1, G.M = 136.05
This shows that G.M is less than A.M, i.e. G.M < A.M.
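The comparison in Example 6 can be reproduced without log tables. The following Python sketch computes the grouped A.M, G.M and, for comparison, the H.M from the same coded data; the variable names are illustrative. Because it uses unrounded logarithms, the G.M comes out near 136.06 rather than the table-based 136.05.

```python
import math

# Coded values u and frequencies f from Example 6, with Y = 2u + 136.5.
u = [-4, -3, -2, -1, 0, 1, 2, 3]
f = [2, 5, 8, 18, 22, 13, 8, 4]
y = [2 * ui + 136.5 for ui in u]
n = sum(f)

# A.M = sum(f * Y) / sum(f)
am = sum(fi * yi for fi, yi in zip(f, y)) / n

# G.M = antilog(sum(f * log Y) / sum(f)), using base-10 logs as in the notes.
gm = 10 ** (sum(fi * math.log10(yi) for fi, yi in zip(f, y)) / n)

# H.M = sum(f) / sum(f / Y)
hm = n / sum(fi / yi for fi, yi in zip(f, y))

print(f"A.M = {am:.2f}, G.M = {gm:.2f}, H.M = {hm:.2f}")
```

For positive values that are not all equal, the three means always satisfy H.M < G.M < A.M, which is what the example illustrates for the first two.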

Example 7: Find the harmonic mean for the following grouped data.

Class boundaries   f
0 – 4              2
4 – 8              5
8 – 12             7
12 – 16            8
16 – 20            7
20 – 24            4
24 – 28            1

Solution:

C.B       f    Y    f/Y
0 – 4     2    2    1.0000
4 – 8     5    6    0.8333
8 – 12    7    10   0.7000
12 – 16   8    14   0.5714
16 – 20   7    18   0.3889
20 – 24   4    22   0.1818
24 – 28   1    26   0.0385
Total     34   --   3.7139

$H.M = \dfrac{\sum f}{\sum (f/Y)} = \dfrac{34}{3.7139} = 9.1548$

Example 8: Find the median from the following data:
(i) c, a, b
(ii) 88.03, 94.50, 94.90, 95.05, 84.60
(iii) 87, 91, 89, 88, 89, 91, 87, 92, 90, 98.

Solution:

(i) The data in an array:
a, b, c
Here n = 3.

Median $= \left(\dfrac{n+1}{2}\right)$th value $= \left(\dfrac{3+1}{2}\right)$th value = 2nd value = b

(ii) The data in an array:

Sr. No.   1       2       3       4       5
Values    84.60   88.03   94.50   94.90   95.05

Here n = 5.

Median $= \left(\dfrac{n+1}{2}\right)$th value $= \left(\dfrac{5+1}{2}\right)$th value = 3rd value = 94.50

(iii) The data in an array:

Sr. No.   1    2    3    4    5    6    7    8    9    10
Values    87   87   88   89   89   90   91   91   92   98

Here n = 10.

Median $= \left(\dfrac{n+1}{2}\right)$th value $= \left(\dfrac{10+1}{2}\right)$th value = (5.5)th value

$= \dfrac{\text{5th value} + \text{6th value}}{2} = \dfrac{89 + 90}{2} = 89.5$

Example 9: Find the median from the following data of heights of students:

C.I         Frequency
86 – 90     6
91 – 95     4
96 – 100    10
101 – 105   6
106 – 110   3
111 – 115   1

Solution:

C.I         Class boundaries   f    c.f
86 – 90     85.5 – 90.5        6    6
91 – 95     90.5 – 95.5        4    10
96 – 100    95.5 – 100.5       10   20
101 – 105   100.5 – 105.5      6    26
106 – 110   105.5 – 110.5      3    29
111 – 115   110.5 – 115.5      1    30
Total       ----               30   ----

n/2 = 30/2 = 15, so the 15th value lies in the class 95.5 – 100.5, which is the median class.

Median $= l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$

Median $= 95.5 + \dfrac{5}{10}(15 - 10) = 95.5 + 2.5 = 98.0$
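The median formula of Example 9 is a special case of a general positional formula: replacing n/2 by Kn/4 gives the quartiles. A minimal Python sketch follows; the function name `grouped_quantile` and its parameter `p` (the fraction of the data below the required value) are illustrative, not notation from the notes.

```python
def grouped_quantile(boundaries, freqs, p):
    """l + (h / f) * (p * n - C) for the class containing the (p * n)th value."""
    n = sum(freqs)
    target = p * n
    cum = 0  # cumulative frequency C of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= target:
            return lower + (upper - lower) / f * (target - cum)
        cum += f
    raise ValueError("p must lie between 0 and 1")


# Class boundaries and frequencies from Example 9.
boundaries = [(85.5, 90.5), (90.5, 95.5), (95.5, 100.5),
              (100.5, 105.5), (105.5, 110.5), (110.5, 115.5)]
freqs = [6, 4, 10, 6, 3, 1]

median = grouped_quantile(boundaries, freqs, 0.5)
q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(median, q1, q3)
```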

Example 10: Find the mode for the following data:
91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112.

Solution: The most frequent value of the data is 98 (it occurs three times). Therefore, Mode = 98.

Example 11: Find the mode for the following frequency distribution of heights of students:

Heights     Frequency
86 – 90     6
91 – 95     4
96 – 100    10
101 – 105   6
106 – 110   3
111 – 115   1

Solution:

Heights (C.I)   Class boundaries   f
86 – 90         85.5 – 90.5        6
91 – 95         90.5 – 95.5        4
96 – 100        95.5 – 100.5       10
101 – 105       100.5 – 105.5      6
106 – 110       105.5 – 110.5      3
111 – 115       110.5 – 115.5      1

Mode $= l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$

Since the maximum frequency is 10, the class 95.5 – 100.5 is the modal class.

Mode $= 95.5 + \dfrac{10 - 4}{(10 - 4) + (10 - 6)} \times 5 = 95.5 + \dfrac{6}{10} \times 5 = 95.5 + 3.0 = 98.5$
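The modal-class calculation of Example 11 can be sketched in Python as below. The function name `grouped_mode` is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch rather than a rule stated in the notes.

```python
def grouped_mode(boundaries, freqs):
    """l + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the modal class."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # index of the modal class
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                    # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0       # class after the modal class
    lower, upper = boundaries[i]
    return lower + (fm - f1) / ((fm - f1) + (fm - f2)) * (upper - lower)


# Class boundaries and frequencies from Example 11.
boundaries = [(85.5, 90.5), (90.5, 95.5), (95.5, 100.5),
              (100.5, 105.5), (105.5, 110.5), (110.5, 115.5)]
freqs = [6, 4, 10, 6, 3, 1]

mode = grouped_mode(boundaries, freqs)
print(round(mode, 2))
```

If two classes share the maximum frequency, this sketch simply uses the first one; a bimodal distribution needs separate treatment.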

Measures of Dispersion

Q.1. What do you mean by dispersion?
Ans. Dispersion means the variability of the values about a measure of central tendency.

Q.2. What are the important types of dispersion?
Ans. (i) Range (ii) Quartile Deviation (iii) Mean Deviation (iv) Standard Deviation (v) Variance.

Q.3. Differentiate between absolute dispersion and relative dispersion.
Ans. An absolute measure of dispersion is expressed in the same units as the original data. A relative measure of dispersion is independent of the units of measurement.

Q.4. Define range.
Ans. The range is defined as the difference between the maximum and minimum values of the data.

Q.5. Define quartile deviation.
Ans. It is defined as "half of the difference between the upper and lower quartiles".

Q.6. When does the range become zero?
Ans. For constant observations, the range is zero.

Q.7. Define mean deviation.
Ans. It is defined as "the arithmetic mean of the absolute values of the deviations from any average".

Q.8. Define variance.
Ans. It is defined as "the arithmetic mean of the squares of the deviations of the values from their mean".

Q.9. What will be the variance of 3, 3, 3, 3, 3, 3?
Ans. Zero.

Q.10. Define standard deviation.
Ans. It is defined as "the positive square root of the arithmetic mean of the squares of the deviations from their mean".

Q.11. If S.D. = 3, then what will be the variance?
Ans. The variance will be 9.

Q.12. If the S.D. of 2, 4, 6, 8, 10 is 2.83, then what will be the S.D. of 102, 104, 106, 108, 110?
Ans. The S.D. of 102, 104, 106, 108, 110 will be 2.83, because the S.D. is not affected by a change of origin.

Q.13. Is variance affected by a change of scale?
Ans. Yes, variance is affected by a change of scale.

Q.14. Is the variance of negative values negative?
Ans. No, variance is always non-negative.

Q.15. What is the utility of standard deviation?
Ans. It has great practical utility in sampling and statistical inference.

Q.16. What is the relationship between mean, median and mode for positively skewed and negatively skewed distributions?
Ans. For a positively skewed distribution: Mean > Median > Mode
For a negatively skewed distribution: Mean < Median < Mode

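Topic: Absolute Dispersion / Relative Dispersion

Range
$R = Y_m - Y_0$ / Co-efficient of Range $= \dfrac{Y_m - Y_0}{Y_m + Y_0}$

Quartile Deviation or Semi Inter-Quartile Range
$Q.D = \dfrac{Q_3 - Q_1}{2}$ / Co-efficient of Q.D $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

Mean Deviation
Ungrouped: $M.D_{\bar{Y}} = \dfrac{\sum |Y - \bar{Y}|}{n}$, $M.D_{\tilde{Y}} = \dfrac{\sum |Y - \tilde{Y}|}{n}$
Grouped: $M.D_{\bar{Y}} = \dfrac{\sum f|Y - \bar{Y}|}{\sum f}$, $M.D_{\tilde{Y}} = \dfrac{\sum f|Y - \tilde{Y}|}{\sum f}$

Standard Deviation
Ungrouped: $S = \sqrt{\dfrac{\sum (Y - \bar{Y})^2}{n}} = \sqrt{\dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2}$
Grouped: $S = \sqrt{\dfrac{\sum f(Y - \bar{Y})^2}{\sum f}} = \sqrt{\dfrac{\sum fY^2}{\sum f} - \left(\dfrac{\sum fY}{\sum f}\right)^2}$
Co-efficient of S.D $= \dfrac{S}{\bar{Y}}$

Variance
$S^2 = (S.D)^2$ for both ungrouped and grouped data
Co-efficient of Variation $= \dfrac{S}{\bar{Y}} \times 100$

Co-efficient Measures of Skewness

Karl Pearson's:
$S.K = \dfrac{\text{Mean} - \text{Mode}}{S.D}$ or $S.K = \dfrac{3(\text{Mean} - \text{Median})}{S.D}$

Bowley's or Quartile:
$S.K = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

Symmetrical Distribution: S.K = 0, Mean = Median = Mode

Positively Skewed Distribution: S.K > 0, Mean > Median > Mode

Negatively Skewed Distribution: S.K < 0, Mean < Median < Mode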

Example 1: Find the range for each of the following data sets.
i) 12, 6, 7, –3, 15, 10, 18, 5 & –24
ii) 19, 3, 8, 9, 7, 8, 10, 12, 18 & 21
iii) 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4

Solution:

i) 12, 6, 7, –3, 15, 10, 18, 5 & –24
Range = Ym – Y0
Maximum value = Ym = 18
Minimum value = Y0 = –24
Range = 18 – (–24) = 18 + 24 = 42

ii) 19, 3, 8, 9, 7, 8, 10, 12, 18 & 21
Range = Ym – Y0
Maximum value = Ym = 21
Minimum value = Y0 = 3
Range = 21 – 3 = 18

iii) 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
Range = Ym – Y0
Maximum value = Ym = 4
Minimum value = Y0 = 4
Range = 4 – 4 = 0
The range of a constant is zero.

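Example 2: Find the range for the following frequency distribution. Also find the coefficient of dispersion.

Groups   f    C.B
70-74    2    69.5 – 74.5
75-79    5    74.5 – 79.5
80-84    12   79.5 – 84.5
85-89    18   84.5 – 89.5
90-94    7    89.5 – 94.5

Solution:

Range = Ym – Y0
Ym = Upper class boundary of the highest class = 94.5
Y0 = Lower class boundary of the lowest class = 69.5
Range = 94.5 – 69.5 = 25.0

Co-efficient of dispersion (Range) $= \dfrac{Y_m - Y_0}{Y_m + Y_0} = \dfrac{94.5 - 69.5}{94.5 + 69.5} = \dfrac{25}{164} = 0.152$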

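Example 3: Find the lower quartile, upper quartile and quartile deviation from the given data:

Groups   Frequency
70-74    2
75-79    5
80-84    12
85-89    18
90-94    7

Solution:

Groups   Frequency   C.B           C.f
70-74    2           69.5 – 74.5   2
75-79    5           74.5 – 79.5   7
80-84    12          79.5 – 84.5   19
85-89    18          84.5 – 89.5   37
90-94    7           89.5 – 94.5   44
Total    44          -----         -----

Lower quartile: n/4 = 44/4 = 11, so Q1 lies in the class 79.5 – 84.5.
$Q_1 = l + \dfrac{h}{f}\left(\dfrac{n}{4} - C\right) = 79.5 + \dfrac{5}{12}(11 - 7) = 79.5 + 1.67 = 81.17$

Upper quartile: 3n/4 = 33, so Q3 lies in the class 84.5 – 89.5.
$Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - C\right) = 84.5 + \dfrac{5}{18}(33 - 19) = 84.5 + 3.89 = 88.39$

Quartile deviation:
$Q.D = \dfrac{Q_3 - Q_1}{2} = \dfrac{88.39 - 81.17}{2} = 3.61$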

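Example 4: The ungrouped data are given below:
2, 5, 6, 6, 8, 9, 12, 13, 16, 23

Calculate the average deviation from i) the mean ii) the median.

Solution:

Y     |Y – Mean|   |Y – Median|
2     8            6.5
5     5            3.5
6     4            2.5
6     4            2.5
8     2            0.5
9     1            0.5
12    2            3.5
13    3            4.5
16    6            7.5
23    13           14.5
ΣY = 100   Σ|Y – Mean| = 48   Σ|Y – Median| = 46

i) Average deviation (Mean) $= \dfrac{\sum |Y - \bar{Y}|}{n}$

Mean $= \bar{Y} = \dfrac{\sum Y}{n} = \dfrac{100}{10} = 10$

Average deviation (Mean) $= \dfrac{48}{10} = 4.8$

ii) Average deviation (Median) $= \dfrac{\sum |Y - \tilde{Y}|}{n}$

Median $= \tilde{Y}$ = (5.5)th value $= \dfrac{8 + 9}{2} = 8.5$

Average deviation (Median) $= \dfrac{46}{10} = 4.6$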
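The two average deviations of Example 4 differ only in the centre from which deviations are measured, so one helper covers both. In this Python sketch the function name `mean_deviation` is illustrative; `mean` and `median` come from the standard `statistics` module.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations of `values` from `center`."""
    return sum(abs(v - center) for v in values) / len(values)


data = [2, 5, 6, 6, 8, 9, 12, 13, 16, 23]

md_mean = mean_deviation(data, mean(data))        # deviations from the mean, 10
md_median = mean_deviation(data, median(data))    # deviations from the median, 8.5
print(md_mean, md_median)
```

The deviation taken from the median is the smaller of the two, which holds in general: the sum of absolute deviations is least when measured from the median.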

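Example 5: Calculate the median and the mean deviation from the median for the following data:

Solution:

f     C.B             C.f   |Y – Median|   f|Y – Median|
2     9.25 – 9.75     2     1.57           3.14
5     9.75 – 10.25    7     1.07           5.35
12    10.25 – 10.75   19    0.57           6.84
17    10.75 – 11.25   36    0.07           1.19
14    11.25 – 11.75   50    0.43           6.02
6     11.75 – 12.25   56    0.93           5.58
3     12.25 – 12.75   59    1.43           4.29
1     12.75 – 13.25   60    1.93           1.93
60    -----           ----  -----          34.34

Here Y is the class mid point. Since n/2 = 30, the median class is 10.75 – 11.25.

Median $= l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right) = 10.75 + \dfrac{0.5}{17}(30 - 19) = 10.75 + 0.32 = 11.07$

M.D from Median:

$M.D_{\tilde{Y}} = \dfrac{\sum f|Y - \tilde{Y}|}{\sum f} = \dfrac{34.34}{60} = 0.57$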

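Example 6: Calculate the variance and standard deviation from the following data:
102, 104, 106, 108, 110

Solution:

Y       Y – Mean   (Y – Mean)²
102     –4         16
104     –2         4
106     0          0
108     2          4
110     4          16
530     0          40

$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{530}{5} = 106$, $S^2 = \dfrac{\sum (Y - \bar{Y})^2}{n} = \dfrac{40}{5} = 8$

$S = \sqrt{8} = 2.83$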

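Example 7: Determine the mean, S.D and C.V from the given data:

Ages    Frequency
20-24   1
25-29   4
30-34   8
35-39   11
40-44   15
45-49   9
50-54   2

Solution:

Ages    f    Y    fY     fY²
20-24   1    22   22     484
25-29   4    27   108    2916
30-34   8    32   256    8192
35-39   11   37   407    15059
40-44   15   42   630    26460
45-49   9    47   423    19881
50-54   2    52   104    5408
Total   50   --   1950   78400

Mean $= \bar{Y} = \dfrac{\sum fY}{\sum f} = \dfrac{1950}{50} = 39$

$S = \sqrt{\dfrac{\sum fY^2}{\sum f} - \left(\dfrac{\sum fY}{\sum f}\right)^2} = \sqrt{\dfrac{78400}{50} - (39)^2} = \sqrt{1568 - 1521} = \sqrt{47} = 6.86$

$C.V = \dfrac{S}{\bar{Y}} \times 100 = \dfrac{6.86}{39} \times 100 = 17.58\%$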
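The totals in the table of Example 7 lead to the mean, standard deviation and coefficient of variation through the grouped formulas of this section. A minimal Python sketch, with the illustrative function name `grouped_mean_sd_cv`, is given below; it uses the population form of the variance (division by Σf), as these notes do.

```python
import math


def grouped_mean_sd_cv(mids, freqs):
    """Return (mean, S.D, C.V in percent) for grouped data with class marks `mids`."""
    n = sum(freqs)
    mean = sum(f * y for f, y in zip(freqs, mids)) / n
    # Variance = sum(f * Y^2) / sum(f) - mean^2
    variance = sum(f * y * y for f, y in zip(freqs, mids)) / n - mean ** 2
    sd = math.sqrt(variance)
    return mean, sd, sd / mean * 100


mids = [22, 27, 32, 37, 42, 47, 52]
freqs = [1, 4, 8, 11, 15, 9, 2]

mean, sd, cv = grouped_mean_sd_cv(mids, freqs)
print(f"mean = {mean:.2f}, S.D = {sd:.2f}, C.V = {cv:.2f}%")
```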

REGRESSION AND CORRELATION

Scatter Diagram: The graphic representation of a set of n pairs of bivariate data is called a scatter diagram or scatter plot.

In a scatter diagram we take the independent variable along the horizontal axis (x-axis) and the dependent variable along the vertical axis (y-axis), and plot the resulting set of points on graph paper. If a relationship between the variables exists, the points in the scatter diagram will show a tendency to cluster around a straight line or some curve. Such a line or curve around which the points cluster is called the regression line or regression curve, and it can be used to estimate the expected value of the random variable Y from the values of the non-random variable X. The scatter diagrams shown below show the relationship between two variables.

[Figure: scatter diagrams.]

Regression: The dependence of one variable (the dependent variable) on one or more other variables (the independent variables) is called regression. When we study the dependence of a variable on a single independent variable, it is called simple regression or two-variable regression. When the dependence of a variable on two or more variables is studied, it is called multiple regression.

Regressand: In regression the dependent variable is called the regressand. It is also called the response variable, the predictand, the dependent variable or the explained variable.

Regressor: In regression the independent variable is called the regressor. It is also called the predictor variable, the independent variable, the controlled variable or the explanatory variable.

Least Squares Principle: The principle of least squares states that the sum of squares of the residuals of the observed values from their corresponding estimated values should be least.

Properties of the Least Squares Line: The following are the important properties of the least squares regression line:
(i) The sum of the residuals between the observed and the corresponding estimated values is always zero, i.e., $\sum e = \sum (y - \hat{y}) = 0$
(ii) The sum of squares of the residuals, $\sum e^2$, is minimum.
(iii) The least squares regression line always passes through the point $(\bar{x}, \bar{y})$.
(iv) It is the best line because a and b are the unbiased estimates of the parameters $\alpha$ and $\beta$.

Correlation: The degree or strength of the relationship (interdependence) between the variables is called correlation.
Examples of correlation: heights and weights of children, ages of husbands and ages of wives at the time of their marriages, marks of students in mathematics and in statistics, etc.

Product Moment Coefficient of Correlation: A numerical measure of the strength of the linear relationship between any two variables is called Pearson's product moment correlation coefficient or the coefficient of simple correlation.

The sample linear correlation coefficient for n pairs of observations is defined by

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

(i) Positive Correlation: If both variables move in the same direction (both increase or both decrease), the correlation is said to be positive or direct. For example, ages and heights of children.
(ii) Negative Correlation: If the variables move in opposite directions, the correlation is called negative or inverse. For example, an increase in the supply of a commodity decreases its price.
(iii) No Correlation: If a change in one variable does not affect the other variable, then there is no correlation. For example, the head sizes and I.Q.s of persons.

Properties of the Coefficient of Correlation: The important properties of the coefficient of correlation are given as follows:
(i) The coefficient of correlation is symmetrical with respect to x and y, i.e., $r_{xy} = r_{yx}$
(ii) The correlation coefficient is a pure number, i.e., it does not depend upon the unit of measurement.
(iii) The correlation coefficient always lies between –1 and +1.
(iv) The correlation coefficient is the geometric mean of the two regression coefficients, i.e., $r = \pm\sqrt{b_{yx} \times b_{xy}}$
r is +ve if both $b_{yx}$ and $b_{xy}$ are +ve.
r is –ve if both $b_{yx}$ and $b_{xy}$ are –ve.
(v) The correlation coefficient is independent of origin and scale, i.e., $r_{xy} = r_{uv}$

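Important Points & Formulae

Regression line of y on x is
$\hat{y} = a + bx$ or $\hat{y} = a + b_{yx}x$ $(b = b_{yx})$

Regression line of x on y is
$\hat{x} = c + dy$ or $\hat{x} = c + b_{xy}y$ $(d = b_{xy})$

$b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$

$a = \bar{y} - b\bar{x}$ or $a = \dfrac{\sum y - b\sum x}{n}$; $c = \bar{x} - d\bar{y}$ or $c = \dfrac{\sum x - d\sum y}{n}$

Coefficient of Correlation

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

$= \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$

$= \pm\sqrt{b_{yx} \times b_{xy}}$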

Example 1: The following table shows the ages x and systolic blood pressures y of 12 women.

Age (years) xi      56    42    72    36    63    47    55    49    38    42    68    60
Blood pressure yi   147   125   160   118   149   128   150   145   115   140   152   155

Fit a regression line of blood pressure on age. Estimate the expected blood pressure of a woman whose age is 45 years. What is the change in blood pressure for a unit change in age?

Solution:

x       y       xy       x²
56      147     8232     3136
42      125     5250     1764
72      160     11520    5184
36      118     4248     1296
63      149     9387     3969
47      128     6016     2209
55      150     8250     3025
49      145     7105     2401
38      115     4370     1444
42      140     5880     1764
68      152     10336    4624
60      155     9300     3600
628     1684    89894    34416

The estimated line of y on x is $\hat{y} = a + bx$

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{12(89894) - (628)(1684)}{12(34416) - (628)^2} = \dfrac{21176}{18608} = 1.138$

$a = \bar{y} - b\bar{x} = \dfrac{1684}{12} - 1.138\left(\dfrac{628}{12}\right) = 140.333 - 59.555 = 80.778$

Hence $\hat{y} = 80.778 + 1.138x$

For x = 45: $\hat{y} = 80.778 + 1.138(45) = 131.988 \approx 132$

The change in blood pressure for a unit change in age is the slope, b = 1.138.

Example 2: The following table gives the number of persons employed and the cloth manufactured in a textile mill.

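Persons employed xi     137   209   113   189   176   200   219
Cloth manufactured yi   23    47    22    40    39    51    49

Calculate the coefficient of correlation.

Solution:

x       y      xy      x²       y²
137     23     3151    18769    529
209     47     9823    43681    2209
113     22     2486    12769    484
189     40     7560    35721    1600
176     39     6864    30976    1521
200     51     10200   40000    2601
219     49     10731   47961    2401
1243    271    50815   229877   11345

The correlation co-efficient is

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$

$= \dfrac{7(50815) - (1243)(271)}{\sqrt{\left[7(229877) - (1243)^2\right]\left[7(11345) - (271)^2\right]}}$

$= \dfrac{18852}{\sqrt{(64090)(5974)}} = \dfrac{18852}{19567.2} = 0.96$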
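The correlation coefficient of Example 2 can be verified with the raw-sums formula. In the Python sketch below the function name `pearson_r` is illustrative; the data are the seven pairs of the example.

```python
import math


def pearson_r(x, y):
    """Product moment correlation coefficient computed from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


persons = [137, 209, 113, 189, 176, 200, 219]   # x: persons employed
cloth = [23, 47, 22, 40, 39, 51, 49]            # y: cloth manufactured

r = pearson_r(persons, cloth)
print(round(r, 2))
```

Swapping the two arguments gives the same value, which reflects the symmetry property $r_{xy} = r_{yx}$.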

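Example 3: A random sample of 20 pairs of observations (xi, yi) gave the following:

(The summary totals are missing here; the solution below uses $\bar{x} = 2$ and $\bar{y} = 8$.)

Estimate the linear regression equation taking (i) X as the independent variable (ii) Y as the independent variable.

Solution:

(i) The regression function taking x as the independent variable is
$\hat{y} = a + bx$

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = 0.84$

$a = \bar{y} - b\bar{x} = 8 - 0.84(2) = 6.32$

Hence $\hat{y} = 6.32 + 0.84x$

(ii) The regression function taking y as the independent variable is
$\hat{x} = c + dy$

$d = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = 0.583$

$c = \bar{x} - d\bar{y} = 2 - 0.583(8) = 2 - 4.664 = -2.66$

Hence $\hat{x} = -2.66 + 0.583y$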
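The least squares formulas used in these regression examples can be sketched in Python as follows. Because the summary totals of Example 3 are not available, the sketch is applied to the age and blood pressure data of Example 1 of this section; the function name `least_squares_line` is illustrative.

```python
def least_squares_line(x, y):
    """Return (a, b) of the fitted line y-hat = a + b x from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n          # a = y-bar - b * x-bar
    return a, b


age = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
pressure = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]

a, b = least_squares_line(age, pressure)
estimate_45 = a + b * 45
residual_sum = sum(yi - (a + b * xi) for xi, yi in zip(age, pressure))
print(f"y-hat = {a:.3f} + {b:.3f} x; estimate at 45 = {estimate_45:.1f}")
```

The residuals add up to zero (apart from floating-point rounding), which is property (i) of the least squares line.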


ANALYSIS OF TIME SERIES

Time Series: A time series consists of numerical data collected, observed or recorded at successive time periods.
Examples of time series are the hourly temperature recorded by a weather bureau, the total monthly sales of pens in a book shop, the annual rainfall at Murree, etc.

Analysis of Time Series: Analysis of a time series is the decomposition of the series into its different components for separate study. The basic purpose of the analysis of time series is to use it for forecasting.

Signal: The systematic component of variation in a time series is called the signal.

Noise: An irregular or random component of variation in a time series is called noise.

Historigram: The graph of a time series is called a historigram. It is constructed by taking time along the x-axis and the time series values along the y-axis. Using an appropriate scale, points are plotted and then joined by line segments to get the required historigram.

Components of Time Series: The following are the main components of a time series:
(i) Secular trend (T)
(ii) Seasonal variations (S)
(iii) Cyclical movements (C)
(iv) Irregular movements (I)

(i) Secular Trend: A secular trend is a long-term movement that indicates the general direction of the variation in a time series. It represents a smooth, steady and gradual movement of the series in the same direction.
Examples of secular trend are a decline in the death rate due to advances in science, a continually increasing demand for smaller automobiles, etc.

(ii) Seasonal Variations: Seasonal variations are short-term movements that indicate identical changes in a time series during the corresponding seasons. The main causes of these variations are seasons, religious affairs and social customs. Examples of seasonal variations are the increased sales of cotton cloth in summer, an after-Eid sale in a departmental store, an increase in employment during summer, etc.

(iii) Cyclical Movements: Cyclical movements refer to the long-term oscillations or swings about the trend line or curve. Since the movements take the form of upward and downward swings, they are also called "cycles". The four phases of a business cycle (prosperity, recession, depression and revival) provide an important example of cyclical movements.

(iv) Irregular Movements: Irregular movements are unsystematic in nature. They occur in a completely unpredictable manner, caused by chance events such as wars, floods, earthquakes, strikes, fires, etc. These variations are also called accidental, residual or random variations. Examples of irregular movements are a fire in a factory delaying production for 3 weeks, a rise in prices due to floods, etc.

Methods of measuring secular trend in a time series:
(i) Free hand curve method
(ii) Method of semi averages
(iii) Method of moving averages
(iv) Method of least squares

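Important Points & Formulae

Coding of x

Origin at beginning (x): 0, 1, 2, 3, ...
Origin at middle, odd number of periods: ..., –3, –2, –1, 0, 1, 2, 3, ...
Origin at middle, even number of periods (half unit): ..., –7, –5, –3, –1, 1, 3, 5, 7, ...
Origin at middle, even number of periods (one unit): ..., –3.5, –2.5, –1.5, –0.5, 0.5, 1.5, 2.5, 3.5, ...

* The equation of semi averages is $\hat{y} = a + bx$

where $b = \dfrac{\bar{y}_2 - \bar{y}_1}{\bar{x}_2 - \bar{x}_1}$ and $a = \bar{y}_1 - b\bar{x}_1$

* The equation of linear trend is $\hat{y} = a + bx$

Normal equations are:
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$

If $\sum x = 0$:
$a = \dfrac{\sum y}{n}$
$b = \dfrac{\sum xy}{\sum x^2}$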
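The semi-average formulas above can be sketched in Python. The function name `semi_average_trend` is illustrative, x is coded from the first period (origin at the beginning), and dropping the middle value when the number of periods is odd is the convention followed in the semi-average examples of this section. The two series are the ones used in Examples 3 and 4.

```python
def semi_average_trend(values):
    """Return (a, b) of the trend line y-hat = a + b x, with x = 0 at the first period."""
    n = len(values)
    half = n // 2
    first, second = values[:half], values[n - half:]   # the middle value is skipped if n is odd
    x1 = (half - 1) / 2                # mean of the coded x in the first half
    x2 = (n - half) + (half - 1) / 2   # mean of the coded x in the second half
    y1, y2 = sum(first) / half, sum(second) / half
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


values_1960_69 = [318, 326, 337, 340, 359, 365, 372, 381, 402, 410]
profit_1964_72 = [33, 86, 116, 95, 101, 128, 146, 110, 32]

a3, b3 = semi_average_trend(values_1960_69)
a4, b4 = semi_average_trend(profit_1964_72)
trend_1960_69 = [a3 + b3 * x for x in range(len(values_1960_69))]
print(a3, b3, a4, b4)
```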

Examples

Example 1. Make a historigram from the following data:

Year                1962   1963   1964   1965   1966   1967
Production (tons)   20     28     50     15     18     27

Solution:

[Figure: historigram with years along the x-axis and production (tons) along the y-axis.]

Example 2. The following table shows the property damaged by road accidents in Punjab for the years 1973–79:

Year               1973   1974   1975   1976   1977   1978   1979
Property damaged   201    238    392    507    484    649    742

Find the trend values by the free hand curve method.

Solution:

Year   Property damaged   Trend value (from graph)
1973   201                187
1974   238                278
1975   392                369
1976   507                460
1977   484                551
1978   649                642
1979   742                733

Example 3. From the data given below:

Year    1960   1961   1962   1963   1964   1965   1966   1967   1968   1969
Value   318    326    337    340    359    365    372    381    402    410

Obtain the trend values using the method of semi averages.

Solution:

Year   y     x = t – 1960   Trend value = 316 + 10x
1960   318   0              316
1961   326   1              326
1962   337   2              336
1963   340   3              346
1964   359   4              356
1965   365   5              366
1966   372   6              376
1967   381   7              386
1968   402   8              396
1969   410   9              406

First half (1960–64): semi total = 1680, semi average $\bar{y}_1 = \dfrac{1680}{5} = 336$, $\bar{x}_1 = \dfrac{0+1+2+3+4}{5} = 2$
Second half (1965–69): semi total = 1930, semi average $\bar{y}_2 = \dfrac{1930}{5} = 386$, $\bar{x}_2 = \dfrac{5+6+7+8+9}{5} = 7$

The estimated equation of semi averages is $\hat{y} = a + bx$

$b = \dfrac{\bar{y}_2 - \bar{y}_1}{\bar{x}_2 - \bar{x}_1} = \dfrac{386 - 336}{7 - 2} = 10$

$a = \bar{y}_1 - b\bar{x}_1 = 336 - 10(2) = 336 - 20 = 316$

Hence $\hat{y} = 316 + 10x$

Example 4. Use the method of semi averages to find trend values for the following data showing the net profit (in lacs of rupees) of SNGPL for the years 1964–72.

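Year    1964  1965  1966  1967  1968  1969  1970  1971  1972
Profit    33    86   116    95   101   128   146   110    32

Find the estimated profit in 1994.

Solution:

Year   y     x = t - 1964   Trend value = 76.05 + 4.3x
1964   33    0              76.05
1965   86    1              80.35
1966   116   2              84.65
1967   95    3              88.95
1968   101   4              93.25
1969   128   5              97.55
1970   146   6              101.85
1971   110   7              106.15
1972   32    8              110.45

The middle year 1968 is left out of both halves.
First half (1964-67): semi total = 330, semi average $\bar{y}_1 = 330/4 = 82.5$, $\bar{x}_1 = 6/4 = 1.5$
Second half (1969-72): semi total = 416, semi average $\bar{y}_2 = 416/4 = 104$, $\bar{x}_2 = 26/4 = 6.5$

The estimated equation of semi averages is $\hat{y} = a + bx$

$b = \dfrac{\bar{y}_2 - \bar{y}_1}{\bar{x}_2 - \bar{x}_1} = \dfrac{104 - 82.5}{6.5 - 1.5} = 4.3$

$a = \bar{y}_1 - b\bar{x}_1 = 82.5 - 4.3(1.5) = 82.5 - 6.45 = 76.05$

Hence $\hat{y} = 76.05 + 4.3x$

Estimated profit for the year 1994:

x = 1994 - 1964 = 30
For x = 30: $\hat{y} = 76.05 + 4.3(30) = 205.05$

Example 5. Find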

(i) 3-year (ii) 5-year moving averages for the following time series.

Year   Value   Year   Value
1948   20      1954   31
1949   23      1955   27
1950   26      1956   38
1951   29      1957   34
1952   23      1958   33
1953   29      1959   35

Solution:

Year   Value   3-year moving total   3-year moving average   5-year moving total   5-year moving average
1948   20
1949   23      69                    23
1950   26      78                    26                      121                   24.2
1951   29      78                    26                      130                   26.0
1952   23      81                    27                      138                   27.6
1953   29      83                    27.67                   139                   27.8
1954   31      87                    29                      148                   29.6
1955   27      96                    32                      159                   31.8
1956   38      99                    33                      163                   32.6
1957   34      105                   35                      167                   33.4
1958   33      102                   34
1959   35
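The moving totals and averages for an odd period can be checked with a short Python sketch. The function name `moving_averages` is a hypothetical helper introduced only for illustration; the data are the twelve values of Example 5.

```python
def moving_averages(values, k):
    """Return (totals, averages) for every run of k consecutive values."""
    totals = [sum(values[i:i + k]) for i in range(len(values) - k + 1)]
    averages = [round(t / k, 2) for t in totals]
    return totals, averages


# Values for 1948-1959 (Example 5).
values = [20, 23, 26, 29, 23, 29, 31, 27, 38, 34, 33, 35]

totals3, avg3 = moving_averages(values, 3)
totals5, avg5 = moving_averages(values, 5)

print(totals3, avg3)
print(totals5, avg5)
```

For an odd period k, the i-th moving average (counting from 0) is placed against the observation at position i + k // 2, which is the middle year of that run.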

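Example 6. Find out 4-year moving average (centred) for the given data.

Year   Production (tons)   Year   Production (tons)
1948   50.0                1953   38.1
1949   36.5                1954   32.6
1950   43.0                1955   38.7
1951   44.5                1956   41.7
1952   38.9                1957   41.1

Solution:

The 4-year moving totals and averages fall between two years, so they are written between the rows. The centred average is the mean of two adjacent 4-year moving averages.

Year   Production   4-year moving total   4-year moving average   4-year moving average (centred)
1948   50.0
1949   36.5
                    174.0                 43.50
1950   43.0                                                       (43.50 + 40.73)/2 = 42.12
                    162.9                 40.73
1951   44.5                                                       (40.73 + 41.13)/2 = 40.93
                    164.5                 41.13
1952   38.9                                                       39.83
                    154.1                 38.53
1953   38.1                                                       37.81
                    148.3                 37.08
1954   32.6                                                       37.43
                    151.1                 37.78
1955   38.7                                                       38.16
                    154.1                 38.53
1956   41.7
1957   41.1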
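Centring for an even period can be sketched in Python as below. The function name `centred_moving_average` is hypothetical and chosen for illustration. The sketch keeps full precision at every step, so its results can differ in the second decimal place from hand calculations that round the 4-year averages before centring them.

```python
def centred_moving_average(values, k):
    """Centred moving average for an even period k.

    Step 1: ordinary k-period moving averages (these fall between two years).
    Step 2: average each adjacent pair so the result lines up with a year.
    """
    averages = [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]
    return [(averages[i] + averages[i + 1]) / 2 for i in range(len(averages) - 1)]


# Production for 1948-1957 (Example 6).
production = [50.0, 36.5, 43.0, 44.5, 38.9, 38.1, 32.6, 38.7, 41.7, 41.1]
centred = centred_moving_average(production, 4)

# The first centred value belongs to the observation at position k // 2 (here 1950).
for year, value in zip(range(1950, 1956), centred):
    print(year, round(value, 2))
```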


Example 7. The following data shows the production of steel in a mill for the years 1956-1964.

Year                    1956  1957  1958  1959  1960  1961  1962  1963  1964
Production (000 tons)     60    65    80    73    97   105    93   111   117

(i) Fit the linear trend by the method of least squares by taking the origin at the middle. Also calculate the trend values.
(ii) Predict the production of steel for the year 1965.

Solution:

Year    y     x    xy     x^2   Trend value = 89 + 7.1x
1956    60    -4   -240   16    60.6
1957    65    -3   -195   9     67.7
1958    80    -2   -160   4     74.8
1959    73    -1   -73    1     81.9
1960    97    0    0      0     89.0
1961    105   1    105    1     96.1
1962    93    2    186    4     103.2
1963    111   3    333    9     110.3
1964    117   4    468    16    117.4
Total   801   0    424    60

(i) The least squares trend line is $\hat{y} = a + bx$

$a = \dfrac{\sum y}{n} = \dfrac{801}{9} = 89$

$b = \dfrac{\sum xy}{\sum x^2} = \dfrac{424}{60} \approx 7.1$

Hence $\hat{y} = 89 + 7.1x$

(ii) Prediction of the production of steel for the year 1965:
For x = 5: $\hat{y} = 89 + 7.1(5) = 124.5$
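The least squares fit with the origin at the middle can be sketched in Python as follows. The function name `linear_trend_middle_origin` is hypothetical, and the sketch assumes an odd number of equally spaced years so that x runs over ..., -1, 0, 1, ... and sums to zero. It keeps b at full precision (424/60), so its forecast differs slightly from a hand calculation that uses b rounded to 7.1.

```python
def linear_trend_middle_origin(y):
    """Least squares line y_hat = a + b*x with the origin at the middle year.

    Assumes an odd number of equally spaced observations, so sum(x) == 0 and
    the normal equations reduce to a = sum(y)/n and b = sum(xy)/sum(x^2).
    """
    n = len(y)
    if n % 2 == 0:
        raise ValueError("this sketch handles an odd number of observations only")
    mid = n // 2
    x = [i - mid for i in range(n)]
    a = sum(y) / n
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b, x


# Production of steel for 1956-1964 (Example 7).
production = [60, 65, 80, 73, 97, 105, 93, 111, 117]
a, b, x = linear_trend_middle_origin(production)

trend = [a + b * xi for xi in x]
forecast_1965 = a + b * 5  # 1965 is five years after the middle year 1960

print(a, round(b, 4))
print(round(forecast_1965, 2))
```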

Example 8. Fit a linear trend to the following data (take origin at the middle and half year unit).

Year    1991  1992  1993  1994  1995  1996
Value      5     8    12    15    20    24

Also show that sum of residuals is equal to zero.

Solution:

With the origin at the middle (1993.5) and a half-year unit, $x = \dfrac{\text{Year} - 1993.5}{0.5}$.

Year    y    x    xy    x^2   Trend value = 14 + 1.91x   e = y - trend value
1991    5    -5   -25   25    4.45                       0.55
1992    8    -3   -24   9     8.27                       -0.27
1993    12   -1   -12   1     12.09                      -0.09
1994    15   1    15    1     15.91                      -0.91
1995    20   3    60    9     19.73                      0.27
1996    24   5    120   25    23.55                      0.45
Total   84   0    134   70                               0

The least squares trend line is $\hat{y} = a + bx$

Since $\sum x = 0$, the normal equations reduce to:

$a = \dfrac{\sum y}{n} = \dfrac{84}{6} = 14$

$b = \dfrac{\sum xy}{\sum x^2} = \dfrac{134}{70} \approx 1.91$

Hence $\hat{y} = 14 + 1.91x$

Since $\sum e = 0$, the sum of residuals is zero.
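The half-year coding and the zero-sum property of the residuals can be checked with a short Python sketch. Variable names are illustrative only. The sketch uses the unrounded slope 134/70, so individual residuals differ slightly from those computed with b = 1.91, but their sum is still zero up to floating-point error.

```python
# Data of Example 8: an even number of years, so the origin falls between 1993 and 1994.
years = [1991, 1992, 1993, 1994, 1995, 1996]
y = [5, 8, 12, 15, 20, 24]

# Half-year unit: x = (year - middle) / 0.5, giving ..., -3, -1, 1, 3, ...
middle = (years[0] + years[-1]) / 2
x = [int(2 * (t - middle)) for t in years]

# Because sum(x) == 0 the normal equations reduce to these two ratios.
a = sum(y) / len(y)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

trend = [a + b * xi for xi in x]
residuals = [yi - ti for yi, ti in zip(y, trend)]

print(x)
print(a, round(b, 2))
print(round(abs(sum(residuals)), 6))
```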

INDEX NUMBERS

Q.1. What is index number?
Ans. An index number is a device which measures the changes in a variable or group of related variables with respect to time or space.

Q.2. What is simple index number?
Ans. An index number is called simple if it measures a relative change in a single variable with respect to a base.

Q.3. Give some examples of simple index number.
Ans. Index number for wages of employees, index number of cotton prices in Sahiwal etc.

Q.4. What is composite index number?
Ans. An index number is called a composite index number if it measures a relative change in a group of related variables with respect to a base.

Q.5. What are the types of index number with regard to base?
Ans. (i) Fixed base index
(ii) Chain base index.

Q.6. Define price relative.
Ans. Price relative is the percentage ratio of the price in the current year to the price in the base year.

Q.7. Define link relative.
Ans. Link relative is the percentage ratio of the price in the current year to the price in the preceding year.

Q.8. What is price index number?
Ans. A price index number measures the changes in the wholesale or retail prices of a particular commodity or a number of commodities with respect to a base.

Q.9. What is quantity index number?
Ans. A quantity index number measures the changes in the quantity or volume of goods produced or consumed.

Q.10. Define C.P.I.
Ans. A consumer price index number measures the changes in prices of a specified basket of goods and services consumed in the given period relative to the base period.

Q.12. What do you mean by “basket” of goods?
Ans. The basket of goods and services contains items like
(i) Food (ii) House rent (iii) Education (iv) Clothing (v) Misc.

Q.13. Write down the formula of C.P.I.
Ans. (i) $P_{on} = \dfrac{\sum p_n q_o}{\sum p_o q_o} \times 100$ (Aggregate Expenditure Method)
(ii) $P_{on} = \dfrac{\sum WI}{\sum W}$ (Weighted Average of Relatives), where I is the price relative and W is the weight.

Q.14. Write the formula of price relative.
Ans. $I = \dfrac{p_n}{p_o} \times 100$

Q.15. What are the other names of cost of living index numbers?
Ans. Consumer price index number or retail price index number.

Q.16. What is wholesale price index?
Ans. An index number based on the price quotations of wholesale markets is called a wholesale price index.

Q.17. What is un-weighted index number?
Ans. An index number that measures the change in the price (or quantity) of a group of commodities when the relative importance of commodities is not taken into account is called an un-weighted index number.

Q.18. What is weighted index number?
Ans. An index number that measures the change in the prices (or quantities) of a group of commodities when the relative importance of commodities has been taken into account is called a weighted index number.

Q.19. Name the ideal index number.
Ans. Fisher’s index number is called the ideal index number.

Q.20. What is base year weighted index number?
Ans. Laspeyres’ index number is called the base-year weighted index number.

Q.21. What is the other name of Paasche’s index?
Ans. Paasche’s index number is also called the current year weighted index number.

Q.22. Give two uses and two limitations of index number.
Ans. Uses of Index Numbers:
(i) Index numbers are of great help in forecasting business conditions.
(ii) Index numbers are useful in education for I.Q. comparison and for judging the effectiveness of teaching systems.

Limitations of Index Numbers:
(i) All index numbers are not suitable for all purposes.
(ii) Different methods of construction yield different results.

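Important Points and Formulae

Price Relatives: $P.R = \dfrac{p_n}{p_o} \times 100$

Link Relatives: $L.R = \dfrac{p_n}{p_{n-1}} \times 100$

Simple Aggregative Index: $P_{on} = \dfrac{\sum p_n}{\sum p_o} \times 100$

Laspeyres’ (Base year weighted) Index: $P_{on} = \dfrac{\sum p_n q_o}{\sum p_o q_o} \times 100$

Paasche’s (Current year weighted) Index: $P_{on} = \dfrac{\sum p_n q_n}{\sum p_o q_n} \times 100$

Fisher’s Ideal Index: $P_{on} = \sqrt{\dfrac{\sum p_n q_o}{\sum p_o q_o} \times \dfrac{\sum p_n q_n}{\sum p_o q_n}} \times 100$

Consumer Price Index / Cost of Living Index

(i) Aggregative Expenditure Method: $P_{on} = \dfrac{\sum p_n q_o}{\sum p_o q_o} \times 100$

(ii) Weighted Average of Relatives: $P_{on} = \dfrac{\sum WI}{\sum W}$, where $I = \dfrac{p_n}{p_o} \times 100$ and W is the weight.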


Q.1. For the following data construct index numbers by
(i) fixed base and
(ii) chain base method, taking 1960 as base:

Year:   1960  1961  1962  1963  1964  1965  1966  1967  1968  1969  1970
Price:    40    47    52    50    54    30    39    45    50    60    55

Solution:

(i) Fixed base: $P.R = \dfrac{p_n}{p_o} \times 100$
(ii) Chain base: $L.R = \dfrac{p_n}{p_{n-1}} \times 100$; each chain index is the link relative multiplied by the previous chain index and divided by 100.

Year   Price   Fixed base P.R   Link relative L.R   Chain indices
1960   40      100              100                 100
1961   47      117.5            117.5               117.5
1962   52      130              110.64              130
1963   50      125              96.15               125
1964   54      135              108                 135
1965   30      75               55.56               75
1966   39      97.5             130                 97.5
1967   45      112.5            115.38              112.5
1968   50      125              111.11              125
1969   60      150              120                 150
1970   55      137.5            91.67               137.5
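The two methods can be compared with a short Python sketch. The function names `fixed_base_indices`, `link_relatives` and `chain_indices` are hypothetical helpers introduced for illustration; the prices are those of Q.1. For a single commodity the chain indices reproduce the fixed base indices, apart from rounding.

```python
def fixed_base_indices(prices, base_position=0):
    """Price relatives: each price as a percentage of the base-year price."""
    base = prices[base_position]
    return [p / base * 100 for p in prices]


def link_relatives(prices):
    """Each price as a percentage of the preceding year's price (first year = 100)."""
    return [100.0] + [prices[i] / prices[i - 1] * 100 for i in range(1, len(prices))]


def chain_indices(links):
    """Chain index = link relative * previous chain index / 100."""
    chain = [links[0]]
    for lr in links[1:]:
        chain.append(chain[-1] * lr / 100)
    return chain


# Prices for 1960-1970 (Q.1), with 1960 as base.
prices = [40, 47, 52, 50, 54, 30, 39, 45, 50, 60, 55]

fixed = fixed_base_indices(prices)
links = link_relatives(prices)
chain = chain_indices(links)

for year, f, l, c in zip(range(1960, 1971), fixed, links, chain):
    print(year, round(f, 2), round(l, 2), round(c, 2))
```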

Q.2. Construct chain indices for the prices of sugar (per kg) for the years 1962-1970.

Year   Price (Rs)   Year   Price (Rs)
1962   0.80         1967   1.42
1963   1.00         1968   1.50
1964   1.20         1969   1.62
1965   1.25         1970   1.75
1966   1.25

Solution:

Year   Price (Rs)   Link relatives   Chain indices
1962   0.80
1963   1.00         125              125
1964   1.20         120              150
1965   1.25         104.17           156.26
1966   1.25         100              156.26
1967   1.42         113.60           177.51
1968   1.50         105.63           187.50
1969   1.62         108              202.5
1970   1.75         108.02           218.74


Q.3. Construct with the help of the following data:
(i) Laspeyres’ (ii) Paasche’s Index

Item   Base year price   Base year quantity   Current year price   Current year quantity
A      3                 71                   3                    26
B      2                 107                  3                    88
C      2                 62                   2                    70

Solution:

Item    p0   q0    p1   q1   p0q0   p1q0   p1q1   p0q1
A       3    71    3    26   213    213    78     78
B       2    107   3    88   214    321    264    176
C       2    62    2    70   124    124    140    140
Total                        551    658    482    394

(i) Laspeyres’ Index:

$P_{01} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \dfrac{658}{551} \times 100 = 119.42$

(ii) Paasche’s Index:

$P_{01} = \dfrac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \dfrac{482}{394} \times 100 = 122.34$

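Q.4. Construct index number for the year 1992 on the basis of the year 1987 of the following by using:
(i) Laspeyres’ (ii) Paasche’s (iii) Fisher’s Ideal Formula

Year   A: Price   A: Quantity   B: Price   B: Quantity   C: Price   C: Quantity
1987   5          10            8          26            6          13
1992   4          12            7          27            5          14

Solution:

Item    p0   q0   p1   q1   p1q0   p0q0   p1q1   p0q1
A       5    10   4    12   40     50     48     60
B       8    26   7    27   182    208    189    216
C       6    13   5    14   65     78     70     84
Total                       287    336    307    360

(i) Laspeyres’ Index:

$P_{01} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \dfrac{287}{336} \times 100 = 85.42$

(ii) Paasche’s Index:

$P_{01} = \dfrac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \dfrac{307}{360} \times 100 = 85.28$

(iii) Fisher’s Ideal Index:

$P_{01} = \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\dfrac{287}{336} \times \dfrac{307}{360}} \times 100$

$= 0.85347 \times 100 = 85.35$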
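The three weighted formulas can be sketched in Python as follows. The function names `laspeyres`, `paasche` and `fisher` are hypothetical helpers written for illustration; the prices and quantities are those of Q.4 (base year 1987, current year 1992).

```python
from math import sqrt


def laspeyres(p0, q0, p1):
    """Base-year weighted index: sum(p1*q0) / sum(p0*q0) * 100."""
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0)) * 100


def paasche(p0, p1, q1):
    """Current-year weighted index: sum(p1*q1) / sum(p0*q1) * 100."""
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1)) * 100


def fisher(p0, q0, p1, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, q0, p1) * paasche(p0, p1, q1))


# Items A, B, C from Q.4.
p0, q0 = [5, 8, 6], [10, 26, 13]   # 1987 prices and quantities
p1, q1 = [4, 7, 5], [12, 27, 14]   # 1992 prices and quantities

L = laspeyres(p0, q0, p1)
P = paasche(p0, p1, q1)
F = fisher(p0, q0, p1, q1)

print(round(L, 2), round(P, 2), round(F, 2))
```

Fisher’s index always lies between the Laspeyres and Paasche values because it is their geometric mean.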

Q.5. Find index number
(i) taking the year 1980 as base
(ii) taking the average of 1st three years as base
(iii) taking the average of all the years as base

Year          1980  1981  1982  1983  1984  1985  1986  1987  1988
Price in Rs   22.5  24.0  28.5  30.0  35.0  32.5  37.5  46.5  48.5

Solution:

Average of first three years = $\dfrac{22.5 + 24.0 + 28.5}{3} = \dfrac{75}{3} = 25$

Average of all the years = $\dfrac{305}{9} = 33.89$

(i) $P.R = \dfrac{p_n}{22.5} \times 100$ (ii) $P.R = \dfrac{p_n}{25} \times 100$ (iii) $P.R = \dfrac{p_n}{33.89} \times 100$

Year    Prices   (i)      (ii)   (iii)
1980    22.5     100      90     66.39
1981    24.0     106.67   96     70.82
1982    28.5     126.67   114    84.10
1983    30.0     133.33   120    88.52
1984    35.0     155.56   140    103.28
1985    32.5     144.44   130    95.90
1986    37.5     166.67   150    110.65
1987    46.5     206.67   186    137.21
1988    48.5     215.56   194    143.11
Total   305

Q.6. Find chain indices from the following price relatives

Year   Price relatives
       A     B     C
1970   100   100   100
1971   103   95    110
1972   112   90    115
1973   115   100   120
1974   120   96    125
1975   125   102   128

Solution:

Link relatives: $L.R = \dfrac{p_n}{p_{n-1}} \times 100$

Year   L.R of A   L.R of B   L.R of C   Mean     Chain indices
1970   100        100        100        100      100
1971   103        95         110        102.67   102.67
1972   108.74     94.74      104.55     102.68   105.42
1973   102.68     111.11     104.35     106.05   111.80
1974   104.35     96         104.17     101.51   113.49
1975   104.17     106.25     102.4      104.27   118.34
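The steps of this solution (link relatives for each commodity, their mean for each year, then chaining) can be sketched in Python as below. The names are illustrative only. The sketch keeps full precision throughout, so the later chain indices can differ in the second decimal place from values obtained by rounding at every step.

```python
def link_relatives(series):
    """Each value as a percentage of the preceding one (first year = 100)."""
    return [100.0] + [series[i] / series[i - 1] * 100 for i in range(1, len(series))]


# Price relatives for commodities A, B and C, 1970-1975 (Q.6).
relatives = {
    "A": [100, 103, 112, 115, 120, 125],
    "B": [100, 95, 90, 100, 96, 102],
    "C": [100, 110, 115, 120, 125, 128],
}

links = {name: link_relatives(series) for name, series in relatives.items()}

# Simple mean of the link relatives across commodities for each year.
n_years = len(relatives["A"])
mean_links = [sum(links[name][i] for name in links) / len(links) for i in range(n_years)]

# Chain index = mean link relative * previous chain index / 100.
chain = [mean_links[0]]
for m in mean_links[1:]:
    chain.append(chain[-1] * m / 100)

for year, m, c in zip(range(1970, 1976), mean_links, chain):
    print(year, round(m, 2), round(c, 2))
```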
