yasir-khan
Presentation of Data

Q.1. What are the different methods of presentation of data?
Ans. (i) Classification (ii) Tabulation (iii) Diagrams (iv) Graphs.

Q.2. What is classification?
Ans. Classification is the process of arranging data into relatively homogeneous groups or classes according to their resemblances and affinities.

Q.3. What is tabulation?
Ans. The systematic arrangement of data in rows and columns for the purpose of comparison and analysis is known as tabulation.

Q.4. What is an array?
Ans. An arrangement of data in ascending or descending order is called an array.

Q.5. What are the main parts of a table?
Ans. (i) Title (ii) Box-head (iii) Stub (iv) Body (v) Prefatory note (vi) Foot note (vii) Source note.

Q.6. What is a frequency distribution?
Ans. A frequency distribution is a tabular arrangement of data that shows how the observations are distributed among different classes.

Q.7. What are class limits?
Ans. The class limits are the values of the variable that define the classes.

Q.8. What do you mean by an open-end class?
Ans. A class that has no lower class limit or no upper class limit is called an open-end class.

Q.9. What are class boundaries?
Ans. The class boundaries are the exact values that separate one class from the next.

Q.10. What do you mean by class marks (or mid points)?
Ans. A class mark is the average of the lower and upper class limits (or class boundaries) of a class.

Q.11. What is a class interval?
Ans. The difference between the upper and lower class boundaries is called the class interval or class width.

Q.12. What is class frequency?
Ans. The number of values falling in a specified class is called the class frequency, or simply the frequency.

Q.13. What is relative frequency?
Ans. The frequency of a class divided by the total frequency is called the relative frequency.

Q.14. What is a histogram?
Ans. A histogram is a set of adjacent rectangles drawn for a frequency distribution such that the area of each rectangle is proportional to the corresponding class frequency.

Q.15. What is a frequency polygon?
Ans. A frequency polygon is a many-sided closed figure that represents a frequency distribution.

Q.16. How is a frequency polygon constructed?
Ans. It is constructed by plotting the class frequencies against the mid points and joining the points by straight line segments.

Q.17. What is an ogive?
Ans. The cumulative frequency polygon is called an ogive.
Q.18. What is a chart?
Ans. A chart is a device for representing simple statistical data in a simple, clear and effective manner.

Q.19. What is ungrouped data?
Ans. The raw data, as first collected and not yet arranged into classes, are called ungrouped data.

Q.20. What do you mean by grouped data?
Ans. Ungrouped data that have been arranged into classes or groups with their respective frequencies are called grouped data.

Q.21. For which distribution is the graph of the frequency distribution bell-shaped?
Ans. For a symmetrical distribution the graph is bell-shaped.

Q.22. Name some graphs of a frequency distribution.
Ans. Histogram, frequency polygon, frequency curve, ogive.

Q.23. Name some charts / diagrams.
Ans. Simple bar diagram, sub-divided bar diagram, multiple bar diagram, pie chart etc.

Q.24. What is the mid point of the class 20-24?
Ans. The mid point is (20 + 24) / 2 = 22.

Q.25. Write the formula for the angle of a sector used in a pie chart.
Ans. $\text{Angle of sector} = \dfrac{\text{component part}}{\text{total}} \times 360^\circ$
Example 1: The following data shows the number of children in different families of a small locality:
1, 2, 4, 3, 0, 1, 2, 3, 1, 1, 0, 2, 1, 0, 2, 3, 0, 0, 1, 3.
Make a frequency distribution. Also find the relative frequencies.

Solution:
Range = Maximum value – Minimum value = 4 – 0 = 4

Number of children   Tally      Number of families (f)   r.f = f / Σf
0                    /////      5                        5/20 = 0.25
1                    ///// /    6                        6/20 = 0.30
2                    ////       4                        4/20 = 0.20
3                    ////       4                        4/20 = 0.20
4                    /          1                        1/20 = 0.05
Total                ----       20                       1.00
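The tally in Example 1 can be checked mechanically. The sketch below is only an illustration, not part of the original notes; the function name `frequency_table` is a made-up helper, and it simply counts each distinct value and divides by the total.

```python
from collections import Counter

# Number of children in 20 families (data from Example 1).
children = [1, 2, 4, 3, 0, 1, 2, 3, 1, 1, 0, 2, 1, 0, 2, 3, 0, 0, 1, 3]


def frequency_table(values):
    """Return (value, frequency, relative frequency) rows in ascending order."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]


if __name__ == "__main__":
    for value, f, rf in frequency_table(children):
        print(f"{value:>8} {f:>3} {rf:.2f}")
```

The relative frequencies always add up to 1, which is a quick check on the tally.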
Example 2: The following data shows the ages of 50 cancer patients admitted in Shaukat Khanum Memorial Cancer Hospital, Lahore:
48 29 39 32 54 33 44 36 38 31
46 30 20 44 47 39 42 35 33 47
31 35 34 42 41 42 43 35 32 35
43 36 37 45 46 41 25 27 26 40
38 41 44 47 45 45 52 43 44 43
Make a frequency distribution. Also find the class boundaries and mid points.

Solution:
The following steps are involved in constructing a frequency distribution.
i) Range = Maximum value – Minimum value = 54 – 20 = 34
ii) Approximate number of classes
No. of classes = $1 + 3.322 \log n = 1 + 3.322 \log(50) = 1 + 3.322(1.6990) = 6.644 \approx 7$ (approximately)
iii) Width of class interval
$h = \dfrac{\text{Range}}{\text{No. of classes}} = \dfrac{34}{7} = 4.86 \approx 5$ (approximately)
iv) Group the data into classes of width 5 and write the classes in the first column under the heading "Ages". For each value put a tally mark (/) against the class in which it falls. Count the tally marks for each class and write the count in the next column; these counts are the frequencies, denoted by f.

Ages (class limits)   Tally               f    Class boundaries   Mid point (X)
20 – 24               /                   1    19.5 – 24.5        22
25 – 29               ////                4    24.5 – 29.5        27
30 – 34               ///// ///           8    29.5 – 34.5        32
35 – 39               ///// ///// /       11   34.5 – 39.5        37
40 – 44               ///// ///// /////   15   39.5 – 44.5        42
45 – 49               ///// ////          9    44.5 – 49.5        47
50 – 54               //                  2    49.5 – 54.5        52
Total                 ----                50   ----               ----
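The steps of Example 2 can be sketched in code. This is a minimal illustration under stated assumptions: the helper names `sturges_classes` and `grouped_table` are invented here, the data are whole numbers, and the class limits start at 20 with width 5 as chosen in the worked solution.

```python
import math

# Ages of the 50 patients from Example 2.
ages = [
    48, 29, 39, 32, 54, 33, 44, 36, 38, 31,
    46, 30, 20, 44, 47, 39, 42, 35, 33, 47,
    31, 35, 34, 42, 41, 42, 43, 35, 32, 35,
    43, 36, 37, 45, 46, 41, 25, 27, 26, 40,
    38, 41, 44, 47, 45, 45, 52, 43, 44, 43,
]


def sturges_classes(n):
    """Approximate number of classes, 1 + 3.322 log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


def grouped_table(values, start, width, k):
    """Rows of (lower limit, upper limit, f, lower boundary, upper boundary,
    mid point) for whole-number data grouped into k classes."""
    rows = []
    for i in range(k):
        lo = start + i * width
        hi = lo + width - 1
        f = sum(1 for v in values if lo <= v <= hi)
        rows.append((lo, hi, f, lo - 0.5, hi + 0.5, (lo + hi) / 2))
    return rows


k = sturges_classes(len(ages))
rows = grouped_table(ages, start=20, width=5, k=k)
```

The frequencies in `rows` should agree with the tally column of the table, and they must add up to the 50 observations.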
Example 3: The following data shows the scores made by Pakistani cricketers against New Zealand in a one-day match. Draw a simple bar chart of the data.

Cricketers   Inzmam   Waseem   Shahid   Saeed   Imran   Razzaq
Scores       54       47       26       30      25      23

Solution:
[Figure: simple bar chart of scores by cricketer]
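Example 4: The following data give the production of wheat in different localities of the Punjab for the years 1987 to 1989.

Production in Kg (thousands)
Year           1987   1988   1989
Locality I     500    600    800
Locality II    600    700    700
Locality III   200    400    500

i) Make a multiple bar chart.
ii) Make a component bar chart.
iii) Make a percentage bar chart.

Solution:
i) For the multiple bar chart, the bars for Localities I, II and III are drawn side by side for each year, using the table above.

ii) For the component bar chart, one bar per year is drawn with height equal to the yearly total and sub-divided by locality.

Production in Kg (thousands)
Year           1987   1988   1989
Locality I     500    600    800
Locality II    600    700    700
Locality III   200    400    500
Total          1300   1700   2000

iii) For the percentage bar chart, each component is expressed as a percentage of its yearly total, so every bar has height 100.

Percentage production
Year           1987                      1988                      1989
Locality I     500/1300 × 100 = 38.46    600/1700 × 100 = 35.29    800/2000 × 100 = 40.00
Locality II    600/1300 × 100 = 46.15    700/1700 × 100 = 41.18    700/2000 × 100 = 35.00
Locality III   200/1300 × 100 = 15.38    400/1700 × 100 = 23.53    500/2000 × 100 = 25.00
Total          100                       100                       100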
Example 5: Data are available on the total production of urea fertilizer and its use on different crops. Total production of urea is 200 (thousand Kg) and its consumption for the crops wheat, sugarcane, maize and lentils is 75, 80, 30 and 15 (thousand Kg) respectively. Make an appropriate diagram to represent these data.

Solution:
A pie chart is appropriate, with the angle of each sector equal to (component / total) × 360°.

Crops       Fertilizer (thousand Kg)   Angle of sector
Wheat       75                         75/200 × 360 = 135°
Sugarcane   80                         80/200 × 360 = 144°
Maize       30                         30/200 × 360 = 54°
Lentils     15                         15/200 × 360 = 27°
Total       200                        360°
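The sector angles for the pie chart in Example 5 follow directly from the formula angle = (component / total) × 360. A short sketch, with `sector_angles` as an illustrative helper name rather than anything from the notes:

```python
def sector_angles(parts):
    """Map each component to its pie-chart sector angle in degrees."""
    total = sum(parts.values())
    return {name: amount * 360 / total for name, amount in parts.items()}


# Urea consumption by crop, in thousand Kg (Example 5).
urea_use = {"Wheat": 75, "Sugarcane": 80, "Maize": 30, "Lentils": 15}
angles = sector_angles(urea_use)

for crop, angle in angles.items():
    print(f"{crop:<10} {angle:6.1f} degrees")
```

The angles must add up to 360 degrees whenever the components add up to the total.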
Example 6: Make a histogram from the following data:

Marks       f
86 – 90     6
91 – 95     4
96 – 100    10
101 – 105   6
106 – 110   3
111 – 115   1

Solution:
Marks       f    Class boundaries
86 – 90     6    85.5 – 90.5
91 – 95     4    90.5 – 95.5
96 – 100    10   95.5 – 100.5
101 – 105   6    100.5 – 105.5
106 – 110   3    105.5 – 110.5
111 – 115   1    110.5 – 115.5
Total       30   ----

[Figure: histogram of frequency against class boundaries]
Measures of Central Tendency

Q.1. Define average. What are its important types?
Ans. Average
An average is a single value that represents the data. Averages are also called "measures of central tendency" or "measures of location".
Types of Averages
The important types of averages are: (i) Arithmetic mean or mean (ii) Median (iii) Mode.

Q.2. Write down the properties of a good average.
Ans. The properties of a good average are given below:
(i) It should be clearly defined.
(ii) It should be easy to calculate.
(iii) It should be simple to understand.
(iv) It should be based on all the observations.
(v) It should not be affected by extreme values.
(vi) It should be capable of mathematical treatment.

Q.3. Define arithmetic mean.
Ans. Arithmetic Mean (A.M.)
The arithmetic mean, or simply the mean, is defined as the sum of all the values in a data set divided by the number of values. Let $X_1, X_2, X_3, \dots, X_n$ be n observations; then the arithmetic mean, denoted by $\bar{X}$, is given by
$\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n} = \dfrac{\sum X}{n}$

Q.4. Write down the properties of the arithmetic mean.
Ans. The important properties of the arithmetic mean are given below:
(i) The sum of the deviations of the observations from their mean is zero, i.e.,
$\sum (X - \bar{X}) = 0$ for ungrouped data
$\sum f(X - \bar{X}) = 0$ for grouped data
(ii) The mean of a constant is the constant itself, i.e., if $X = a$ then $\bar{X} = a$.
(iii) The arithmetic mean is affected by a change of origin and scale. If we add or subtract a constant from all the values, or multiply or divide all the values by a constant, the mean changes in the same way, i.e.,
If $Y = X \pm a$, then $\bar{Y} = \bar{X} \pm a$
If $Y = a \pm bX$, then $\bar{Y} = a \pm b\bar{X}$
If $Y = \dfrac{X}{a}$, then $\bar{Y} = \dfrac{\bar{X}}{a}$, where $a \neq 0$
(iv) The sum of squares of the deviations of the observations from their mean is minimum, i.e., $\sum (X - \bar{X})^2$ is minimum.
(v) If $n_1$ values have mean $\bar{X}_1$, $n_2$ values have mean $\bar{X}_2$, and so on, and $n_k$ values have mean $\bar{X}_k$, then the mean of all the values is called the combined mean. It is denoted by $\bar{X}_c$ and is given by
$\bar{X}_c = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k}{n_1 + n_2 + \dots + n_k} = \dfrac{\sum n_i \bar{X}_i}{\sum n_i}$

Q.5. Define the weighted arithmetic mean. In what circumstances is it preferred to the ordinary mean, and why?
Ans. Weighted Arithmetic Mean
Sometimes the observations are not all of equal importance. To show the importance of each observation we assign it a value called a weight. If n observations $X_1, X_2, \dots, X_n$ have the respective weights $W_1, W_2, \dots, W_n$, then the weighted arithmetic mean, denoted by $\bar{X}_w$, is obtained as
$\bar{X}_w = \dfrac{W_1X_1 + W_2X_2 + \dots + W_nX_n}{W_1 + W_2 + \dots + W_n} = \dfrac{\sum WX}{\sum W}$
When the values in the data are not of equal importance, the weighted mean is preferred to the ordinary mean because it gives each value its relative importance.

Q.6. Define median.
Ans. Median
The median is defined as the central value of the arranged data. It is a positional average, denoted by $\tilde{X}$.
$\tilde{X} = \left(\dfrac{n+1}{2}\right)$th value, for ungrouped data and grouped data (discrete)
$\tilde{X} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$, for grouped data (continuous)
Median class or median group = the class containing the $\left(\dfrac{n}{2}\right)$th value
l = lower class boundary of the median class.
h = class interval of the median class.
f = frequency of the median class.
C = cumulative frequency of the class preceding the median class.

Q.7. Define mode. Write down its methods of calculation.
Ans. Mode
The mode is defined as the most frequent value of the data. It is denoted by $\hat{X}$.
Methods of Calculation of Mode
(i) Mode for ungrouped data: in ungrouped data the mode is found by inspection. For example, the mode of 2, 8, 7, 3, 9, 3 is 3.
(ii) Mode for a frequency distribution (discrete): the value corresponding to the maximum frequency.
(iii) Mode for a frequency distribution (continuous):
$\hat{X} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
$f_m$ = frequency of the modal class.
l = lower class boundary of the modal class.
$f_1$ = frequency of the class preceding the modal class.
$f_2$ = frequency of the class following the modal class.
h = class interval or width of the modal class.

Q.8. Define harmonic mean.
Ans. The harmonic mean is defined as the reciprocal of the mean of the reciprocals of the observations.

Q.9. Define G.M.
Ans. The geometric mean is defined as the nth root of the product of n positive values.

Q.10. (a) What do you mean by unimodal, bimodal and multimodal distributions?
(b) When is it not possible to find the mode?
Ans. (a) Unimodal distribution: a distribution having a single mode is called a unimodal distribution.
Bimodal distribution: a distribution having two modes is called a bimodal distribution.
Multimodal distribution: a distribution having more than two modes is called a multimodal distribution.
(b) If each value occurs the same number of times, it is not possible to find the mode.
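Important Formulae (ungrouped data / grouped data)

Arithmetic Mean
Direct method: $\bar{Y} = \dfrac{\sum Y}{n}$ (ungrouped); $\bar{Y} = \dfrac{\sum fY}{\sum f}$ (grouped)
Deviation (short-cut) method, with $D = Y - A$: $\bar{Y} = A + \dfrac{\sum D}{n}$ (ungrouped); $\bar{Y} = A + \dfrac{\sum fD}{\sum f}$ (grouped)
Step deviation / coding method, with $u = \dfrac{Y - A}{h}$: $\bar{Y} = A + \dfrac{\sum u}{n} \times h$ (ungrouped); $\bar{Y} = A + \dfrac{\sum fu}{\sum f} \times h$ (grouped)

Geometric Mean
$G.M = (Y_1 \cdot Y_2 \cdots Y_n)^{1/n} = \text{antilog}\left(\dfrac{\sum \log Y}{n}\right)$ (ungrouped)
$G.M = \text{antilog}\left(\dfrac{\sum f \log Y}{\sum f}\right)$ (grouped)

Harmonic Mean
$H.M = \dfrac{n}{\sum (1/Y)}$ (ungrouped); $H.M = \dfrac{\sum f}{\sum (f/Y)}$ (grouped)

Median
$\tilde{Y}$ = the value of the $\left(\dfrac{n+1}{2}\right)$th item (ungrouped); $\tilde{Y} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$ (grouped)

Quartiles
$Q_K$ = the value of the $\dfrac{K(n+1)}{4}$th item, K = 1, 2, 3 (ungrouped); $Q_K = l + \dfrac{h}{f}\left(\dfrac{Kn}{4} - C\right)$ (grouped)

Mode
Observation method / inspection method (ungrouped); $\hat{Y} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$ (grouped)

Note: The formulae of the median, quartiles, deciles and percentiles for a discrete frequency distribution are the same as those for ungrouped data.

Weighted Arithmetic Mean: $\bar{Y}_w = \dfrac{\sum WY}{\sum W}$

For a symmetrical distribution: Mean = Median = Mode
For a (moderately) skewed distribution: Mode = 3 Median – 2 Mean

Example 1: Find the arithmetic mean of the following data:
102, 104, 106, 108, 110
(i) by the direct method (ii) by the short-cut method.

Solution:
X       D = X – A (A = 100)
102     2
104     4
106     6
108     8
110     10
ΣX = 530   ΣD = 30

Arithmetic mean:
(i) Direct method: $\bar{X} = \dfrac{\sum X}{n} = \dfrac{530}{5} = 106$
(ii) Short-cut method: $\bar{X} = A + \dfrac{\sum D}{n} = 100 + \dfrac{30}{5} = 106$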
Example 2: Find the average age from the following frequency distribution of the ages of 50 patients.

Ages      No. of patients
20-24     1
25-29     4
30-34     8
35-39     11
40-44     15
45-49     9
50-54     2

Solution:
Ages      f     X     fX
20-24     1     22    22
25-29     4     27    108
30-34     8     32    256
35-39     11    37    407
40-44     15    42    630
45-49     9     47    423
50-54     2     52    104
Total     50    ----  1950

Average age:
$\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{1950}{50} = 39$
Hence the average age of the patients is 39 years.
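As a cross-check on Example 2, the direct method and the step-deviation method should give the same mean. The sketch below assumes an arbitrary provisional mean A = 37 and class width h = 5 for the coding; the function names are illustrative only.

```python
# Mid points and frequencies from Example 2.
midpoints = [22, 27, 32, 37, 42, 47, 52]
freqs = [1, 4, 8, 11, 15, 9, 2]


def mean_direct(xs, fs):
    """Direct method: sum(fX) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_step_deviation(xs, fs, A, h):
    """Step-deviation method: A + h * sum(fu) / sum(f), with u = (X - A) / h."""
    us = [(x - A) / h for x in xs]
    return A + h * sum(f * u for u, f in zip(us, fs)) / sum(fs)


direct = mean_direct(midpoints, freqs)
coded = mean_step_deviation(midpoints, freqs, A=37, h=5)
print(direct, coded)
```

Any choice of A and h gives the same answer; a value of A near the centre of the data only keeps the hand arithmetic small.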
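Example 3: Find the arithmetic mean from the given information:
(i) D = X – 39, ΣD = 240 and n = 10
(ii) $u = \dfrac{X - 57}{5}$, Σu = 23 and n = 20
(iii) X = 10 + 5u, Σfu = –46 and n = 125

Solution:
(i) ΣD = 240, n = 10
D = X – 39. Comparing with D = X – A gives A = 39.
Arithmetic mean: $\bar{X} = A + \dfrac{\sum D}{n} = 39 + \dfrac{240}{10} = 39 + 24 = 63$

(ii) Σu = 23, n = 20
$u = \dfrac{X - 57}{5}$. Comparing with $u = \dfrac{X - A}{h}$ gives A = 57, h = 5.
Arithmetic mean: $\bar{X} = A + \dfrac{\sum u}{n} \times h = 57 + \dfrac{23}{20} \times 5 = 57 + 5.75 = 62.75$

(iii) X = 10 + 5u, Σfu = –46, n = 125
X – 10 = 5u, so $u = \dfrac{X - 10}{5}$. Comparing with $u = \dfrac{X - A}{h}$ gives A = 10, h = 5, and n = Σf = 125.
Arithmetic mean: $\bar{X} = A + \dfrac{\sum fu}{\sum f} \times h = 10 + \dfrac{-46}{125} \times 5 = 10 + (-1.84) = 10 - 1.84 = 8.16$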
Example 4: Calculate the weighted mean of the following data:

Items           Expenditure   Weight
Food            290           7.5
Rent            54            2.0
Clothing        98            1.5
Fuel            75            1.0
Miscellaneous   75            0.5

Solution:
Items           X      W      WX
Food            290    7.5    2175
Rent            54     2.0    108
Clothing        98     1.5    147
Fuel            75     1.0    75
Miscellaneous   75     0.5    37.5
Total           -----  12.5   2542.5

Weighted mean:
$\bar{X}_w = \dfrac{\sum WX}{\sum W} = \dfrac{2542.5}{12.5} = 203.4$
Example 5: The ungrouped data is given below:
45, 30, 35, 40, 44, 32, 42, 37
Calculate the geometric mean using:
i) the basic definition ii) the log formula.

Solution:
i) G.M using the basic definition
$G.M = (45 \times 30 \times 35 \times 40 \times 44 \times 32 \times 42 \times 37)^{1/8} = 37.76$

ii) Using the log formula
Y       log Y
45      1.6532
30      1.4771
35      1.5441
40      1.6021
44      1.6434
32      1.5051
42      1.6232
37      1.5682
Total   12.6164

$G.M = \text{antilog}\left(\dfrac{\sum \log Y}{n}\right) = \text{antilog}\left(\dfrac{12.6164}{8}\right) = \text{antilog}(1.57705) = 37.76$

Example 6: The following data have been obtained from a frequency distribution using $u = \dfrac{Y - 136.5}{2}$. Show that the G.M is less than the A.M.

u    –4   –3   –2   –1   0    1    2    3
f    2    5    8    18   22   13   8    4

Solution:
u      f     fu     Y = 2u + 136.5   log Y    f log Y
–4     2     –8     128.5            2.1089   4.2178
–3     5     –15    130.5            2.1156   10.5780
–2     8     –16    132.5            2.1222   16.9776
–1     18    –18    134.5            2.1287   38.3166
0      22    0      136.5            2.1351   46.9722
1      13    13     138.5            2.1414   27.8382
2      8     16     140.5            2.1477   17.1816
3      4     12     142.5            2.1538   8.6152
Total  80    –16    -----            -----    170.6972

$u = \dfrac{Y - 136.5}{2}$, so 2u = Y – 136.5 and Y = 2u + 136.5, which gives A = 136.5 and h = 2.

Arithmetic mean:
$\bar{Y} = A + \dfrac{\sum fu}{\sum f} \times h = 136.5 + \dfrac{-16}{80} \times 2 = 136.5 + (-0.4) = 136.1$

Geometric mean:
$G.M = \text{antilog}\left(\dfrac{\sum f \log Y}{\sum f}\right) = \text{antilog}\left(\dfrac{170.6972}{80}\right) = \text{antilog}(2.1337) = 136.05$

A.M = 136.1 and G.M = 136.05, which shows that the G.M is less than the A.M, i.e. G.M < A.M.
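The log formula used in Examples 5 and 6 translates directly into code. The sketch below is an illustration with made-up helper names; it uses natural logarithms and `exp` instead of base-10 logs and antilogs, which gives the same geometric mean. Because it does not round to four-figure log tables, the last digit can differ slightly from the hand-worked values.

```python
import math


def arithmetic_mean(ys, fs=None):
    """Mean of ys, optionally weighted by frequencies fs."""
    fs = fs or [1] * len(ys)
    return sum(f * y for y, f in zip(ys, fs)) / sum(fs)


def geometric_mean(ys, fs=None):
    """G.M = exp(sum(f ln y) / sum(f)); all ys must be positive."""
    fs = fs or [1] * len(ys)
    return math.exp(sum(f * math.log(y) for y, f in zip(ys, fs)) / sum(fs))


# Example 5 (ungrouped data).
gm5 = geometric_mean([45, 30, 35, 40, 44, 32, 42, 37])

# Example 6 (grouped data, Y = 2u + 136.5).
ys = [2 * u + 136.5 for u in range(-4, 4)]
fs = [2, 5, 8, 18, 22, 13, 8, 4]
am6 = arithmetic_mean(ys, fs)
gm6 = geometric_mean(ys, fs)
print(round(gm5, 2), round(am6, 2), round(gm6, 2))
```

For positive values that are not all equal, the geometric mean is always smaller than the arithmetic mean, which is what Example 6 demonstrates numerically.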
Example 7: Find the harmonic mean for the following grouped data.

Class boundaries   f
0 – 4              2
4 – 8              5
8 – 12             7
12 – 16            8
16 – 20            7
20 – 24            4
24 – 28            1

Solution:
C.B        f     Y     f/Y
0 – 4      2     2     1.0000
4 – 8      5     6     0.8333
8 – 12     7     10    0.7000
12 – 16    8     14    0.5714
16 – 20    7     18    0.3889
20 – 24    4     22    0.1818
24 – 28    1     26    0.0385
Total      34    ----  3.7139

$H.M = \dfrac{\sum f}{\sum (f/Y)} = \dfrac{34}{3.7139} = 9.1548$

Example 8: Find the median from the following data:
(i) c, a, b
(ii) 88.03, 94.50, 94.90, 95.05, 84.60
(iii) 87, 91, 89, 88, 89, 91, 87, 92, 90, 98

Solution:
(i) The data in an array: a, b, c. Here n = 3.
Median = $\left(\dfrac{n+1}{2}\right)$th value = $\left(\dfrac{3+1}{2}\right)$th value = 2nd value = b

(ii) The data in an array:
Sr. No.   1       2       3       4       5
Values    84.60   88.03   94.50   94.90   95.05
Here n = 5.
Median = $\left(\dfrac{n+1}{2}\right)$th value = $\left(\dfrac{5+1}{2}\right)$th value = 3rd value = 94.50

(iii) The data in an array:
Sr. No.   1    2    3    4    5    6    7    8    9    10
Values    87   87   88   89   89   90   91   91   92   98
Here n = 10.
Median = $\left(\dfrac{n+1}{2}\right)$th value = $\left(\dfrac{10+1}{2}\right)$th value = (5.5)th value
= $\dfrac{\text{5th value} + \text{6th value}}{2} = \dfrac{89 + 90}{2} = 89.5$
Example 9: Find the median from the following data of heights of students:

C.I         Frequency
86 – 90     6
91 – 95     4
96 – 100    10
101 – 105   6
106 – 110   3
111 – 115   1

Solution:
C.I         Class boundaries   f     c.f
86 – 90     85.5 – 90.5        6     6
91 – 95     90.5 – 95.5        4     10
96 – 100    95.5 – 100.5       10    20
101 – 105   100.5 – 105.5      6     26
106 – 110   105.5 – 110.5      3     29
111 – 115   110.5 – 115.5      1     30
Total       ----               30    ----

n/2 = 30/2 = 15, so the median class is 95.5 – 100.5, with l = 95.5, h = 5, f = 10 and C = 10.

Median = $l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$
Median = $95.5 + \dfrac{5}{10}(15 - 10) = 95.5 + 2.5 = 98.0$
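The grouped-median formula of Example 9 can be written as a small function. This is a sketch under the assumption that all classes have the same width and are listed in ascending order; `grouped_median` is an illustrative name.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median = l + (h / f) * (n/2 - C) for a continuous frequency distribution."""
    n = sum(freqs)
    cumulative = 0  # C: cumulative frequency before the current class
    for l, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= n / 2:
            return l + (width / f) * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("empty frequency distribution")


# Example 9: class boundaries 85.5-90.5, ..., 110.5-115.5.
lowers = [85.5, 90.5, 95.5, 100.5, 105.5, 110.5]
freqs = [6, 4, 10, 6, 3, 1]
median = grouped_median(lowers, 5, freqs)
print(median)
```

Replacing n/2 by Kn/4 in the same loop would give the quartile formula listed in the formula table.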
Example 10: Find the mode for the following data:
91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112.

Solution:
The most frequent value of the data is 98 (it occurs three times). Therefore, Mode = 98.

Example 11: Find the mode for the following frequency distribution of heights of students:

Heights     Frequency
86 – 90     6
91 – 95     4
96 – 100    10
101 – 105   6
106 – 110   3
111 – 115   1

Solution:
Heights (C.I)   Class boundaries   f
86 – 90         85.5 – 90.5        6
91 – 95         90.5 – 95.5        4
96 – 100        95.5 – 100.5       10
101 – 105       100.5 – 105.5      6
106 – 110       105.5 – 110.5      3
111 – 115       110.5 – 115.5      1

Mode = $l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$

Since the maximum frequency is 10, the modal class is 95.5 – 100.5, with l = 95.5, $f_m$ = 10, $f_1$ = 4, $f_2$ = 6 and h = 5.

Mode = $95.5 + \dfrac{10 - 4}{(10 - 4) + (10 - 6)} \times 5 = 95.5 + \dfrac{6}{10} \times 5 = 95.5 + 3.0 = 98.5$
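The modal-class formula of Example 11 can be sketched as follows. The helper name `grouped_mode` is illustrative. Two choices here are assumptions rather than anything stated in the notes: the frequency outside the table is taken as 0 when the modal class is the first or last class, and the first class is used when several classes share the maximum frequency.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode = l + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of modal class
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0               # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # class after the modal class
    return lower_boundaries[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * width


# Example 11: class boundaries 85.5-90.5, ..., 110.5-115.5.
lowers = [85.5, 90.5, 95.5, 100.5, 105.5, 110.5]
freqs = [6, 4, 10, 6, 3, 1]
mode = grouped_mode(lowers, 5, freqs)
print(mode)
```

The formula breaks down (division by zero) when the modal class and both neighbours have equal frequencies, in which case the mode is not well defined.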
Measures of Dispersion

Q.1. What do you mean by dispersion?
Ans. Dispersion means the variability of the values about a measure of central tendency.

Q.2. What are the important types of dispersion?
Ans. (i) Range (ii) Quartile deviation (iii) Mean deviation (iv) Standard deviation (v) Variance.

Q.3. Differentiate between absolute dispersion and relative dispersion.
Ans. An absolute measure of dispersion has the same units as the original data. A relative measure of dispersion is independent of the units of measurement.

Q.4. Define range.
Ans. The range is defined as the difference between the maximum and minimum values of the data.

Q.5. Define quartile deviation.
Ans. It is defined as "half of the difference between the upper and lower quartiles".

Q.6. When does the range become zero?
Ans. For constant observations, the range is zero.

Q.7. Define mean deviation.
Ans. It is defined as "the arithmetic mean of the absolute values of the deviations from any average".

Q.8. Define variance.
Ans. It is defined as "the arithmetic mean of the squares of the deviations of the values from their mean".

Q.9. What will be the variance of 3, 3, 3, 3, 3, 3?
Ans. Zero.

Q.10. Define standard deviation.
Ans. It is defined as "the positive square root of the arithmetic mean of the squares of the deviations from their mean".

Q.11. If s.d. = 3, then what will be the variance?
Ans. The variance will be 9.

Q.12. If the S.D. of 2, 4, 6, 8, 10 is 2.83, then what will be the S.D. of 102, 104, 106, 108, 110?
Ans. The S.D. of 102, 104, 106, 108, 110 will also be 2.83, because the standard deviation is not affected by a change of origin.

Q.13. Is variance affected by a change of scale?
Ans. Yes, variance is affected by a change of scale.

Q.14. Is the variance of negative values negative?
Ans. No, variance is always non-negative.

Q.15. What is the utility of standard deviation?
Ans. It has great practical utility in sampling and statistical inference.

Q.16. What is the relationship between mean, median and mode for positively skewed and negatively skewed distributions?
Ans. For a positively skewed distribution: Mean > Median > Mode
For a negatively skewed distribution: Mean < Median < Mode
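Important Formulae (absolute dispersion / relative dispersion)

Range
$R = Y_m - Y_0$; Co-efficient of range = $\dfrac{Y_m - Y_0}{Y_m + Y_0}$

Quartile Deviation or Semi Inter-Quartile Range
$Q.D = \dfrac{Q_3 - Q_1}{2}$; Co-efficient of Q.D = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

Mean Deviation
Ungrouped: $M.D_{\bar{Y}} = \dfrac{\sum |Y - \bar{Y}|}{n}$, $M.D_{\tilde{Y}} = \dfrac{\sum |Y - \tilde{Y}|}{n}$
Grouped: $M.D_{\bar{Y}} = \dfrac{\sum f|Y - \bar{Y}|}{\sum f}$, $M.D_{\tilde{Y}} = \dfrac{\sum f|Y - \tilde{Y}|}{\sum f}$

Standard Deviation
Ungrouped: $S = \sqrt{\dfrac{\sum (Y - \bar{Y})^2}{n}} = \sqrt{\dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2}$
Grouped: $S = \sqrt{\dfrac{\sum f(Y - \bar{Y})^2}{\sum f}} = \sqrt{\dfrac{\sum fY^2}{\sum f} - \left(\dfrac{\sum fY}{\sum f}\right)^2}$
Co-efficient of S.D = $\dfrac{S.D}{\text{Mean}}$

Variance
$S^2 = (S.D)^2$
Co-efficient of variation = $\dfrac{S.D}{\text{Mean}} \times 100$

Co-efficient Measures of Skewness
Karl Pearson's: $S.K = \dfrac{\text{Mean} - \text{Mode}}{S.D}$ or $S.K = \dfrac{3(\text{Mean} - \text{Median})}{S.D}$
Bowley's or quartile: $S.K = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

Symmetrical distribution: S.K = 0, Mean = Median = Mode
Positively skewed distribution: S.K > 0, Mean > Median > Mode
Negatively skewed distribution: S.K < 0, Mean < Median < Mode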
Example 1: Find the range for each of the following data sets.
i) 12, 6, 7, –3, 15, 10, 18, 5 & –24
ii) 19, 3, 8, 9, 7, 8, 10, 12, 18 & 21
iii) 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4

Solution:
i) 12, 6, 7, –3, 15, 10, 18, 5 & –24
Range = $Y_m - Y_0$
Maximum value = $Y_m$ = 18
Minimum value = $Y_0$ = –24
Range = 18 – (–24) = 18 + 24 = 42

ii) 19, 3, 8, 9, 7, 8, 10, 12, 18 & 21
Range = $Y_m - Y_0$
Maximum value = $Y_m$ = 21
Minimum value = $Y_0$ = 3
Range = 21 – 3 = 18

iii) 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
Range = $Y_m - Y_0$
Maximum value = $Y_m$ = 4
Minimum value = $Y_0$ = 4
Range = 4 – 4 = 0
The range of a constant is zero.
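Example 2: Find the range for the following frequency distribution. Also find the co-efficient of dispersion.

Groups   f    C.B
70-74    2    69.5 – 74.5
75-79    5    74.5 – 79.5
80-84    12   79.5 – 84.5
85-89    18   84.5 – 89.5
90-94    7    89.5 – 94.5

Solution:
Range = $Y_m - Y_0$
$Y_m$ = upper class boundary of the highest class = 94.5
$Y_0$ = lower class boundary of the lowest class = 69.5
Range = 94.5 – 69.5 = 25.0

Co-efficient of dispersion (range) = $\dfrac{Y_m - Y_0}{Y_m + Y_0} = \dfrac{94.5 - 69.5}{94.5 + 69.5} = \dfrac{25}{164} = 0.152$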
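Example 3: Find the lower quartile, upper quartile and quartile deviation from the given data:

Groups   Frequency
70-74    2
75-79    5
80-84    12
85-89    18
90-94    7

Solution:
Groups   Frequency   C.B           C.f
70-74    2           69.5 – 74.5   2
75-79    5           74.5 – 79.5   7
80-84    12          79.5 – 84.5   19
85-89    18          84.5 – 89.5   37
90-94    7           89.5 – 94.5   44
Total    44          -----         -----

Lower quartile: n/4 = 44/4 = 11, so $Q_1$ lies in the class 79.5 – 84.5.
$Q_1 = l + \dfrac{h}{f}\left(\dfrac{n}{4} - C\right) = 79.5 + \dfrac{5}{12}(11 - 7) = 79.5 + 1.67 = 81.17$

Upper quartile: 3n/4 = 33, so $Q_3$ lies in the class 84.5 – 89.5.
$Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - C\right) = 84.5 + \dfrac{5}{18}(33 - 19) = 84.5 + 3.89 = 88.39$

Quartile deviation: $Q.D = \dfrac{Q_3 - Q_1}{2} = \dfrac{88.39 - 81.17}{2} = 3.61$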
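Example 4: The ungrouped data is given below:
2, 5, 6, 6, 8, 9, 12, 13, 16, 23
Calculate the average deviation from i) the mean ii) the median.

Solution:
Y       |Y – Mean|   |Y – Median|
2       8            6.5
5       5            3.5
6       4            2.5
6       4            2.5
8       2            0.5
9       1            0.5
12      2            3.5
13      3            4.5
16      6            7.5
23      13           14.5
ΣY = 100   Σ|Y – Mean| = 48   Σ|Y – Median| = 46

i) Average deviation (mean) = $\dfrac{\sum |Y - \bar{Y}|}{n}$
Mean = $\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{100}{10} = 10$
Average deviation (mean) = $\dfrac{48}{10} = 4.8$

ii) Average deviation (median) = $\dfrac{\sum |Y - \tilde{Y}|}{n}$
Median = $\tilde{Y}$ = (5.5)th value = $\dfrac{8 + 9}{2} = 8.5$
Average deviation (median) = $\dfrac{46}{10} = 4.6$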
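Example 5: Calculate the median and the mean deviation from the median for the following data.

Solution:
f      C.B             C.f   Y      |Y – Median|   f|Y – Median|
2      9.25 – 9.75     2     9.5    1.57           3.14
5      9.75 – 10.25    7     10.0   1.07           5.35
12     10.25 – 10.75   19    10.5   0.57           6.84
17     10.75 – 11.25   36    11.0   0.07           1.19
14     11.25 – 11.75   50    11.5   0.43           6.02
6      11.75 – 12.25   56    12.0   0.93           5.58
3      12.25 – 12.75   59    12.5   1.43           4.29
1      12.75 – 13.25   60    13.0   1.93           1.93
60     -----           ----- -----  -----          34.34

Median: n/2 = 30, so the median class is 10.75 – 11.25.
$\tilde{Y} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right) = 10.75 + \dfrac{0.5}{17}(30 - 19) = 10.75 + 0.32 = 11.07$

M.D from median:
$M.D_{\tilde{Y}} = \dfrac{\sum f|Y - \tilde{Y}|}{\sum f} = \dfrac{34.34}{60} = 0.57$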
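Example 6: Calculate the variance and standard deviation from the following data:
102, 104, 106, 108, 110

Solution:
Y       Y – Mean   (Y – Mean)²
102     –4         16
104     –2         4
106     0          0
108     2          4
110     4          16
Total 530   0      40

$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{530}{5} = 106$, $S^2 = \dfrac{\sum (Y - \bar{Y})^2}{n} = \dfrac{40}{5} = 8$
$S = \sqrt{8} = 2.83$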
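Example 7: Determine the mean, S.D and C.V from the given data:

Ages     Frequency
20-24    1
25-29    4
30-34    8
35-39    11
40-44    15
45-49    9
50-54    2

Solution:
Ages     f     Y     fY     fY²
20-24    1     22    22     484
25-29    4     27    108    2916
30-34    8     32    256    8192
35-39    11    37    407    15059
40-44    15    42    630    26460
45-49    9     47    423    19881
50-54    2     52    104    5408
Total    50    ----  1950   78400

Mean = $\bar{Y} = \dfrac{\sum fY}{\sum f} = \dfrac{1950}{50} = 39$
$S.D = \sqrt{\dfrac{\sum fY^2}{\sum f} - \left(\dfrac{\sum fY}{\sum f}\right)^2} = \sqrt{\dfrac{78400}{50} - (39)^2} = \sqrt{1568 - 1521} = \sqrt{47} = 6.86$
$C.V = \dfrac{S.D}{\text{Mean}} \times 100 = \dfrac{6.86}{39} \times 100 = 17.58\%$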
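The three quantities asked for in Example 7 can be computed from the mid points and frequencies in a few lines. The sketch below uses the population form of the variance (division by Σf), as in the formulae of this section; `grouped_summary` is an illustrative name.

```python
import math


def grouped_summary(ys, fs):
    """Return (mean, variance, S.D, C.V in percent) for grouped data."""
    n = sum(fs)
    mean = sum(f * y for y, f in zip(ys, fs)) / n
    variance = sum(f * y * y for y, f in zip(ys, fs)) / n - mean ** 2
    sd = math.sqrt(variance)
    cv = sd / mean * 100
    return mean, variance, sd, cv


# Mid points and frequencies of the age distribution in Example 7.
midpoints = [22, 27, 32, 37, 42, 47, 52]
freqs = [1, 4, 8, 11, 15, 9, 2]
mean, variance, sd, cv = grouped_summary(midpoints, freqs)
print(mean, variance, round(sd, 2), round(cv, 2))
```

Because the C.V is a ratio, it can be used to compare the variability of data sets measured in different units.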
REGRESSION AND CORRELATION

Scatter Diagram: The graphic representation of a set of n pairs of bivariate data is called a scatter diagram or scatter plot.
In a scatter diagram we take the independent variable along the horizontal axis (x-axis) and the dependent variable along the vertical axis (y-axis), and plot the resulting set of points on graph paper. If a relationship between the variables exists, the points in the scatter diagram will tend to cluster around a straight line or some curve. Such a line or curve is called the regression line or regression curve, and it can be used to estimate the expected value of the random variable Y from the values of the non-random variable X. The scatter diagrams shown below show the relationship between two variables.

Regression: The dependence of one variable (the dependent variable) on one or more other variables (the independent variables) is called regression. When we study the dependence of a variable on a single independent variable, it is called simple regression or two-variable regression. When the dependence of a variable on two or more variables is studied, it is called multiple regression.

Regressand: In regression the dependent variable is called the regressand. It is also called the response variable, the predictand, or the explained variable.

Regressor: In regression the independent variable is called the regressor. It is also called the predictor, the controlled variable, or the explanatory variable.

Least Squares Principle: The principle of least squares states that the sum of squares of the residuals of the observed values from their corresponding estimated values should be least.

Properties of the Least Squares Line: The important properties of the least squares regression line are:
(i) The sum of the residuals between the observed and the corresponding estimated values is always zero, i.e., $\sum e = \sum (y - \hat{y}) = 0$.
(ii) The sum of squares of the residuals, $\sum e^2$, is minimum.
(iii) The least squares regression line always passes through the point $(\bar{x}, \bar{y})$.
(iv) It is the best line because a and b are unbiased estimates of the parameters $\alpha$ and $\beta$.

Correlation: The degree or strength of the relationship (interdependence) between the variables is called correlation.
Examples of correlation: heights and weights of children, ages of husbands and ages of wives at the time of their marriage, marks of students in mathematics and in statistics etc.

Product Moment Coefficient of Correlation: A numerical measure of the strength of the linear relationship between any two variables is called Pearson's product moment correlation coefficient or the coefficient of simple correlation.
The sample linear correlation coefficient for n pairs of observations is defined by
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
(i) Positive Correlation: If both variables move in the same direction (both increase or both decrease), the correlation is said to be positive or direct. For example, ages and heights of children.
(ii) Negative Correlation: If the variables move in opposite directions, the correlation is called negative or inverse. For example, an increase in the supply of a commodity decreases its price.
(iii) No Correlation: If a change in one variable does not affect the other variable, there is no correlation. For example, the head sizes and I.Q.s of persons.

Properties of Coefficient of Correlation: The important properties of the coefficient of correlation are:
(i) The coefficient of correlation is symmetrical with respect to x and y, i.e., $r_{xy} = r_{yx}$.
(ii) The correlation coefficient is a pure number, i.e., it does not depend upon the unit of measurement.
(iii) The correlation coefficient always lies between –1 and +1.
(iv) The correlation coefficient is the geometric mean of the two regression coefficients, i.e., $r = \pm\sqrt{b_{yx} \times b_{xy}}$, where
r is +ve if both $b_{yx}$ and $b_{xy}$ are +ve;
r is –ve if both $b_{yx}$ and $b_{xy}$ are –ve.
(v) The correlation coefficient is independent of origin and scale, i.e., $r_{xy} = r_{uv}$.

Important Points & Formulae
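Regression line of y on x is
$\hat{y} = a + bx$ or $\hat{y} = a + b_{yx}x$ (b = $b_{yx}$)

Regression line of x on y is
$\hat{x} = c + dy$ or $\hat{x} = c + b_{xy}y$ (d = $b_{xy}$)

$b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$

$a = \bar{y} - b\bar{x}$ or $a = \dfrac{\sum y - b\sum x}{n}$; $c = \bar{x} - d\bar{y}$ or $c = \dfrac{\sum x - d\sum y}{n}$

Coefficient of Correlation
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \pm\sqrt{b_{yx} \times b_{xy}}$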
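Example 1. The following table shows the ages x and systolic blood pressures y of 12 women.

Age (years) xi       56    42    72    36    63    47    55    49    38    42    68    60
Blood pressure yi    147   125   160   118   149   128   150   145   115   140   152   155

Fit a regression line of blood pressure on age. Estimate the expected blood pressure of a woman whose age is 45 years. What is the change in blood pressure for a unit change in age?

Solution:
x       y       xy       x²
56      147     8232     3136
42      125     5250     1764
72      160     11520    5184
36      118     4248     1296
63      149     9387     3969
47      128     6016     2209
55      150     8250     3025
49      145     7105     2401
38      115     4370     1444
42      140     5880     1764
68      152     10336    4624
60      155     9300     3600
628     1684    89894    34416

The estimated line of y on x is $\hat{y} = a + bx$.
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{12(89894) - (628)(1684)}{12(34416) - (628)^2} = \dfrac{21176}{18608} = 1.138$
$a = \bar{y} - b\bar{x} = \dfrac{1684}{12} - 1.138\left(\dfrac{628}{12}\right) = 140.333 - 59.555 = 80.778$
Hence $\hat{y} = 80.778 + 1.138x$
For x = 45: $\hat{y} = 80.778 + 1.138(45) = 131.988 \approx 132$
The slope b = 1.138 is the estimated change in blood pressure for a unit (one-year) change in age.

Example 2. The following table gives the number of persons employed and the cloth manufactured in a textile mill.

Persons employed xi      137   209   113   189   176   200   219
Cloth manufactured yi    23    47    22    40    39    51    49

Calculate the coefficient of correlation.

Solution:
x       y      xy       x²        y²
137     23     3151     18769     529
209     47     9823     43681     2209
113     22     2486     12769     484
189     40     7560     35721     1600
176     39     6864     30976     1521
200     51     10200    40000     2601
219     49     10731    47961     2401
1243    271    50815    229877    11345

The correlation co-efficient is
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
$= \dfrac{7(50815) - (1243)(271)}{\sqrt{[7(229877) - (1243)^2][7(11345) - (271)^2]}}$
$= \dfrac{18852}{\sqrt{(64090)(5974)}} = \dfrac{18852}{19567.2} = 0.96$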
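The computational formulae for b, a and r used in Examples 1 and 2 can be sketched in code. The function names are illustrative, and the code works from the raw sums exactly as the hand calculation does; it is a check on the arithmetic, not a replacement for a statistics library.

```python
import math


def least_squares_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


# Example 1: age (x) and systolic blood pressure (y) of 12 women.
age = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
bp = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]
a, b = least_squares_line(age, bp)
estimate_45 = a + b * 45

# Example 2: persons employed (x) and cloth manufactured (y).
persons = [137, 209, 113, 189, 176, 200, 219]
cloth = [23, 47, 22, 40, 39, 51, 49]
r = pearson_r(persons, cloth)
print(round(a, 3), round(b, 3), round(estimate_45), round(r, 2))
```

Swapping the roles of `xs` and `ys` in `least_squares_line` gives the regression line of x on y.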
Example 3 A random sample of 20 pairs of observations (xi, yi) gave the following:
Estimate the linear regression equation taking (i) X as independent variable (ii) Y as independent variable.Solution:
(i) The regression function taking x as the independent variable is
$\hat{y} = a + bx$
b = 0.84
$a = \bar{y} - b\bar{x} = 8 - 0.84(2) = 6.32$
Hence $\hat{y} = 6.32 + 0.84x$

(ii) The regression function taking y as the independent variable is
$\hat{x} = c + dy$
d = 0.583
$c = \bar{x} - d\bar{y} = 2 - 0.583(8) = -2.67$
Hence $\hat{x} = -2.67 + 0.583y$
ANALYSIS OF TIME SERIES

Time Series: A time series consists of numerical data collected, observed or recorded at successive time periods.
Examples of time series are: the hourly temperature recorded by a weather bureau, the total monthly sales of pens in a book shop, the annual rainfall at Murree etc.

Analysis of Time Series: Analysis of a time series is the decomposition of the series into its different components for separate study. The basic purpose of the analysis is forecasting.

Signal: The systematic component of variation in a time series is called the signal.

Noise: The irregular or random component of variation in a time series is called noise.

Historigram: The graph of a time series is called a historigram. It is constructed by taking time along the x-axis and the time series values along the y-axis. Using an appropriate scale, the points are plotted and then joined by line segments to get the required historigram.

Components of Time Series: The main components of a time series are:
(i) Secular trend (T)
(ii) Seasonal variations (S)
(iii) Cyclical movements (C)
(iv) Irregular movements (I)

(i) Secular Trend: A secular trend is a long-term movement that indicates the general direction of the variation in a time series. It represents a smooth, steady and gradual movement of the series in the same direction.
Examples of secular trend are: a decline in the death rate due to advances in science, a continually increasing demand for smaller automobiles etc.

(ii) Seasonal Variations: Seasonal variations are short-term movements that recur in the same way during the corresponding seasons. The main causes of these variations are seasons, religious festivals and social customs. Examples of seasonal variations are: increased sales of cotton cloth in summer, an after-Eid sale in a departmental store, an increase in employment during summer etc.

(iii) Cyclical Movements: Cyclical movements are long-term oscillations or swings about the trend line or curve. Since the movements take the form of upward and downward swings, they are also called "cycles". The four phases of a business cycle (prosperity, recession, depression and revival) provide an important example of cyclical movements.

(iv) Irregular Movements: Irregular movements are unsystematic in nature. They occur in a completely unpredictable manner, caused by chance events such as wars, floods, earthquakes, strikes, fires etc. These variations are also called accidental, residual or random variations. Examples of irregular movements are: a fire in a factory delaying production for 3 weeks, a rise in prices due to floods etc.

Methods of measuring secular trend in a time series:
(i) Free hand curve method
(ii) Method of semi averages
(iii) Method of moving averages
(iv) Method of least squares
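Important Points & Formulae

Coding of x
Origin at beginning: x = 0, 1, 2, 3, ...
Origin at middle, odd number of periods: x = ..., –3, –2, –1, 0, 1, 2, 3, ...
Origin at middle, even number of periods (half unit): x = ..., –7, –5, –3, –1, 1, 3, 5, 7, ...
Origin at middle, even number of periods (one unit): x = ..., –3.5, –2.5, –1.5, –0.5, 0.5, 1.5, 2.5, 3.5, ...

* The equation of semi averages is $\hat{y} = a + bx$, where
$b = \dfrac{\bar{y}_2 - \bar{y}_1}{\bar{x}_2 - \bar{x}_1}$ and $a = \bar{y}_1 - b\bar{x}_1$

* The equation of linear trend is $\hat{y} = a + bx$.
Normal equations are:
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
If $\sum x = 0$, then $a = \dfrac{\sum y}{n}$ and $b = \dfrac{\sum xy}{\sum x^2}$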
Examples

Example 1. Make a historigram from the following data:

Year                1962   1963   1964   1965   1966   1967
Production (tons)   20     28     50     15     18     27

Solution:
[Figure: historigram of production (tons) against year]

Example 2. The following table shows the property damaged by road accidents in Punjab for the years 1973-79:

Year               1973   1974   1975   1976   1977   1978   1979
Property damaged   201    238    392    507    484    649    742

Find the trend values by the free hand curve method.

Solution:
Year    Property damaged   Trend value (from graph)
1973    201                187
1974    238                278
1975    392                369
1976    507                460
1977    484                551
1978    649                642
1979    742                733
Example 3. From the data given below:
Year   1960  1961  1962  1963  1964  1965  1966  1967  1968  1969
Value  318   326   337   340   359   365   372   381   402   410
Obtain trend values using the method of semi averages.
Solution:
Year   y     Semi total   Semi average   x = t – 1960   Trend value $\hat{y} = 316 + 10x$
1960   318                               0              316
1961   326                               1              326
1962   337   1680         336            2              336
1963   340                               3              346
1964   359                               4              356
1965   365                               5              366
1966   372                               6              376
1967   381   1930         386            7              386
1968   402                               8              396
1969   410                               9              406
Here $\bar{y}_1 = 1680/5 = 336$ with $\bar{x}_1 = 10/5 = 2$, and $\bar{y}_2 = 1930/5 = 386$ with $\bar{x}_2 = 35/5 = 7$.
The estimated equation of semi averages is $\hat{y} = a + bx$
$b = \frac{\bar{y}_2 - \bar{y}_1}{\bar{x}_2 - \bar{x}_1} = \frac{386 - 336}{7 - 2} = \frac{50}{5} = 10$
$a = \bar{y}_1 - b\bar{x}_1 = 336 - 10(2) = 336 - 20 = 316$
Hence $\hat{y} = 316 + 10x$

Example 4. Use the method of semi averages to find trend values for the following data showing net profit (in lacs of rupees) of SNGPL for the years 1964–72.
Year    1964  1965  1966  1967  1968  1969  1970  1971  1972
Profit  33    86    116   95    101   128   146   110   32
Find the estimated profit in 1994.
Solution:
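Year   y     Semi total   Semi average   x     $\hat{y} = 76.05 + 4.3x$
1964   33                                0     76.05
1965   86                                1     80.35
             330          82.5
1966   116                               2     84.65
1967   95                                3     88.95
1968   101                               4     93.25
1969   128                               5     97.55
1970   146                               6     101.85
             416          104
1971   110                               7     106.15
1972   32                                8     110.45
The middle year 1968 is left out of both halves. Here $\bar{y}_1 = 330/4 = 82.5$ with $\bar{x}_1 = 6/4 = 1.5$, and $\bar{y}_2 = 416/4 = 104$ with $\bar{x}_2 = 26/4 = 6.5$.
The estimated equation of semi averages is $\hat{y} = a + bx$
$b = \frac{\bar{y}_2 - \bar{y}_1}{\bar{x}_2 - \bar{x}_1} = \frac{104 - 82.5}{6.5 - 1.5} = \frac{21.5}{5} = 4.3$
$a = \bar{y}_1 - b\bar{x}_1 = 82.5 - 4.3(1.5) = 82.5 - 6.45 = 76.05$
Hence $\hat{y} = 76.05 + 4.3x$
Estimated profit for the year 1994:
x = 1994 – 1964 = 30
For x = 30: $\hat{y} = 76.05 + 4.3(30) = 205.05$

Example 5. Find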
(i) 3-year (ii) 5-year moving averages for the following time series.
Year   Value      Year   Value
1948   20         1954   31
1949   23         1955   27
1950   26         1956   38
1951   29         1957   34
1952   23         1958   33
1953   29         1959   35
Solution:
Year   Value   3-year moving total   3-year moving average   5-year moving total   5-year moving average
1948   20
1949   23      69                    23
1950   26      78                    26                      121                   24.2
1951   29      78                    26                      130                   26.0
1952   23      81                    27                      138                   27.6
1953   29      83                    27.67                   139                   27.8
1954   31      87                    29                      148                   29.6
1955   27      96                    32                      159                   31.8
1956   38      99                    33                      163                   32.6
1957   34      105                   35                      167                   33.4
1958   33      102                   34
1959   35
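The moving totals and averages of Example 5 can be checked with a short Python sketch. The helper name `moving_average` is hypothetical; it assumes an odd period k, so each average is placed against the middle year of its window.

```python
def moving_average(values, k):
    """Return (totals, averages) for a k-period moving average, k odd."""
    # One total per window of k consecutive values.
    totals = [sum(values[i:i + k]) for i in range(len(values) - k + 1)]
    averages = [round(t / k, 2) for t in totals]
    return totals, averages


series = [20, 23, 26, 29, 23, 29, 31, 27, 38, 34, 33, 35]  # 1948 to 1959
totals3, averages3 = moving_average(series, 3)  # centred on 1949 ... 1958
totals5, averages5 = moving_average(series, 5)  # centred on 1950 ... 1957
print(totals3)
print(totals5)
```

A window of length k loses (k – 1)/2 years at each end, which is why the 5-year column is shorter than the 3-year column.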
Example 6. Find out 4-year moving average (centred) for the given data.
Year   Production (tons)      Year   Production (tons)
1948   50.0                   1953   38.1
1949   36.5                   1954   32.6
1950   43.0                   1955   38.7
1951   44.5                   1956   41.7
1952   38.9                   1957   41.1
Solution:
Year   Production   4-year moving total   4-year moving average   4-year moving average (centred)
1948   50.0
1949   36.5
                    174.0                 43.50
1950   43.0                                                       42.12
                    162.9                 40.73
1951   44.5                                                       40.93
                    164.5                 41.13
1952   38.9                                                       39.83
                    154.1                 38.53
1953   38.1                                                       37.81
                    148.3                 37.08
1954   32.6                                                       37.43
                    151.1                 37.78
1955   38.7                                                       38.16
                    154.1                 38.53
1956   41.7
1957   41.1
Each centred value is the mean of the two adjacent 4-year averages, for example (43.50 + 40.73)/2 = 42.12 and (40.73 + 41.13)/2 = 40.93.
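The centring step can be sketched in Python as follows. The function name `centred_moving_average` is hypothetical. The sketch averages two adjacent moving totals directly, so its results may differ from the tabulated ones by about 0.01, because the table rounds the 4-year averages before centring them.

```python
def centred_moving_average(values, k=4):
    """Centred k-period moving averages for an even period k."""
    totals = [sum(values[i:i + k]) for i in range(len(values) - k + 1)]
    # Mean of two adjacent k-period averages = (T_i + T_{i+1}) / (2k).
    return [(totals[i] + totals[i + 1]) / (2 * k) for i in range(len(totals) - 1)]


production = [50.0, 36.5, 43.0, 44.5, 38.9, 38.1, 32.6, 38.7, 41.7, 41.1]  # 1948 to 1957
centred = centred_moving_average(production)  # values for 1950 ... 1955
print([round(v, 2) for v in centred])
```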
Example 7. The following data shows the production of steel in a mill for the years 1956–1964.
Year                   1956  1957  1958  1959  1960  1961  1962  1963  1964
Production (000 tons)  60    65    80    73    97    105   93    111   117
(i) Fit the linear trend by the method of least squares by taking the origin at the middle. Also calculate the trend values.
(ii) Predict the production of steel for the year 1965.
Solution:
Year    y     x     xy     x²    Trend value $\hat{y} = 89 + 7.1x$
1956    60    –4    –240   16    60.6
1957    65    –3    –195   9     67.7
1958    80    –2    –160   4     74.8
1959    73    –1    –73    1     81.9
1960    97    0     0      0     89.0
1961    105   1     105    1     96.1
1962    93    2     186    4     103.2
1963    111   3     333    9     110.3
1964    117   4     468    16    117.4
Total   801   0     424    60
(i) The least squares trend line is $\hat{y} = a + bx$
Since $\sum x = 0$:
$a = \frac{\sum y}{n} = \frac{801}{9} = 89$
$b = \frac{\sum xy}{\sum x^2} = \frac{424}{60} = 7.07 \approx 7.1$
Hence $\hat{y} = 89 + 7.1x$
(ii) Prediction of the production of steel for the year 1965:
For x = 5: $\hat{y} = 89 + 7.1(5) = 124.5$
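A minimal Python sketch of the least squares fit in Example 7 follows. The function name `fit_linear_trend` is hypothetical; it solves the two normal equations for any coding of x, and reduces to a = Σy/n, b = Σxy/Σx² when Σx = 0. It keeps the unrounded slope 424/60 ≈ 7.07, so its forecast for 1965 (about 124.3) is slightly below the 124.5 obtained with b rounded to 7.1.

```python
def fit_linear_trend(x, y):
    """Solve the normal equations of y_hat = a + b*x and return (a, b)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


production = [60, 65, 80, 73, 97, 105, 93, 111, 117]  # 1956 to 1964
coded_x = list(range(-4, 5))                          # origin at the middle year, 1960
a, b = fit_linear_trend(coded_x, production)
forecast_1965 = a + b * 5                             # 1965 corresponds to x = 5
print(a, round(b, 4), round(forecast_1965, 2))
```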
Example 8. Fit a linear trend to the following data (take origin at the middle and half year unit).
Year   1991  1992  1993  1994  1995  1996
Value  5     8     12    15    20    24
Also show that sum of residuals is equal to zero.
Solution:
Year    y     x = (t – 1993.5)/0.5   xy     x²    $\hat{y} = 14 + 1.91x$   $e = y - \hat{y}$
1991    5     –5                     –25    25    4.45                     0.55
1992    8     –3                     –24    9     8.27                     –0.27
1993    12    –1                     –12    1     12.09                    –0.09
1994    15    1                      15     1     15.91                    –0.91
1995    20    3                      60     9     19.73                    0.27
1996    24    5                      120    25    23.55                    0.45
Total   84    0                      134    70                             0
The least squares trend line is $\hat{y} = a + bx$
Since $\sum x = 0$, the normal equations reduce to:
$a = \frac{\sum y}{n} = \frac{84}{6} = 14$
$b = \frac{\sum xy}{\sum x^2} = \frac{134}{70} = 1.91$
Hence $\hat{y} = 14 + 1.91x$
Since $\sum e = 0$, the sum of residuals is zero.
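The coding of x with the origin at the middle, and the zero sum of residuals in Example 8, can be illustrated with the sketch below. The helper name `coded_x` is hypothetical; it assumes unit steps for an odd number of years and half-year units (odd integers) for an even number. With the unrounded slope the residuals sum to zero up to floating-point error.

```python
def coded_x(n):
    """Coded x with origin at the middle of n equally spaced years."""
    if n % 2 == 1:
        return [i - (n - 1) // 2 for i in range(n)]   # ..., -1, 0, 1, ...
    return [2 * i - (n - 1) for i in range(n)]        # ..., -3, -1, 1, 3, ...


values = [5, 8, 12, 15, 20, 24]  # 1991 to 1996
x = coded_x(len(values))
# Because sum(x) == 0, the normal equations give a and b directly.
a = sum(values) / len(values)
b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, values)]
print(x, a, round(b, 2), round(abs(sum(residuals)), 6))
```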
INDEX NUMBERS
Q.1. What is an index number?
Ans. An index number is a device which measures the changes in a variable or group of related variables with respect to time or space.
Q.2. What is a simple index number?
Ans. An index number is called simple if it measures a relative change in a single variable with respect to a base.
Q.3. Give some examples of simple index numbers.
Ans. Index number for wages of employees, index number of cotton prices in Sahiwal, etc.
Q.4. What is a composite index number?
Ans. An index number is called composite if it measures a relative change in a group of related variables with respect to a base.
Q.5. What are the types of index number with regard to base?
Ans. (i) Fixed base index
(ii) Chain base index.
Q.6. Define price relative.
Ans. Price relative is the percentage ratio of the price in the current year to the price in the base year.
Q.7. Define link relative.
Ans. Link relative is the percentage ratio of the price in the current year to the price in the preceding year.
Q.8. What is a price index number?
Ans. A price index number measures the changes in the wholesale or retail prices of a particular commodity or a number of commodities with respect to a base.
Q.9. What is a quantity index number?
Ans. A quantity index number measures the changes in the quantity or volume of goods produced or consumed.
Q.10. Define C.P.I.
Ans. A consumer price index number measures the changes in prices of a specified basket of goods and services consumed in the given period relative to the base period.
Q.12. What do you mean by "basket" of goods?
Ans. The basket of goods and services will contain items like
(i) Food (ii) House rent (iii) Education (iv) Clothing (v) Misc.
Q.13. Write down the formula of C.P.I.
Ans. (i) $P_{on} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$ (Aggregate Expenditure Method)
(ii) $P_{on} = \frac{\sum WI}{\sum W}$ (Weighted Average of Relatives)
Q.14. Write the formula of price relative.
Ans. $I = \frac{p_n}{p_0} \times 100$
Q.15. What are the other names of cost of living index numbers?
Ans. Consumer price index number or retail price index number.
Q.16. What is a wholesale price index?
Ans. An index number based on the price quotations of wholesale markets is called a wholesale price index.
Q.17. What is an un-weighted index number?
Ans. An index number that measures the change in the price (or quantity) of a group of commodities when the relative importance of commodities is not taken into account is called an un-weighted index number.
Q.18. What is a weighted index number?
Ans. An index number that measures the change in the prices (or quantities) of a group of commodities when the relative importance of commodities has been taken into account is called a weighted index number.
Q.19. Name the ideal index number.
Ans. Fisher's index number is called the ideal index number.
Q.20. What is base year weighted index number?
Ans. Laspeyres' index number is called the base-year weighted index number.
Q.21. What is the other name of Paasche's index?
Ans. Paasche's index number is also called the current year weighted index number.
Q.22. Give two uses and two limitations of index numbers.
Ans. Uses of Index Numbers:
(i) Index numbers are of great help in forecasting business conditions.
(ii) Index numbers are useful in education for I.Q. comparison and for assessing the effectiveness of teaching systems.
Limitations of Index Numbers:
(i) All index numbers are not suitable for all purposes.
(ii) Different methods of construction yield different results.
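Important Points and Formulae
Price Relatives: $P.R = \frac{p_n}{p_0} \times 100$
Link Relatives: $L.R = \frac{p_n}{p_{n-1}} \times 100$
Simple Aggregative Index: $P_{on} = \frac{\sum p_n}{\sum p_0} \times 100$
Laspeyres' (Base year weighted) Index: $P_{on} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$
Paasche's (Current year weighted) Index: $P_{on} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$
Fisher's Ideal Index: $P_{on} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} \times 100$
Consumer Price Index / Cost of Living Index
(i) Aggregative Expenditure Method: $P_{on} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$
(ii) Weighted Average of Relatives: $P_{on} = \frac{\sum WI}{\sum W}$, $I = \frac{p_n}{p_0} \times 100$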
Q.1. For the following data construct index numbers by
(i) fixed base and
(ii) chain base method, taking 1960 as base:
Year:   1960  1961  1962  1963  1964  1965  1966  1967  1968  1969  1970
Price:  40    47    52    50    54    30    39    45    50    60    55
Solution:
Year   Price   (i) Fixed base $P.R = \frac{p_n}{p_0} \times 100$   (ii) Chain base $L.R = \frac{p_n}{p_{n-1}} \times 100$   Chain indices
1960   40      100                                                100                                                      100
1961   47      117.5                                              117.5                                                    117.5
1962   52      130                                                110.64                                                   130
1963   50      125                                                96.15                                                    125
1964   54      135                                                108                                                      135
1965   30      75                                                 55.56                                                    75
1966   39      97.5                                               130                                                      97.5
1967   45      112.5                                              115.38                                                   112.5
1968   50      125                                                111.11                                                   125
1969   60      150                                                120                                                      150
1970   55      137.5                                              91.67                                                    137.5
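The fixed base and chain base calculations of this question can be sketched in Python as below. The variable names are illustrative only. Each chain index is the previous chain index multiplied by the current link relative and divided by 100, so for a single price series the chain indices coincide with the fixed base indices.

```python
prices = [40, 47, 52, 50, 54, 30, 39, 45, 50, 60, 55]  # 1960 to 1970

# (i) Fixed base: every price is compared with the base year (1960) price.
fixed = [p / prices[0] * 100 for p in prices]

# (ii) Chain base: each price is first compared with the preceding year's price.
links = [100.0] + [prices[i] / prices[i - 1] * 100 for i in range(1, len(prices))]

# Chain the link relatives back to the 1960 base.
chain = [100.0]
for link in links[1:]:
    chain.append(chain[-1] * link / 100)

print([round(v, 2) for v in fixed])
print([round(v, 2) for v in links])
print([round(v, 2) for v in chain])
```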
Q.2. Construct chain indices for the prices of sugar (per kg) for the years 1962–1970.
Year   Price (Rs)      Year   Price (Rs)
1962   0.80            1967   1.42
1963   1.00            1968   1.50
1964   1.20            1969   1.62
1965   1.25            1970   1.75
1966   1.25
Solution:
Year   Price (Rs)   Link Relatives   Chain Indices
1962   0.80         -                -
1963   1.00         125              125
1964   1.20         120              150
1965   1.25         104.17           156.26
1966   1.25         100              156.26
1967   1.42         113.60           177.51
1968   1.50         105.63           187.50
1969   1.62         108              202.5
1970   1.75         108.02           218.74
Q.3. Construct with the help of the following data:
(i) Laspeyres' (ii) Paasche's Index
Item   Base Year Price   Base Year Quantity   Current Year Price   Current Year Quantity
A      3                 71                   3                    26
B      2                 107                  3                    88
C      2                 62                   2                    70
Solution:
Item    p0   q0    p1   q1   p0q0   p1q0   p1q1   p0q1
A       3    71    3    26   213    213    78     78
B       2    107   3    88   214    321    264    176
C       2    62    2    70   124    124    140    140
Total                        551    658    482    394
(i) Laspeyres' Index:
$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{658}{551} \times 100 = 119.42$
(ii) Paasche's Index:
$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{482}{394} \times 100 = 122.34$
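As a quick check of this question, the two weighted aggregative indices can be written as small Python functions. The names `laspeyres` and `paasche` are chosen here for illustration; the only difference between them is whether base year or current year quantities serve as weights.

```python
def laspeyres(p0, q0, p1):
    """Base year weighted price index: sum(p1*q0) / sum(p0*q0) * 100."""
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0)) * 100


def paasche(p0, p1, q1):
    """Current year weighted price index: sum(p1*q1) / sum(p0*q1) * 100."""
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1)) * 100


# Items A, B, C
p0, q0 = [3, 2, 2], [71, 107, 62]   # base year prices and quantities
p1, q1 = [3, 3, 2], [26, 88, 70]    # current year prices and quantities

L = laspeyres(p0, q0, p1)
P = paasche(p0, p1, q1)
print(round(L, 2), round(P, 2))  # 119.42 122.34
```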
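Q.4. Construct index number for the year 1992 on the basis of the year 1987 from the following data by using:
(i) Laspeyres' (ii) Paasche's (iii) Fisher's Ideal Formula
Year   A: Price   A: Quantity   B: Price   B: Quantity   C: Price   C: Quantity
1987   5          10            8          26            6          13
1992   4          12            7          27            5          14
Solution:
Item    p0 (1987)   q0 (1987)   p1 (1992)   q1 (1992)   p1q0   p0q0   p1q1   p0q1
A       5           10          4           12          40     50     48     60
B       8           26          7           27          182    208    189    216
C       6           13          5           14          65     78     70     84
Total                                                   287    336    307    360
(i) Laspeyres' Index:
$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{287}{336} \times 100 = 85.42$
(ii) Paasche's Index:
$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{307}{360} \times 100 = 85.28$
(iii) Fisher's Ideal Index:
$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{287}{336} \times \frac{307}{360}} \times 100 = 0.85347 \times 100 = 85.35$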
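Fisher's ideal index is the geometric mean of the Laspeyres and Paasche indices, which the following Python sketch illustrates with the data of this question. The helper `weighted_index` is a hypothetical name for the shared weighted aggregative ratio.

```python
import math


def weighted_index(p0, p1, weights):
    """Weighted aggregative price index: sum(p1*w) / sum(p0*w) * 100."""
    return sum(p * w for p, w in zip(p1, weights)) / sum(p * w for p, w in zip(p0, weights)) * 100


# Items A, B, C
p0, q0 = [5, 8, 6], [10, 26, 13]   # 1987 prices and quantities
p1, q1 = [4, 7, 5], [12, 27, 14]   # 1992 prices and quantities

laspeyres = weighted_index(p0, p1, q0)      # base year quantities as weights
paasche = weighted_index(p0, p1, q1)        # current year quantities as weights
fisher = math.sqrt(laspeyres * paasche)     # geometric mean of the two

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))  # 85.42 85.28 85.35
```

Because it is a geometric mean, Fisher's index always lies between the other two.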
Q.5. Find index numbers
(i) taking the year 1980 as base
(ii) taking the average of the first three years as base
(iii) taking the average of all the years as base
Year   Price in Rs
1980   22.5
1981   24.0
1982   28.5
1983   30.0
1984   35.0
1985   32.5
1986   37.5
1987   46.5
1988   48.5
Solution:
Year    Price   (i) $P.R = \frac{p_n}{22.5} \times 100$   (ii) $P.R = \frac{p_n}{25} \times 100$   (iii) $P.R = \frac{p_n}{33.89} \times 100$
1980    22.5    100                                       90                                       66.39
1981    24.0    106.67                                    96                                       70.82
1982    28.5    126.67                                    114                                      84.10
1983    30.0    133.33                                    120                                      88.52
1984    35.0    155.56                                    140                                      103.28
1985    32.5    144.44                                    130                                      95.90
1986    37.5    166.67                                    150                                      110.65
1987    46.5    206.67                                    186                                      137.21
1988    48.5    215.56                                    194                                      143.11
Total   305
Average of first three years = $\frac{22.5 + 24.0 + 28.5}{3} = \frac{75}{3} = 25$
Average of all the years = $\frac{305}{9} = 33.89$
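The three choices of base in this question differ only in the divisor, as the Python sketch below shows. The function name `index_numbers` is hypothetical. The sketch uses the unrounded mean 305/9 for part (iii), so the 1986 value comes out as 110.66 rather than the 110.65 obtained with the base rounded to 33.89.

```python
def index_numbers(prices, base):
    """Price relatives (in percent) of every price against a common base value."""
    return [p / base * 100 for p in prices]


prices = [22.5, 24.0, 28.5, 30.0, 35.0, 32.5, 37.5, 46.5, 48.5]  # 1980 to 1988

base_1980 = prices[0]                      # (i) single year as base
base_first3 = sum(prices[:3]) / 3          # (ii) average of 1980 to 1982
base_all = sum(prices) / len(prices)       # (iii) average of all nine years

idx_i = index_numbers(prices, base_1980)
idx_ii = index_numbers(prices, base_first3)
idx_iii = index_numbers(prices, base_all)

print([round(v, 2) for v in idx_ii])
```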
Q.6. Find chain indices from the following price relatives
Year   Price Relatives: A   B     C
1970   100                  100   100
1971   103                  95    110
1972   112                  90    115
1973   115                  100   120
1974   120                  96    125
1975   125                  102   128
Solution:
Link Relatives: $L.R = \frac{p_n}{p_{n-1}} \times 100$
Year   L.R (A)   L.R (B)   L.R (C)   Mean     Chain Indices
1970   100       100       100       100      100
1971   103       95        110       102.67   102.67
1972   108.74    94.74     104.55    102.68   105.42
1973   102.68    111.11    104.35    106.05   111.80
1974   104.35    96        104.17    101.51   113.49
1975   104.17    106.25    102.4     104.27   118.34
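The procedure of this question (link relatives per commodity, their arithmetic mean per year, then chaining) can be sketched in Python as follows. The names used are illustrative. The sketch keeps full precision at each step, so its chain indices may differ from the tabulated ones in the second decimal place, since the table rounds every intermediate column.

```python
relatives = {
    "A": [100, 103, 112, 115, 120, 125],
    "B": [100, 95, 90, 100, 96, 102],
    "C": [100, 110, 115, 120, 125, 128],
}  # price relatives for 1970 to 1975


def link_relatives(series):
    """Each value as a percentage of the preceding one; the first year is 100."""
    return [100.0] + [series[i] / series[i - 1] * 100 for i in range(1, len(series))]


links = [link_relatives(s) for s in relatives.values()]
# Arithmetic mean of the link relatives of A, B and C for each year.
means = [sum(col) / len(col) for col in zip(*links)]

# Chain the yearly means back to the 1970 base.
chain = [means[0]]
for m in means[1:]:
    chain.append(chain[-1] * m / 100)

print([round(v, 2) for v in means])
print([round(v, 2) for v in chain])
```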