Statistics and Probability Theory MTH-262

BS(CS)-V

Course Instructor: Sajdah Hassan

Lecture 4 Descriptive Statistics

Descriptive Statistics

Presenting Data

Describing Data

Presentation of Data

1. Classification

2. Tabulation

3. Frequency Distribution

4. Stem and Leaf Display

5. Graphical Presentation

Classification

Classification is the process of dividing a set of observations or objects into classes or groups in such a way that

1. Observations or objects in the same class or group are similar

2. Observations or objects in one class or group are dissimilar to observations or objects in other classes or groups

Types of Classification

When the data are sorted according to one criterion only, it is called a “simple classification” or a “one-way classification”.

Classification is called two-way classification when the data are sorted according to two criteria.

Similarly, manifold classification is made according to several criteria.

Data may be classified according to qualitative, temporal and geographical characteristics.

Example:

2. Tabulation

The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements whereas columns are vertical arrangements. A table may be simple, double or complex depending upon the type of classification.

Main Parts of Tables

A statistical table has at least four major parts and some other minor parts.

(1) The Title

(2) The Box Head (column captions)

(3) The Stub (row captions)

(4) The Body

(5) Prefatory Notes

(6) Foot Notes

(7) Source Notes


(1) The Title: A title is the main heading, written in capitals, shown at the top of the table. It must explain the contents of the table and throw light on the table as a whole. Different parts of the heading can be separated by commas; no full stop is used in the title.

(2) The Box Head (column captions): The vertical headings and subheadings of the columns are called column captions. The space where these column headings are written is called the box head. Only the first letter of the box head is capitalized, and the remaining words are written in small letters.

(3) The Stub (row captions): The horizontal headings and subheadings of the rows are called row captions, and the space where these row headings are written is called the stub.

(4) The Body: It is the main part of the table, which contains the numerical information classified with respect to row and column captions.

(5) Prefatory Notes: A statement given below the title and enclosed in brackets, usually describing the units of measurement, is called a prefatory note.

(6) Foot Notes: These appear immediately below the body of the table and provide further explanation.

(7) Source Notes: The source note is given at the end of the table and indicates the source from which the information has been taken. It includes information about the compiling agency, publication, etc.


General Rules of Tabulation:

A table should be simple and attractive. There should be no need of further explanations (details).

Proper and clear headings for columns and rows should be used.

Suitable approximation may be adopted and figures may be rounded off.

The unit of measurement should be well defined.

If the observations are large in number they can be broken into two or three tables.

Thick lines should be used to separate the data under big classes and thin lines to separate the sub classes of data.

Table format:

----THE TITLE----

----Prefatory Notes----

----Box Head----

----Row Captions---- ----Column Captions----

----Stub Entries---- ----The Body----

Example

Frequency Distribution


One method for simplifying and organizing data is to construct a frequency distribution.

A frequency distribution is an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement. A frequency distribution presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.

In other words, a frequency distribution is a tabular summary of data into classes or groups, together with the number of observations in each class or group.

Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average, Poor, Average, Above Average

Above Average, Below Average, Above Average, Poor

Above Average, Average, Below Average, Above Average

Average, Above Average, Excellent, Above Average

Above Average, Average, Above Average, Average

Frequency Distribution

Rating           Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1
Total            20
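The tally behind this table can be reproduced with a short script. The following is a minimal sketch in Python; the variable names (`ratings`, `scale`, `frequency_table`) are illustrative choices, not part of the lecture.

```python
from collections import Counter

# The 20 ratings from the Marada Inn sample, as listed above.
ratings = [
    "Below Average", "Poor", "Average", "Above Average",
    "Above Average", "Below Average", "Above Average", "Poor",
    "Above Average", "Average", "Below Average", "Above Average",
    "Average", "Above Average", "Excellent", "Above Average",
    "Above Average", "Average", "Above Average", "Average",
]

# Order of the categories on the rating scale (worst to best).
scale = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many times each rating occurs.
counts = Counter(ratings)
frequency_table = [(category, counts[category]) for category in scale]

for category, frequency in frequency_table:
    print(f"{category:<15}{frequency:>3}")
print(f"{'Total':<15}{sum(counts.values()):>3}")
```

Listing the categories in scale order, rather than in order of first appearance, keeps the ordinal structure of the ratings visible in the output.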

Grouped frequency distributions can be used when the range of values in the data set is very large. The data must be grouped into classes that are more than one unit in width.

Example: Blood samples were taken from 36 male volunteers as part of a study to determine the natural variation in CK concentration. The serum CK concentrations (measured in U/L) are as follows:

121 82 100 151 68 58

95 145 64 201 101 163

84 57 139 60 78 94

119 104 110 113 118 203

62 83 67 93 92 110

25 123 70 48 95 42

Grouped Frequency distribution

Serum CK (U/L)
Class limits    Frequency    Cumulative Frequency
20-39           1            1
40-59           4            5
60-79           7            12
80-99           8            20
100-119         8            28
120-139         3            31
140-159         2            33
160-179         1            34
180-199         0            34
200-219         2            36
Total           36
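The frequencies and cumulative frequencies for the serum CK data can be checked with a small script. This is a sketch under the assumption that the data are whole numbers, so that a class of width 20 starting at 20 covers 20 to 39; the function name `grouped_frequencies` is hypothetical.

```python
from itertools import accumulate

# The 36 serum CK values listed above.
ck_values = [
    121, 82, 100, 151, 68, 58,
    95, 145, 64, 201, 101, 163,
    84, 57, 139, 60, 78, 94,
    119, 104, 110, 113, 118, 203,
    62, 83, 67, 93, 92, 110,
    25, 123, 70, 48, 95, 42,
]

def grouped_frequencies(values, first_lower, width, n_classes):
    """Return (lower, upper, frequency, cumulative) for each class.

    Assumes integer data, so the upper limit is lower + width - 1.
    """
    lowers = [first_lower + i * width for i in range(n_classes)]
    uppers = [lower + width - 1 for lower in lowers]
    freqs = [sum(lo <= v <= up for v in values) for lo, up in zip(lowers, uppers)]
    cumulative = list(accumulate(freqs))  # running total of the frequencies
    return list(zip(lowers, uppers, freqs, cumulative))

ck_table = grouped_frequencies(ck_values, first_lower=20, width=20, n_classes=10)
for lower, upper, freq, cum in ck_table:
    print(f"{lower:>3}-{upper:<3} {freq:>3} {cum:>3}")
```

The last cumulative frequency equals the number of observations (36), which is a quick check that no value was missed or counted twice.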

Terms Associated with a Grouped Frequency Distribution

Class limits represent the smallest and largest data values that can be included in a class.

In the serum CK example, the values 20 and 39 of the first class are the class limits.

The lower class limit is 20 and the upper class limit is 39.

The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution. For the first class of the serum CK example, the boundaries are 19.5 and 39.5.

The class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. In the serum CK example, the class width is 40 - 20 = 20.

Guidelines for Constructing a Frequency Distribution

There should be between 5 and 20 classes.

The classes must be mutually exclusive.

The classes must be equal in width.

Procedure for Constructing a Grouped Frequency Distribution

Find the highest and lowest value.

Find the range

Select the number of classes desired.

Formula for number of classes is

Where

Find the width by dividing the range by the number of classes and rounding up


Select a starting point (usually the lowest value); add the width to get the lower limits.

Find the upper class limits.

Find the boundaries.

Tally the data, find the frequencies and find the cumulative frequency
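The first steps of this procedure can be sketched in code. One widely used rule of thumb for the number of classes is Sturges' rule, k = 1 + 3.3 log10(n), where n is the number of observations; whether this is the formula the lecture intends is an assumption. The function names below are hypothetical, and the width is rounded up to a whole number on the assumption that the data are integers.

```python
import math

def sturges_classes(n):
    """Sturges' rule (an assumed choice): k = 1 + 3.3 * log10(n), rounded up."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_width(values, n_classes):
    """Range (highest minus lowest) divided by the number of classes, rounded up."""
    value_range = max(values) - min(values)
    return math.ceil(value_range / n_classes)

# Suggested number of classes for 20 and for 36 observations.
print(sturges_classes(20), sturges_classes(36))

# Width for data whose lowest value is 5 and highest value is 22, using 6 classes.
print(class_width([5, 22], 6))
```

The rule only suggests a starting point; the guideline of using between 5 and 20 classes still applies, and the analyst may choose a different number for convenience.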

Example: Grouped Frequency distribution

In a survey of 20 patients who smoked, the following data were obtained. Each value represents the number of cigarettes the patient smoked per day. Construct a frequency distribution using six classes. (The data is given on the next slide.)


Step 1: Find the highest and lowest values: H = 22 and L = 5.

Step 2: Find the range: R = H – L = 22 – 5 = 17.

Step 3: Select the number of classes desired. In this case it is equal to 6.

Step 4: Find the class width by dividing the range by the number of classes. Width = 17/6 = 2.83. This value is rounded up to 3.

Step 5: Select a starting point for the lowest class limit. For convenience, this value is chosen to be 5, the smallest data value. The lower class limits will be 5, 8, 11, 14, 17 and 20.


Step 6: The upper class limits will be 7, 10, 13, 16, 19 and 22. For example, the upper limit for the first class is computed as 8 - 1, etc.

Step 7: Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to the upper class limit.

Step 8: Tally the data, write the numerical values for the tallies in the frequency column and find the cumulative frequencies.

The grouped frequency distribution is shown on the next slide.

Example: Grouped Frequency distribution

Class Limits    Class Boundaries    Frequency    Cumulative Frequency
05 to 07        4.5 - 7.5           2            2
08 to 10        7.5 - 10.5          3            5
11 to 13        10.5 - 13.5         6            11
14 to 16        13.5 - 16.5         5            16
17 to 19        16.5 - 19.5         3            19
20 to 22        19.5 - 22.5         1            20
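Steps 5 to 8 of the example can be traced in code. This is a minimal sketch with hypothetical function names; because the raw survey data are not listed here, the frequencies are copied from the table rather than tallied.

```python
def class_limits(lowest, width, n_classes):
    """Lower and upper class limits for integer-valued data."""
    lowers = [lowest + i * width for i in range(n_classes)]
    uppers = [lower + width - 1 for lower in lowers]
    return lowers, uppers

def class_boundaries(lowers, uppers):
    """Subtract 0.5 from each lower limit and add 0.5 to each upper limit."""
    return [(lo - 0.5, up + 0.5) for lo, up in zip(lowers, uppers)]

# Step 5 and Step 6: starting point 5, width 3, six classes.
lowers, uppers = class_limits(lowest=5, width=3, n_classes=6)

# Step 7: class boundaries.
boundaries = class_boundaries(lowers, uppers)

# Step 8: frequencies copied from the table (the raw data are not shown here).
frequencies = [2, 3, 6, 5, 3, 1]
cumulative = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative.append(running_total)

for lo, up, (b_lo, b_up), f, c in zip(lowers, uppers, boundaries, frequencies, cumulative):
    print(f"{lo:02d} to {up:02d}  {b_lo} - {b_up}  {f}  {c}")
```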

Mid points or class marks

We can also find the midpoint (class mark) of each class by averaging the lower and upper class limits, or the lower and upper class boundaries, of that class. For example, the midpoint of the first class is (5 + 7)/2 = 6.

Class limits    Midpoints or class marks
5-7             6
8-10            9
11-13           12
14-16           15
17-19           18
20-22           21
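As a quick check, the class marks can be computed from either the limits or the boundaries; both give the same result. The names `limits`, `midpoints` and `from_boundaries` in this sketch are illustrative.

```python
# Class limits from the example above.
limits = [(5, 7), (8, 10), (11, 13), (14, 16), (17, 19), (20, 22)]

# Midpoint from the class limits.
midpoints = [(lower + upper) / 2 for lower, upper in limits]

# Midpoint from the class boundaries (limits shifted by 0.5 on each side).
from_boundaries = [((lower - 0.5) + (upper + 0.5)) / 2 for lower, upper in limits]

print(midpoints)  # [6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
```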

Question

Make a frequency distribution of the following data, which give the weights, recorded to the nearest gram, of 60 apples picked at random from a consignment.

106,107,76,82,109,107,115,93,187,95,123,125,111,92,86,70,126,68,130,129,139,119,115,128,100,186,84,99,113,204,111,141,136,123,90,115,98,110,78,185,162,178,140,152,173,146,158,194,148,90,107,181,131,75,184,104,110,80,118,82

Stem and Leaf Display

A clear disadvantage of using a frequency table is that the identity of individual observations is lost in the grouping process. To overcome this drawback, a stem and leaf display is used. It offers a quick way of sorting and displaying data, in which each number in the data is divided into two parts.

I. STEM: The stem is the leading digit(s) of each number and is used in sorting.

II. LEAF: The leaf is the rest of the number, that is, the trailing digit(s), and is shown in the display.

A vertical line separates the leaf (or leaves) from the stem. For example, the number 243 can be split in two ways:

Leading digit | Trailing digits    or    Leading digits | Trailing digit
            2 | 43                                   24 | 3
         Stem | Leaf                               Stem | Leaf

The resulting display provides an organized picture of the entire distribution. The number of leaves beside each stem corresponds to the frequency, and the individual leaves identify the individual scores.

Example: stem and leaf display
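A stem and leaf display can be built with a few lines of code. The sketch below assumes non-negative whole numbers and uses twelve of the serum CK values from earlier in the lecture as a sample; the function name `stem_and_leaf` and the parameter `leaf_unit` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=10):
    """Map each stem (leading digits) to its sorted leaves (trailing digits).

    With leaf_unit=10 the leaf is the last digit; with leaf_unit=100 it is
    the last two digits. Assumes non-negative integers.
    """
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, leaf_unit)
        display[stem].append(leaf)
    return dict(display)

# Twelve of the serum CK values used earlier.
sample = [121, 82, 100, 151, 68, 58, 84, 57, 139, 60, 78, 94]
display = stem_and_leaf(sample)

# Print every stem between the smallest and largest, including empty ones.
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Stems with no leaves are still printed so that gaps in the data remain visible, just as an empty class remains in a frequency table.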

Graphical Representation of data

The visual display of statistical data in the form of points, lines, areas and other geometrical forms and symbols is, in the most general terms, known as graphical representation.

With this method, statistical data can be studied without going through the figures presented in the form of tables.

Graphs for Nominal or Ordinal Data

1. Bar chart

A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.

For non-numerical values (scores), a bar chart is used.

Spaces between adjacent bars indicate discrete categories without order (nominal) or of unmeasurable width (ordinal).

Multiple bar charts, component bar charts and sub-divided rectangles are also used for graphical representation of categorical data.
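To illustrate the idea without a plotting library, the Marada Inn frequencies from earlier in the lecture can be drawn as a simple text bar chart. This is only a sketch; the function name `text_bar_chart` and the `#` symbol are arbitrary choices.

```python
# Frequencies from the Marada Inn frequency distribution.
rating_counts = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

def text_bar_chart(counts, symbol="#"):
    """Return one line per category: label, a bar of symbols, and the count."""
    label_width = max(len(label) for label in counts)
    return [
        f"{label:<{label_width}} | {symbol * count} {count}"
        for label, count in counts.items()
    ]

chart_lines = text_bar_chart(rating_counts)
print("\n".join(chart_lines))
```

Each bar sits on its own line, separated from the next, which mirrors the spaces between adjacent bars in a drawn bar chart.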

Example: Bar Graph

Example

Data on the total number of persons on a ship can be shown in a table according to class.

Example: Bar Chart

2.Pie Charts

When we are interested in parts of the whole, a pie chart might be our display of choice.

Pie charts show the whole group of cases as a circle. They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.

Pie chart for ship data

How to Accurately Create Pie Charts

Convert Your Data: Convert all data values to percentages of the whole data set.

For example, four radishes, three cucumbers, two carrots and one pepper equals 40 percent radishes, 30 percent cucumbers, 20 percent carrots and 10 percent peppers.

Convert the percentages into angles. Since a full circle is 360 degrees, multiply this by the percentages to get the angle for each section of the pie. For the radishes, 0.4 × 360 = 144 degrees. For the cucumbers, 0.3 × 360 = 108 degrees. For the carrots, 0.2 × 360 = 72 degrees. For the peppers, 0.1 × 360 = 36 degrees.

Make sure the angle calculations are correct by adding all the angles. The total should be 360. 144 + 108 + 72 + 36 = 360. You may be off by a tenth or so due to rounding, so be careful.
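The conversion from counts to percentages and angles can be written as a short function. This is a sketch using the vegetable example above; the function name `pie_angles` is hypothetical.

```python
def pie_angles(counts):
    """Return {category: (percentage, angle in degrees)} for a pie chart."""
    total = sum(counts.values())
    return {
        category: (100 * count / total, 360 * count / total)
        for category, count in counts.items()
    }

vegetables = {"radishes": 4, "cucumbers": 3, "carrots": 2, "peppers": 1}
slices = pie_angles(vegetables)

for category, (percent, angle) in slices.items():
    print(f"{category:<10} {percent:5.1f}%  {angle:6.1f} degrees")

# The angles of all slices should add up to a full circle.
print(sum(angle for _, angle in slices.values()))
```

Computing each angle directly from the count, rather than from a rounded percentage, avoids accumulating rounding error across slices.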

Draw the Chart:

Draw a circle on a blank sheet of paper, using a compass. While a compass is not necessary, using one will make the chart much neater and clearer by ensuring the circle is even.

Draw a radius, from the center to the right edge of the circle, using the ruler or straight edge. This will be the first base line.

Measure the largest angle in the data with the protractor, starting at the baseline, and mark it on the edge of the circle. Use the ruler to draw another radius to that point. Use this new radius as a base line for your next largest angle and continue this process until you get to the last data point. You will only need to measure the last angle to verify its value since both lines will already be drawn.

Label and shade the sections of the pie chart to highlight whatever data is important for your use.

Graphical presentation of Quantitative Data

Histogram

Another common graphical presentation of quantitative data is a histogram.

In a histogram, the variable of interest is placed on the horizontal axis.

A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency.

Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Example: Histogram

An example of a frequency distribution histogram. The same set of quiz scores is presented in a frequency distribution table and in a histogram.


Frequency Polygons

In a frequency polygon, a dot is centered above each score (use the midpoints if class intervals are given) so that the height of the dot corresponds to the frequency. The dots are then connected by straight lines. An additional line is drawn at each end to bring the graph back to zero frequency.
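The points of a frequency polygon can be listed before drawing. The sketch below uses the midpoints and frequencies of the cigarette-smoking example and assumes equal class widths, so the closing points lie one class width beyond the first and last midpoints; the function name `polygon_points` is hypothetical.

```python
def polygon_points(midpoints, frequencies):
    """Return (x, y) points, closed with a zero-frequency point at each end."""
    width = midpoints[1] - midpoints[0]  # assumes equal class widths
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

# Midpoints and frequencies from the grouped frequency distribution example.
points = polygon_points([6, 9, 12, 15, 18, 21], [2, 3, 6, 5, 3, 1])
for x, y in points:
    print(x, y)
```

Joining these points in order with straight lines gives the polygon, which starts and ends on the horizontal axis.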

Example: Frequency Polygon

