Quantative Techniqes

1

Quantitative Techniques in Business (QTB)

Quantitative Techniques are the techniques used to gather, sort, analyze and interpret numerical data in order to improve business decisions.

Numerical data Numerical data (or quantitative data) is data measured or identified on a

numerical scale. Numerical data can be analysed using statistical methods, and results can be displayed using tables, charts, histograms and graphs

Examples

Company sales (millions) 18, 12, 20, etc

Number of employees in company(hundreds) 15, 8, 5, etc

Prof.Muhammad Ilyas ,std.Muhammad Saeed 2

Why Study QTB

Studying QTB is essential as it enables to

Gather, sort, analyze and interpret the data

Have needed, timely, accurate, yet relevant information.

Understand and compare different types of situations

Predict and forecast about the future needs of the business

Develop effective policies and business related strategies

Make effective decisions to achieve business goals efficiently

Research is based on QTB

Final thesis is based on QTB


Research Problem

“Any problem or opportunity that needs to be addressed through research process of data collection and analysis is called

Research Problem”

Examples

Human Resource manager wants to develop HR policies regarding

employees turnover in order to reduce it.

Marketing manager wants to launch a new product successfully using advertisement as promotional tool

Finance manager needs to invest excessive money profitably


Problem Statement A problem statement is a clear and concise description

of any business issue that seeks for Description, Association or difference of two or more variables.

Example Measure the annual turnover of employees in Higher educational

sector of Pakistan

Does advertisement contribute to the sales of a new product in the market

Which of the two options i.e. stock market or real estate is better for investment.


What is Variable? Vary + able = Change + able

Variable is a characteristic of anything that can vary (Change).

Examples

Gender (Male, Female)

Age (20 years, 30 years, 50 years)

Motivation level (High, Medium, Low)

Constant is a characteristic that do not vary e.g. If all students are male in a class then Gender will be constant


Advertisement Sales

Types of Variable with respect to relation

Awareness

Competitors product, price, packaging, placement

Budget


Variable

Categorical

Nominal Ordinal

Numerical

Discrete Continuous

Gender 1. Male

2. Female

Motivation 1. Highly Motivated

2. Moderately

Motivated

3. Less Motivated

1. No of students

2. No of chairs

3. Collar size

1. Height

2. Weight

3. speed

Types of Variable with respect to data

Prof.Muhammad Ilyas ,std.Muhammad

Saeed 8

Categorical Variable A variable whose values are not numerical in nature

Variables Values

Gender Male, female

Religion Islam, christianity, Jews, etc

Motivation level High, medium, low

Types of Categorical variable

1. Nominal variable

A categorical variable whose values are not ordered

Example

Gender Male, Female

2. Ordinal variable

A categorical variable whose values are in ordered

Example

Education Metric, inter, graduation


Numerical Variable A variable whose values are numerical in nature

Variables Values

Collar size 14, 14.5, 15, 15.5……….

Height 5.7, 5.8, 5.3

No of employees 23, 45, 69, 100

Types of Numerical variable

1. Discrete variable

A numerical variable whose values have same interval

Example

collar size 14.5, 15, 15.5……..

2. continuous variable

A numerical variable whose values don’t have same interval

Example

speed 40.1, 45, 67……….


Research Question Research problem needs to be translated into one or

more research questions that are defined as

“A research question is an interrogative statement that seeks for the tentative relationship among variables and clarifies what the researcher wants to answer.”

Example What is the impact of advertisement on sales of a new product in

the market

What is the annual turnover of employees in Higher educational institutions of Pakistan

Does investing in stock market yield more return on investment as compare to investment in real estate.


Type of Research Question Descriptive: A question that is answered through Summarising

data about a single variable E.g.: What is the annual turnover of employees in Higher

educational institutions of Pakistan

Associational: A question that is answered through determining strength and direction of relationship between two or more variables E.g.: What is the impact of advertisement on sales of a new product in

the market

Difference: A question that is answered through comparing and contrasting two groups or variables E.g.: Does investing in stock market yield more return on investment as

compare to investment in real estate.


Research Hypotheses “Research hypotheses are predictive statements about

the relationship between two variables”

Types of Hypothesis

There are two types of hypothesis

1. Null Hypothesis Ho = There is no relationship between Advertising and Sales

2. Alternative Hypothesis H1 = There is relationship between advertising and sales


Research Question Vs. Hypothesis

Research question Hypothesis

o Interrogative statement Simple statement

oNon-Predictive Predictive

oNon-Directional Directional


Activity

In groups of four, use the variables provided to write:

An associational question

A difference question

A descriptive question


Data Set of raw facts figures is called Data

Example: Age- 16, 18, 20, 21, 23,

Nationality- Pakistani, Indian, American

Types of Data

Data

Nature

Qualitative Quantitative

Time frame

Cross-sectional

Time-Series


Thank You!


Lecture #2 I am really thankful to my gorgeous teachers Sir Dr.Muhammad Ilyas , for that great knowledge.


Primary and Secondary Data


DATA A set of raw facts and figures are called data

OR

Example: Age 16, 18, 20, 21, 23

Nationality Pakistani, Indian,

American


TYPES OF DATA Data

Nature

Quantitative Data

Qualitative Data

Time Frame

Cross Sectional

Data

Time Series Data

Longitudinal Data

Source

Primary Data

Secondary Data


TYPES OF DATA On the basis of Nature

Nature wise data can be of two types:

1) Quantitative Data: A data that consist of numbers for example data about age consists of values like 16, 18, 20, 21, 23

2) Qualitative Data: A data that consists of words rather than numbers. For example nationality includes Pakistani, Indian, American


TYPES OF DATA Cont… On the basis of Time Frame:

Time wise data can be categorized into two

1) Cross-Sectional Data: Data that is collected from different units at once

2) Time Series Data: Data that is collected from same units on different time with same time interval

3) Longitudinal Data: A dataset is longitudinal if it tracks the same type of information on the same subjects at multiple points in time.


SOURCES OF DATA Primary Data Source: Primary data is such data

which comes from an original source and are collected with a specific research question in mind

For example: You want to collect data on Employee Motivation

Secondary Data Source: Secondary data represents the previously recorded data collected for another purpose.

For example: You want to collect the data on profit of MCB Bank for 5 years


Survey method is used to collect primary data

WHAT IS A SURVEY?

Survey is a quantitative research strategy that involves

the structured collection of data from a pre-

determined sample.

HOW TO COLLECT PRIMARY DATA


Survey Design 1. Objectives of Survey

2. Survey Design

3. Pilot Test

4. Field work/Data Collection

5. Data Preparation

6. Data Analysis and Interpretation

7. Discussion and Conclusion

8. Report Writing


The first step of survey design is to clearly define that

why we are going to conduct the survey.

Example: The basic aim of survey is to collect updated, accurate yet

relevant data in order to answer a research problem

1: Objectives of Survey


After setting objectives of survey we develop the plan

(design) of survey deciding that:

Whom to survey (Sample Selection)

Where to survey (Site Selection)

How to survey (Method)

What to survey (Questions for required information)

2: Survey Design


How to develop Questionnaire? 1. Decide what information is required.

2. Draft some questions on each variable to elicit the information

3. Put them into a meaningful order and format

4. Pre-test the questionnaire

5. Go back to Step 1, and continue until the questionnaire is perfect.


It is process of checking/assessing the accuracy of the

wording sequence and ability to understand the

question by conducting survey from one or two

respondent as a trail in order to refine questionnaire.

3: Pilot Test


It is a process of collecting data actually from the

target sample. It can be done in following ways:

Self administered survey

Postal survey

Online survey

4: Fieldwork/conduct a survey


After getting your survey completed and knowing the interface

of the SPSS the next step is to prepare the data for analysis.

This process involves four steps.

1. Coding the questionnaire.

2. Defining the variables in SPSS variable view.

3. Entering the data in SPSS data view.

5: Data Preparation


It is a process of summarizing, organizing and transforming data with

the goal of highlight the useful information, suggesting conclusions in

order to answer the research question and support good decision

making.

Data can be analyzed in two ways:

Descriptive Analysis

Inferential Analysis

6: Data Analysis and Interpretation


Interpretation is a process of making sense of results

by explaining and assigning meaning to them.

Interpretation


Clarity of thoughts

Complete and self explanatory

Comprehensive and compact

Accurate in all aspects

Support facts

Suitable format for readers

Proper date and signature

reference

Reliable sources

Logical manner

Report writing


1. Discussion:( same result like previous result or change discussion phase like discussion with proper background

2. Conclusion: (what is our result of your study and what we achieve)

7: Discussion and Conclusion


8: Report Writing


Sources that are used to collect secondary data can be:

Documentary

Government survey

Academic survey

Company’s financial statements

Bank reports

HOW TO COLLECT SECONDARY DATA


www.wdi.com

www.pwt.com

www.ifs.com

www.fbs.com

www.sbp.com

Important links of Secondary Data


http://www.wdi.com/

http://www.pwt.com/

http://www.ifs.com/

http://www.fbs.com/

http://www.sbp.com/

Step: 1 Search WDI


Step 2: Click Data Bank on the WDI Web Page


Step 3: Select your Desired Country for the Extraction of Data


Step 4: Select Required Variables from the list


Step 5: Select Years


Step 6: View Data


Step 7: Click on Excel to Export Data


Lecture #3


In the name of Allah Kareem, Most Beneficent, Most Gracious, the Most Merciful !



50

Before further processing of the data we should get to know about SPSS software first .

SPSS SPSS stands for “statistical package for social sciences”. It is basically used for the analysis of quantitative data .

How to open SPSS

Introduction to SPSS

Start Menu

Programs SPSS Inc. SPSS 16.0

Prof.Muhammad Ilyas ,std.Muhammad Saeed


52

Welcome window SPSS 16.0


53

SPSS Interface

Title Bar

Menu Bar

Tool Bar

Variable definition criteria

Serial Number / Cases

Work sheet

SPSS Views


54 Prof.Muhammad Ilyas ,std.Muhammad Saeed

55

After defining the variables enter the data in data view for each

case (row wise) against each variable (column wise)

Data Entry



57

After collecting the data, data processing is started that involves

1. Data coding 2. Defining the variables 3. Data entry in the software 4.Checking for error

Data Processing


58

Please circle or supply your answer ID_________

SD SA

1. I would recommend this course to other students 1 2 3 4 5

2. I worked very hard in this course 1 2 3 4 5

3. My college is : Arts & sciences___ _ Business____ Engineering____

4. My gender is M F

5. My GPA is _____________

6. For this class, I did: (Check all that apply

The reading

The homework

Extra credit

SAMPLE QUESTIONNAIRE


59

Coding is the process of assigning numbers to the values or

levels of each variable.

Rules of Coding

1. All data should be numeric.

2. Each variable for each case or participant must occupy the same

column.

3. All values (codes) for a variable must be mutually exclusive.

4. Each variable should be coded to give maximum information.

5. For each participant, there must be a code or value for each

variable.

6. Apply any coding rule consistently for all participants.

7. Use high numbers (values or codes) for the “agree,” “good,” or

“positive” end of a variable that is ordered

Coding


60

In SPSS first of all the variables are defined in variable view

This includes

1. Name of the variable (Short without space)

2. Type (Numeric, String)

3. Width (8, 10, etc)

4. Decimals (2, 3, 5 etc)

5. Label (Full name of the variable)

6. Values (answer categories with codes)

7. Missing (blank, multiple, wrong answers)

8. Columns (6, 8, 10 etc)

9. Align (Left, right, centre)

10. Measure (Nominal, ordinal, scale)

Defining the variables


61

SUPERIOR GROUP OF COLLEGES 61


Lecture 4


Data File Management


After this lecture you would: Learn four useful data transformation techniques:

Count

Recode (Revise and Reverse)

Compute a new variable


Problem 5.1: Count Math Courses Taken

How many math courses (algebra1, algebra2, geometry, trigonometry and calculus) did each of the 75

participants take in high school? Label your new variable


Problem 5.2: Recode and Relabel Mother’s and Father’s Education

Recode mother’s and father’s education so that those with no postsecondary education have a value of 1,

those with some posts secondary have a value of 2, and those with a bachelor’s degree or more have a value of

3. Label the new variables and values


Problem5.3: Recode and Compute Pleasure Scale Score

Compute the average pleasure scale from item02, item06, item10 and item14 after reversing (use the

Recode command) item06 and item10. Name the new computed variable pleasure and label its highest and

lowest values.


Lecture #5



70

After this session the students will be able to analyze the collected data using descriptive statistics by

• Producing summaries of data in both tabular and graphical forms

• Calculating the central tendencies using mean median and modes

• Calculate the dispersion of data using range, IQR and Standard Deviation

• Checking if the data is normally distributed using Normal curve phenomenon

Session Objectives


71 SUPERIOR GROUP OF COLLEGES

Analyzing Data

“The process of breaking down the complex

data to gain better understanding of it.”

There are two types of statistics

Descriptive statistics

Inferential statistics

In this session we will work on descriptive statistics


72 SUPERIOR GROUP OF COLLEGES

Descriptive statistics are used to Describe, Summarize,

Organize, and Simplify data in quantitative terms. We will

cover

1. Summarizing Numerical Data

2. Measures of Central Tendency

3. Measurement of Dispersion

4. Checking Data Normality

Descriptive statistics


73

1. Summarizing

Variable

Categorical

Frequency Distribution Table

Bar chart

Numerical

Five Figure Summary

Box Plot / Histograms


74

A frequency distribution is a tally or count of the number of times each score on a

single variable occurs

Frequency Distribution.

Summarizing categorical data

Analyze Descriptive Statistics Frequencies move religion to the variable box OK (make sure that the Display frequency tables box is checked)

Frequency table for religion religion

Frequency Percent Valid Percent Cumulative

Percent

Valid Muslims 30 40.0 44.8 44.8

Christians 23 30.7 34.3 79.1

Hindus 14 18.7 20.9 100.0

Total 67 89.3 100.0

Missing other religion 4 5.3

blank 4 5.3

Total 8 10.7

Total 75 100.0


75

Interpretation:

In this example, there is a Frequency column that shows the

numbers of students who marked each type of religion (e.g., 30

said Muslims, 23 Christians, 14 Hindus, 4 is missing and 4 left

it blank).Notice that there are a total of (67) for the three

responses considered Valid and a total (8) for the two types of

responses considered to be Missing as well as an overall total

(75). The Percent column indicates that 40.0% are Muslims ,

30.7% are Christians , 18.7% are Hindus, 5.3% had one of several

other religions, and 5.3% left the question blank. The Valid

Percentage column excludes the eight missing cases and is often

the column that you would use. Given this data set, it would be

accurate to say that of those not coded as missing, 44.8% were

Muslims and 34.3% Christians and 20.9% were Hindus.


76

With Nominal data, it is better to make a bar graph or chart of the frequency distribution of variables like religion, ethnic group, or other nominal variables; the points that happen to be adjacent in your frequency distribution are not by necessarily adjacent. To get a bar chart select

Graphs legacy dialogues interactive bar chart move variable to the box OK

Bar Charts Summarizing categorical data


77

Five Figure Summary

It is used to summarize the Numerical data. Five figures include

locating the following values in data

1. Minimum value

2. Maximum Value

3. Median

4. Lower Quartile 5. Upper Quartile

Summarizing Numerical data


78

Exercise: Calculate Five Figure Summary

2 1 3 2 1 4 3 5 8 8 7 7 4 5 6 2 6 6 6 6

Department B: 20 employees

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

1 1 2 2 2 3 3 4 4 5 5 6 6 6 6 6 7

18 19 20

7 8 8

Min = 1

Max = 8

Median = 5

Lower Quartile = 2.5

Upper Quartile = 6


79

Exercise: Calculate Five Figure Summary

Department B: 30 employees

2 1 3 6 3 8 14 5 16 5 6 6 7 7 7

7 9 8 4 6 8 7 8 3 8 10 6 8 18

12


80

box & whisker plot

For ordinal and normal data, the box and whiskers plot is useful The box and whisker

plot is a graphical representation of distribution of scores and is helpful in distinguishing

between ordinal and normally distributed data

Graphs legacy dialogues interactive box plot move gender to the x-axis and move SAT math to y-axis OK


81

Interpretation

The case processing summary table shows the valid N=75,

with no missing values for total sample of 75 for the variable

math achievement. The plot shows a box plot for math

achievement. The box represents the middle 50% of the

cases (M=13), lower end of the box shows lower quartile

(Q1=7.67), and upper end of the quartile shows upper

quartile (17.00). The whiskers indicate the expected range

(25.33) of scores from minimum (Min=-1.67) to Maximum

(Max=23.67). Scores outside of this range are considered

unusually high or low, such scores are called outliers. There

are no outliers for in this case.


82

Histogram Histograms are just like bar graph but there is no space between the boxes, indicating that there is a continuous variable theoretically underlying the scores. Histograms can be used even if data, as measured, are not continuous, if the underlying is conceptualized as continuous.

To draw a histogram select:

Graphs legacy dialogues interactive histogram move variable to the box OK


83

Interpretation In this frequencies (number of students), shown by

the bars are for a range of points (in this case SPSS

selected a range of 50: 250-299, 300-349, 350-399,

etc). Notice that the largest number of students

(about 20) had scores in the middle two bars of the

range (450-499 and 500-549). Similar small

numbers of students have very low and very high

scores. The bars in the histogram form a distribution

(pattern or curve) that is similar to the normal, bell

shaped curve. Thus, the frequency distribution of

the SAT math scores is said to be approximately

normal.


84

Mean. The arithmetic average or mean takes into account all of the available information in computing the central tendency of a frequency distribution.

Median. The middle score or median is the appropriate measure of central tendency for ordinal level raw data.

Mode. The most common category, or mode can be used with any kind of data generally provides the least precise information about central tendency

MEASUREMENT OF CENTRAL TENDENCY


85

Measure of Central Tendency

Exercise

Analyze Descriptive statistics Frequencies put SAT Math into variable box click on statistics mark mean, median and mode click continue Ok

Statistics

scholastic aptitude test - math

N Valid 75

Missing 0

Mean 490.53

Median 490.00

Mode 500 Prof.Muhammad Ilyas ,std.Muhammad Saeed

86

Measures of Variability

Range—The range (highest minus lowest score) is the crudest measure of variability but does give an indication of the spread in scores if they are ordered.

Inter quartile range (IQR)-

IQR=Q3-Q1

Standard Deviation—The standard deviation is based on the deviation (x) of each score from the mean of all scores.


87

Analyze Descriptive statistics Frequencies put SAT Math into variable box click on statistics mark Range and std deviation click continue Ok


88

Descriptive Statistics

The Normal Curve The frequency distributions of many of the variables used in the behavioral

sciences are distributed approximately as a normal curve when N is large.

Properties of Normal Curve

1. The mean, median and mode are equal.

2. It has one “hump” and this hump is in the middle of the distribution.

3. The curve is symmetric. If you fold the normal curve in half, the right side would

fit perfectly with the left side; that is, it is not skewed.

4. The range is infinite.

5. The curve is neither too peaked nor too flat and its tails are neither too short nor

too long.


89

Nominal Dichotomous Ordinal Normal

Frequency Distribution Yes Yes Yes Ok

Bar Chart Yes Yes Yes OK

Histogram No No OK Yes

Frequency Polygon No No OK Yes

Box &Whisker Plot No No Yes Yes

Central Tendency

Mean No OK OK Yes

Median No OK Yes OK

Mode Yes Yes OK OK

Variability

Range No Always 1 Yes Yes

Standard Deviation No No OK Yes

Interquartile Range No No OK OK

How many categories Yes Always 2 OK No

Shape

Skewness No No Yes Yes


90



Lecture #6


In the name of Allah Kareem, Most Beneficent, Most Gracious, the Most Merciful !



94

After studying this session you would be able to

• Understand and infer results from data in order to answer the associational and differential research questions using different parametric and non parametric tests.

• understand implement and interpret the chi-square, phi and cramer’s V

• understand, implement and interpret the correlation statistics

• understand, implement and interpret the regression statistics

• understand, implement and interpret the T-test statistics

Lesson Objectives


95

1.Non parametric test. 1.Chi square /Fisher exact 2.Phi and cramer’s v

3.Kendall tau-b

2.Parametric test 1.Correlation

1.Pearson correlation

2.Spearman correlation

2.Regression

1.Simple regression

2.Multiple regression

3.T-Test 1.One-sample T-test 2.Independent sample T-test 3.Paired sample T-test

Lesson Outline


Inferential Statistics

“Inferential statistics are used to make inferences (conclusions) about a population from a sample based on the statistical relationships or differences between two or more variables using statistical tests with the assumption that sampling is random in order to generalize or make predictions about the future.


Inferential Statistics

Inferential statistics are used

To test some hypothesis either to check relationship between variables (two/more) or to compare two groups to measure the differences among them.

To generalize the results about a population from a sample

To make predictions about the future.

To make conclusions


Some basics about inferential statistics!

Statistical significance (The p value)

Statistical significance test is the test of a null hypothesis Ho which is a hypothesis that we attempt to reject or nullify. i.e.

Ho =There is no relationship /Difference between variable 1 and variable 2

p value > 0.05 Ho is accepted and H1 is rejected.

p value < 0.05 Ho is rejected and H1 is accepted.


99

• Confidence Interval Confidence interval is a range of values constructed for a variable of interest so that this range has a specified probability of including the true value of the variable. The specified probability is called the confidence level, and the end points of the confidence interval are called the confidence limits’. It is one of the alternatives to null hypothesis significance testing

(NHST).


100

•The effect size (weak, moderate or strong) Effect size is the strength of the relationship between the independent variable and the dependent variable, and/or the magnitude of the difference between levels of the independent variable with respect to the dependent variable.

0 No effect No relationship

>0 – 0.33 Small effect Weak relationship

>0.33 – 0.70 Medium/typical effect Moderate relationship

>0.70 – <1 Large effect Strong relationship

1 Maximum effect Perfect relationship


101

1. Relate why a test is applied

2. Discuss for which variable the test is applied

3. Elaborate whether the null hypothesis is rejected

or accepted p value

4. State what is the direction of the effect

5. Conclude the results

Steps in interpreting inferential statistics


102

Inferential statistics include a wide variety of tests to infer

the results. This variety of tests can be classified in two

broader categories that are

1. Non parametric tests

2. Parametric tests

Types of test used in Inferential Statistics


103

Non parametric tests are the statistical tests that are

used • When the level of measurement is nominal or ordinal. E.g. chi-square test or

Kendall’s tau-b.

• When assumptions about normal distribution in the population is not met

e.g. spearman correlation

http://www.cliffsnotes.com/WileyCDA/Section/Statistics-Glossary.id-305499,articleId-

30041.html#ixzz0c38lKKZC retrieval data: 07/01/10

Non parametric tests involve • Chi-Square test

• Kendall’s tau-b

• Spearman correlation (will be discussed in correlation section)


http://www.cliffsnotes.com/WileyCDA/Section/Statistics-Glossary.id-305499,articleId-30041.html


















Non parametric test

Chi-Square Statistics

Chi-Squared test is the most commonly used non-parametric test to check the association between two nominal variables in order to accept or reject the null hypothesis. It is used to check

The association between two nominal variables

Hypothesis for Chi-Square Test Ho = there is no association between gender and geometry in h.s.

H1 = There is association between gender and geometry in h.s.


Chi-Squared Test

Assumptions and Conditions for the Chi-Squared test

The data of the variables must be independent.

Both the variables should be nominal.

All the expected counts are greater than 1 for chi-square.

At least 80% of the expected frequencies should be greater than or equal to 5.


geometry in h.s. * gender Crosstabulation

gender

Total male female

geometry in h.s. not taken Count 10 29 39

Expected Count 17.7 21.3 39.0

% of Total 13.3% 38.7% 52.0%

Taken Count 24 12 36


% of Total 32.0% 16.0% 48.0%

Total Count 34 41 75


% of Total 45.3% 54.7% 100.0%

Chi-Squared Test

Checking Assumptions and Conditions for the Chi-Squared test


Non parametric test

Cases

Valid Missing Total

N Percent N Percent N Percent geometry in h.s. * gender 75 100.0% 0 .0% 75 100.0%

Case Processing Summary

Value df

Asymp. Sig. (2-

sided) Exact Sig. (2-sided) Exact Sig. (1-sided)

Pearson Chi-Square 12.714a 1 .000

Continuity Correctionb 11.112 1 .001

Likelihood Ratio 13.086 1 .000

Fisher's Exact Test .000 .000

Linear-by-Linear Association 12.544 1 .000

N of Valid Casesb 75

Chi-Square Tests


Value Approx. Sig. Nominal by Nominal Phi

-.412 .000

Cramer's V .412 .000

N of Valid Cases 75

Symmetric Measures


109

Interpretation: To check the association between gender and geometry in h.s. chi-square test is conducted. The

case processing summary table indicates that there is no participant with missing value. The

assumptions are checked through crosstabs. The Crosstabulation table includes the Counts and

Expected Counts, and their relative percentages within gender. The result shows that there are 24

males who had taken geometry which is 71% of total 34 male students. On the other hand, 12 of 41

females took geometry; that is only 29% of the females. It looks like a higher percentage of males

took geometry than female students. The Ch-Square Test table tell us whether we can be confident

that this apparent difference is not due to chance.

Note, in the Cross Tabulation table, that the Expected Count of the number of male students who

didn’t take geometry is 17.7 and the observed or actual Count is 10. Thus, there are 7.7 fewer

males who didn’t take geometry than would be expected by chance, given the Totals shown in the

Table. There are also the same discrepancies between observed and expected counts in the other

three cells of the table. A question answered by the chi-square test is whether these discrepancies

between observed and expected counts are bigger than one might expect by chance.

The Chi-Square Tests table is used to determine if there is a statistically significant relationship

between two dichotomous or nominal variables. It tells you whether the relationship is statistically

significant but does not indicate the strength of the relationship, like phi or a correlation does. In

output, we use the Pearson Chi-Square or (for small samples) the Fisher’s exact test to interpret

the results of the test. They are statistically significant (p < .001), which indicates that we can be

quite certain that males and females are different on whether they take geometry.

Phi is -.412, and like the chi-square, it is statistically significant. Phi is also a measure of effect size

for an associational statistic and, in this case, effect size is moderate according to Cohen (1988)


Other Nonparametric Associational Statistics

KENDALL’S TAU-B If the variables are ordered (i.e. ordinal), you have several other choices. We will use Kendall’s tau-b in this problem.

Example: What is the relationship or association between father’s education and mother’s education?


Other Nonparametric Associational Statistics Case Processing Summary

Cases

Valid Missing Total

N Percent N Percent N Percent

mother education revised * father

education revised 73 97.3% 2 2.7% 75 100.0%

mother education revised * father education revised Cross tabulation

father education revised

Total 1 2 3

mother education revised 1 Count 43 8 2 53

Expected Count 35.6 13.1 4.4 53.0

% of Total 58.9% 11.0% 2.7% 72.6%

2 Count 6 10 2 18

Expected Count 12.1 4.4 1.5 18.0

% of Total 8.2% 13.7% 2.7% 24.7%

3 Count 0 0 2 2

Expected Count 1.3 .5 .2 2.0

% of Total .0% .0% 2.7% 2.7%

Total Count 49 18 6 73

Expected Count 49.0 18.0 6.0 73.0

% of Total 67.1% 24.7% 8.2% 100.0%



Symmetric Measures

Value Asymp. Std. Errora Approx. Tb Approx. Sig.

Ordinal by Ordinal Kendall's tau-b .494 .108 3.846 .000

N of Valid Cases 73

a. Not assuming the null hypothesis.

b. Using the asymptotic standard error assuming the null hypothesis.


113

Interpretation:

To investigate the relationship between father’s education

and mother’s education, Kendall’s tau-b was used. The

analysis indicated a significant positive association between

father’s education and mother’s education, tau =.572,

p<.001. This means that more highly educated fathers were

married to more highly educated mothers and less educated

fathers were married to less educated mothers. This tau is

considered to be a large effect size (Cohen, 1988).




115

Interpretation

Eta was used to investigate the strength of the association between gender and number of math courses taken (eta=.33). This is a weak to medium effect size (Cohen, 1988). Males were more likely to take several or all the math courses than females.


116



Lecture#8 Quantitative Technique


Descriptive Statistics



120

After studying this session you would be able to

• Understand and infer results from data in order to answer the associational and differential research questions using different parametric and non parametric tests.

• understand implement and interpret the chi-square, phi and cramer’s V

• understand, implement and interpret the correlation statistics

• understand, implement and interpret the regression statistics

• understand, implement and interpret the T-test statistics

Lesson Objectives


121

1.Non parametric test. 1.Chi square /Fisher exact 2.Phi and cramer’s v

3.Kendall tau-b

4.Eta

2.Parametric test 1.Correlation

1.Pearson correlation

2.Spearman correlation

2.Regression

1.Simple regression

2.Multiple regression

3.T-Test 1.One-sample T-test 2.Independent sample T-test 3.Paired sample T-test

Lesson Outline


Correlation Correlation is a statistical process that determines the mutual (reciprocal)

relationship between two (or more) variables which are thought to be mutually related in a way that systematic changes in the value of one variable are accompanied by systematic changes in the other and vice versa.

It is used to determine The existence of mutual relationship that is defined by the significance (p)

value.

The direction of relationship that is defined by the sign (+,-) of the test value

The strength of relationship that is defined by the test value

Correlation Coefficient (r)

The correlation coefficient measures the strength of linear relationship between two or more numerical variables. The value of correlation coefficient can vary from -1.0 (a perfect negative correlation or association) through 0.0 (no correlation) to +1.0 (a perfect positive correlation). Note that +1 and -1 are equally high or strong


Correlation

Assumptions and conditions for Pearson

The two variables have a linear relationship.

Scores on one variable are normally distributed for each value of the other variable and vice versa.

Outliers (i.e. extreme scores) can have a big effect on the correlation.


Correlation Checking the assumptions for Pearson Correlation

The assumptions for correlation test are checked through normal curve (normality assumption) and the scatter plot (linearity assumption)

Statistics math

achievement test

scholastic aptitude test -

math N Valid 75 75

Missing 0 0 Skewness .044 .128 Std. Error of Skewness .277 .277


Correlation

Correlations math

achievement test

scholastic aptitude test

- math math achievement test Pearson

Correlation 1 .788**

Sig. (2-tailed) .000

N 75 75 scholastic aptitude test - math

Pearson Correlation .788** 1

Sig. (2-tailed) .000

N 75 75 **. Correlation is significant at the 0.01 level (2-tailed).

Interpretation

To investigate if there was a statistically significant association between Scholastic aptitude test and math achievement, a correlation was computed. Both the variables were approximately normal there is linear relationship between them hence fulfilling the assumptions for Pearson's correlation. Thus, the Pearson’s r is calculated, r= 0.79, p = .000 relating that there is highly significant relationship between the variables. The positive sign of the Pearson's test value shows that there is positive relationship, which means that students who have high scores in math achievement test do have high scores in scholastic aptitude test and vice versa. Using Cohen’s (1988) guidelines’ the effect size is large relating

that there is strong relationship between math achievement and scholastic aptitude test. Prof.Muhammad Ilyas ,std.Muhammad Saeed 125

Correlation

Spearman Correlation: If the assumptions for Pearson correlation are not fulfilled then consider the Spearman correlation with the assumption that the Relationship between two variables is monotonically non linear

Example: what is the association between mother’s education and math achievement


Correlation Correlationsa

mother's education

math achieveme

nt test Spearman's rho mother's

education Correlation Coefficient 1.000 3.15**

Sig. (2-tailed) . .006 math achievement test

Correlation Coefficient .315** 1.000

Sig. (2-tailed) .006 . **. Correlation is significant at the 0.01 level (2-tailed).

Interpretation

To investigate if there was a statistically significant association between mother’s education and math achievement, a correlation was computed. Mother’s education was skewed (skewness=1.13), which violated the assumption of normality. Thus, the spearman rho statistic was calculated, r, (73) = .32, p = .006. The direction of the correlation was positive, which means that students who have highly educated mothers tend to have higher math achievement test scores and vice versa. Using Cohen’s (1988) guidelines’ the effect size is medium for studies in his area. The r2 indicates that approximately 10% of the variance in

math achievement test score can be predicted from mother’s education. Prof.Muhammad Ilyas ,std.Muhammad Saeed 127

REGRESSION ANALYSIS Regression analysis is used to measure the relationship between two or

more variables. One variable is called dependent (response, or outcome) variable and the other is called Independent (explanatory or predictor) variables.

Regression Equation

Y = a + bx Y = a + bx1 + cx2 + dx3 + ex4

Y = dependent variable

a = Constant

b, c, d, e, = slope coefficients

x1, x2, x3, x4 = Independent variables

Types of regression analysis Simple Regression

Multiple regression


REGRESSION ANALYSIS

Simple Regression

Simple regression is used to check the contribution of independent variable(s) in the dependent variable if the independent variable is one.

Assumptions and conditions of simple regression

Dependent variable should be scale

The relationship of variables should be liner

Data should be independent

Example: Can we predict math achievement from grades in high school


Commands Analyze Regression Linear

REGRESSION ANALYSIS


Coefficientsa

Model

Unstandardized Coefficients

Standardized Coefficients

t Sig. B Std. Error Beta 1 (Constant) .397 2.530 .157 .876

grades in h.s. 2.142 .430 .504 4.987 .000

a. Dependent Variable: math achievement test

REGRESSION ANALYSIS

Interpretation

Simple regression was conducted to investigate how well grades in highschool predict math achievement scores. The results were statistically significant F (1, 73 ) = 24.87, p<.001. The indentified equation to understand this relationship was math achievement = .40 + 2.14* (grades in high school). The adjusted R2 value was .244. This indicates that 24% of the variance in math achievement was explained by the grades in high school. According to Cohen (1988), this is a large effect.

Regression equation is Y = 0.40 + 2.14X


Multiple Regression

Multiple regressions is used to check the contribution of independent variable(s) in the dependent variable if the independent variables are more than one.

Assumptions and conditions of Multiple regression Dependent variables should be scale.

Example: How well can you predict math achievement from a combination of four variables: grades in high school, father’s education, mother education and gender

REGRESSION ANALYSIS


Commands

Analyze Regression Linear

REGRESSION ANALYSIS


Model



T Sig. B Std. Error Beta 1 (Constant) 1.047 2.526 .415 .680

grades in h.s. 1.946 .427 .465 4.560 .000 father's education .191 .313 .083 .610 .544 mother's education .406 .375 .141 1.084 .282 gender -3.759 1.321 -.290 -2.846 .006

Coefficient

Interpretation Simultaneously multiple regression was conducted to investigate the best predictors of math achievement test scores. The means, standard deviation, and inter correlations can be found in table. The combination of variables to predict math achievement from grades in high school, father’s education, mother’s education and gender was statistically significant, F = 10.40, p <0.05. The beta coefficients are presented in last table. Note that high grades and male gender significantly predict math achievement when all four variables are included. The adjusted R2 value was 0.343. This indicates that 34 % of the variance in math achievement was explained by the model according

to Cohen (1988), this is a large effect. Prof.Muhammad Ilyas ,std.Muhammad Saeed 134

REGRESSION ANALYSIS Regression analysis is used to measure the relationship between two or

more variables. One variable is called dependent (response, or outcome) variable and the other is called Independent (explanatory or predictor) variables.

Regression Equation

Y = a + bx Y = a + bx1 + cx2 + dx3 + ex4

Y = dependent variable

a = Constant

b, c, d, e, = slope coefficients

x1, x2, x3, x4 = Independent variables

Types of regression analysis Simple Regression

Multiple regression


REGRESSION ANALYSIS

Simple Regression

Simple regression is used to check the contribution of independent variable(s) in the dependent variable if the independent variable is one.

Assumptions and conditions of simple regression


The relationship of variables should be linear

Data should be independent

Example: Can we predict math achievement from grades in high school


Commands Analyze Regression Linear

REGRESSION ANALYSIS


Coefficientsa

Model



t Sig. B Std. Error Beta 1 (Constant) .397 2.530 .157 .876

grades in h.s. 2.142 .430 .504 4.987 .000

a. Dependent Variable: math achievement test

REGRESSION ANALYSIS

Interpretation

Simple regression was conducted to investigate how well grades in high school predict math achievement scores. The results were statistically significant F (1, 73 ) = 24.87, p<.001. The indentified equation to understand this relationship was math achievement = .40 + 2.14* (grades in high school). The adjusted R2 value was .244. This indicates that 24% of the variance in math achievement was explained by the grades in high school. According to Cohen (1988), this is a large effect.

Regression equation is Y = 0.40 + 2.14X


Multiple Regression

Multiple regressions is used to check the contribution of independent variable(s) in the dependent variable if the independent variables are more than one.

Assumptions and conditions of Multiple regression Dependent variables should be scale.

Example: How well can you predict math achievement from a combination of four variables: grades in high school, father’s education, mother education and gender

REGRESSION ANALYSIS


Commands

Analyze Regression Linear

REGRESSION ANALYSIS


Model



T Sig. B Std. Error Beta 1 (Constant) 1.047 2.526 .415 .680

grades in h.s. 1.946 .427 .465 4.560 .000 father's education .191 .313 .083 .610 .544 mother's education .406 .375 .141 1.084 .282 gender -3.759 1.321 -.290 -2.846 .006

Coefficient

Interpretation Simultaneously multiple regression was conducted to investigate the best predictors of math achievement test scores. The means, standard deviation, and inter correlations can be found in table. The combination of variables to predict math achievement from grades in high school, father’s education, mother’s education and gender was statistically significant, F = 10.40, p <0.05. The beta coefficients are presented in last table. Note that high grades and male gender significantly predict math achievement when all four variables are included. The adjusted R2 value was 0.343. This indicates that 34 % of the variance in math achievement was explained by the model according

to Cohen (1988), this is a large effect.


142



Last lecture #11



T-TEST Statistics

The t test is used to compare to groups to answer the differential research questions. Its values determines the difference by comparing means

Hypothesis for T-test

HO: there is no Difference

H1: There is difference

Types of T-test

There are three types of T-test

One sample t-test

Independent sample t-test

Paired sample t-test


T-TEST Statistics

One sample t-test

One sample t-test is used to determine if there is difference between

population mean (Test value) and the sample mean (X)

Assumptions and conditions of 1 sample t-test The dependent variable should be normally distributed within the

population

The data are independent.(scores of one participant are not depend on scores of the other :participant are independent of one another )

Example: is the mean SAT-Math score in the modified HSB data set significantly different from the presumed population mean of 500?


T-TEST Statistics

One-Sample Statistics

N Mean Std. Deviation Std. Error Mean scholastic aptitude test - math 75 490.53 94.553 10.918

One-Sample Test

Test Value = 500

t Df Sig. (2-tailed)

Mean Difference

95% Confidence Interval of the Difference

Lower Upper scholastic aptitude test - math -.867 74 .389 -9.467 -31.22 12.29


148

Interpretation: To investigate the difference between population and the sample, one-sample

t-test is conducted. The One-Sample Statistics table provides basic

descriptive statistics for the variable under consideration. The Mean AT-Math

for the students in the sample will be compared to the hypothesize population

mean, displayed as the Test Value in the One-Sample Test table. On the

bottom line of this table are the t value, df, and the two-tailed sig. (p) value,

which are circled. Note that p=.389 so we can say that the sample mean

(490.53) is not significantly different from the population mean of 500. The

table also provides the difference (-9.47) between the sample and population

mean and the 95% Confidence Interval. The difference between the sample

and the population mean is likely to be between +12.29 and -31.22 points.

Notice that this range includes the value of zero, so it is possible that there is

no difference. Thus, the difference is not statistically significant.


T-TEST Statistics

Independent sample t-test

Independent sample T-test is used to compare two independent groups (Male and Female)with respect to there effect on same dependent variable.

Assumptions and conditions of Independent T-test Variance of the dependent variable for two categories of the

independent variable should be equal to each other


Data on dependent variable should be independent.

Example: Do male and female students differ significantly in regard to their average math achievement scores


T-TEST Statistics

The first table, Group Statistics, shows descriptive statistics for the two groups (males and females) separately. Note that the means within each of the three pairs look somewhat different. This might be due to chance, so we will check the t test in the next table. The second table, Independent Sample Test, provides two statistical tests. The left two columns of numbers are the Levene’s test for the assumption that the variances of the two groups are equal. This is not the t test; it only assesses an assumption! If this F test is not significant (as in the case of math achievement and grades in high school), the assumption is not violated, and one uses the Equal variances assumed line for the t test and related statistics. However, if Levene’s F is statistically significant (Sig. <.05), as is true for visualization, then variances are significantly different and the assumption of equal variances is violated. In that case, the Equal variances not assumed line used; and SSPS adjusts t, df, and Sig. The appropriate lines are circled.


151

Thus, for visualization, the appropriate t=2.39, degree of freedom (df) = 57.15, p=.020. This t is statistically significant so, based on examining the means, we can say that boys have higher visualization scores than girls. We used visualization to provide an example where the assumption of equal variances was violated (Levene’s test was significant). Note that for grades in high school, the t is not statistically significant (p=.369) so we conclude that there is no evidence of a systematic difference between boys and girls on grades. On the other hand, math achievement is statistically significant because p<.05; males have higher means. The 95% Confidence Interval of the Difference is shown in the two right-hand column of the output. The confidence interval tells us if we repeated the study 100 times, 95 of the times the true (population) difference would fall within the confidence interval, which for math achievement is between 1.05 points and 6.97 points. Note that if the Upper and Lower bounds have the same sign (either + and + or – and -), we know that the difference is statistically significant because this means that the null finding of zero difference lies outside of the confident interval. On the other hand, if zero lies between the upper or lower limits, there could be no difference, as is the case of grades in h.s. The lower limit of the confidence interval on math achievement tells us that the difference between males and females could be as small as 1.05 points out 25, which are the maximum possible scores. Effects size measures for t tests are not provided in the printout but can be estimated relatively easily. For math achievement, the difference between the means (4.01) would be divided by about 6.4, an estimate of the pooled (weighted average) standard deviation. Thus, d would be approximately .60, which is, according to Cohen (1988), a medium to large sized “effect.” Because you need means and standard deviations to compute the effect size, you should include a table with means and standard deviations in your results section for a full interpretation of t tests.


T-TEST Statistics

Paired sample t-test Paired sample T-test is used to compare two paired groups (e.g. Mothers and fathers) with respect to there effect on same dependent variable.

Assumptions and conditions of Paired sample T-test

The independent variable is dichotomous and its levels (or groups) are paired, or matched, in some way (husband-wife, pre-post etc)

The dependent variable is normally distributed in the two conditions

Example: Do students’ fathers or mothers have more education?


153

Paired Samples Statistics

Mean N Std. Deviation Std. Error Mean

Pair 1 father's education 4.73 73 2.830 .331

mother's education 4.14 73 2.263 .265

Paired Samples Correlations

N Correlation Sig.

Pair 1 father's education & mother's education

73 .681 .000


154

The first table shows the descriptive statistics used to compare mother’s and father’s education levels. The second table Paired Samples Correlations, provides correlations between the two paired scores. The correlation (r=.68) between mother’s and father’s education indicates that highly educate men tend to marry highly educated women and vice versa. It doesn’t tell you whether men or women have more education. That is what t in the third table tells you. The last table shows the Paired Samples t Test. The Sig. for the comparison of the average education level of the students’ mothers and fathers was p=.019. Thus, the difference in educational level is statistically significant, and we can tell from the means in the first table that fathers have more education; however, the effect size is small (d=.28), which is computed by dividing the mean of the paired differences (.59) by the standard deviation (2.1) of the paired differences. Also, we can tell from the confidence interval that the difference in the means could be as small as .10 of a point or as large as 1.08 points on the 2 to 10 scale.


Thank you!


Documents

Quantative Techniqes