Upload
saeed-meo
View
193
Download
10
Tags:
Embed Size (px)
DESCRIPTION
Quantative TechniqesMuhammad Saeed Aas MeoSuperior University Lahore Paksitan
Citation preview
1
Quantitative Techniques in Business (QTB)
Quantitative Techniques are the techniques used to gather, sort, analyze and interpret numerical data in order to improve business decisions.
Numerical data Numerical data (or quantitative data) is data measured or identified on a
numerical scale. Numerical data can be analysed using statistical methods, and results can be displayed using tables, charts, histograms and graphs
Examples
Company sales (millions) 18, 12, 20, etc
Number of employees in company(hundreds) 15, 8, 5, etc
Prof.Muhammad Ilyas ,std.Muhammad Saeed 2
Why Study QTB
Studying QTB is essential as it enables to
Gather, sort, analyze and interpret the data
Have needed, timely, accurate, yet relevant information.
Understand and compare different types of situations
Predict and forecast about the future needs of the business
Develop effective policies and business related strategies
Make effective decisions to achieve business goals efficiently
Research is based on QTB
Final thesis is based on QTB
Prof.Muhammad Ilyas ,std.Muhammad Saeed 3
Research Problem
“Any problem or opportunity that needs to be addressed through research process of data collection and analysis is called
Research Problem”
Examples
Human Resource manager wants to develop HR policies regarding
employees turnover in order to reduce it.
Marketing manager wants to launch a new product successfully using advertisement as promotional tool
Finance manager needs to invest excessive money profitably
Prof.Muhammad Ilyas ,std.Muhammad Saeed 4
Problem Statement A problem statement is a clear and concise description
of any business issue that seeks for Description, Association or difference of two or more variables.
Example Measure the annual turnover of employees in Higher educational
sector of Pakistan
Does advertisement contribute to the sales of a new product in the market
Which of the two options i.e. stock market or real estate is better for investment.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 5
What is Variable? Vary + able = Change + able
Variable is a characteristic of anything that can vary (Change).
Examples
Gender (Male, Female)
Age (20 years, 30 years, 50 years)
Motivation level (High, Medium, Low)
Constant is a characteristic that do not vary e.g. If all students are male in a class then Gender will be constant
Prof.Muhammad Ilyas ,std.Muhammad Saeed 6
Advertisement Sales
Types of Variable with respect to relation
Awareness
Competitors product, price, packaging, placement
Budget
Prof.Muhammad Ilyas ,std.Muhammad Saeed 7
Variable
Categorical
Nominal Ordinal
Numerical
Discrete Continuous
Gender 1. Male
2. Female
Motivation 1. Highly Motivated
2. Moderately
Motivated
3. Less Motivated
1. No of students
2. No of chairs
3. Collar size
1. Height
2. Weight
3. speed
Types of Variable with respect to data
Prof.Muhammad Ilyas ,std.Muhammad
Saeed 8
Categorical Variable A variable whose values are not numerical in nature
Variables Values
Gender Male, female
Religion Islam, christianity, Jews, etc
Motivation level High, medium, low
Types of Categorical variable
1. Nominal variable
A categorical variable whose values are not ordered
Example
Gender Male, Female
2. Ordinal variable
A categorical variable whose values are in ordered
Example
Education Metric, inter, graduation
Prof.Muhammad Ilyas ,std.Muhammad Saeed 9
Numerical Variable A variable whose values are numerical in nature
Variables Values
Collar size 14, 14.5, 15, 15.5……….
Height 5.7, 5.8, 5.3
No of employees 23, 45, 69, 100
Types of Numerical variable
1. Discrete variable
A numerical variable whose values have same interval
Example
collar size 14.5, 15, 15.5……..
2. continuous variable
A numerical variable whose values don’t have same interval
Example
speed 40.1, 45, 67……….
Prof.Muhammad Ilyas ,std.Muhammad Saeed 10
Research Question Research problem needs to be translated into one or
more research questions that are defined as
“A research question is an interrogative statement that seeks for the tentative relationship among variables and clarifies what the researcher wants to answer.”
Example What is the impact of advertisement on sales of a new product in
the market
What is the annual turnover of employees in Higher educational institutions of Pakistan
Does investing in stock market yield more return on investment as compare to investment in real estate.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 11
Type of Research Question Descriptive: A question that is answered through Summarising
data about a single variable E.g.: What is the annual turnover of employees in Higher
educational institutions of Pakistan
Associational: A question that is answered through determining strength and direction of relationship between two or more variables E.g.: What is the impact of advertisement on sales of a new product in
the market
Difference: A question that is answered through comparing and contrasting two groups or variables E.g.: Does investing in stock market yield more return on investment as
compare to investment in real estate.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 12
Research Hypotheses “Research hypotheses are predictive statements about
the relationship between two variables”
Types of Hypothesis
There are two types of hypothesis
1. Null Hypothesis Ho = There is no relationship between Advertising and Sales
2. Alternative Hypothesis H1 = There is relationship between advertising and sales
Prof.Muhammad Ilyas ,std.Muhammad Saeed 13
Research Question Vs. Hypothesis
Research question Hypothesis
o Interrogative statement Simple statement
oNon-Predictive Predictive
oNon-Directional Directional
Prof.Muhammad Ilyas ,std.Muhammad Saeed 14
Activity
In groups of four, use the variables provided to write:
An associational question
A difference question
A descriptive question
Prof.Muhammad Ilyas ,std.Muhammad Saeed 15
Data Set of raw facts figures is called Data
Example: Age- 16, 18, 20, 21, 23,
Nationality- Pakistani, Indian, American
Types of Data
Data
Nature
Qualitative Quantitative
Time frame
Cross-sectional
Time-Series
Prof.Muhammad Ilyas ,std.Muhammad Saeed 16
Thank You!
Prof.Muhammad Ilyas ,std.Muhammad Saeed 17
Lecture #2 I am really thankful to my gorgeous teachers Sir Dr.Muhammad Ilyas , for that great knowledge.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 18
Primary and Secondary Data
Prof.Muhammad Ilyas ,std.Muhammad Saeed 19
DATA A set of raw facts and figures are called data
OR
Example: Age 16, 18, 20, 21, 23
Nationality Pakistani, Indian,
American
Prof.Muhammad Ilyas ,std.Muhammad Saeed 20
TYPES OF DATA Data
Nature
Quantitative Data
Qualitative Data
Time Frame
Cross Sectional
Data
Time Series Data
Longitudinal Data
Source
Primary Data
Secondary Data
Prof.Muhammad Ilyas ,std.Muhammad Saeed 21
TYPES OF DATA On the basis of Nature
Nature wise data can be of two types:
1) Quantitative Data: A data that consist of numbers for example data about age consists of values like 16, 18, 20, 21, 23
2) Qualitative Data: A data that consists of words rather than numbers. For example nationality includes Pakistani, Indian, American
Prof.Muhammad Ilyas ,std.Muhammad Saeed 22
TYPES OF DATA Cont… On the basis of Time Frame:
Time wise data can be categorized into two
1) Cross-Sectional Data: Data that is collected from different units at once
2) Time Series Data: Data that is collected from same units on different time with same time interval
3) Longitudinal Data: A dataset is longitudinal if it tracks the same type of information on the same subjects at multiple points in time.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 23
SOURCES OF DATA Primary Data Source: Primary data is such data
which comes from an original source and are collected with a specific research question in mind
For example: You want to collect data on Employee Motivation
Secondary Data Source: Secondary data represents the previously recorded data collected for another purpose.
For example: You want to collect the data on profit of MCB Bank for 5 years
Prof.Muhammad Ilyas ,std.Muhammad Saeed 24
Survey method is used to collect primary data
WHAT IS A SURVEY?
Survey is a quantitative research strategy that involves
the structured collection of data from a pre-
determined sample.
HOW TO COLLECT PRIMARY DATA
Prof.Muhammad Ilyas ,std.Muhammad Saeed 25
Survey Design 1. Objectives of Survey
2. Survey Design
3. Pilot Test
4. Field work/Data Collection
5. Data Preparation
6. Data Analysis and Interpretation
7. Discussion and Conclusion
8. Report Writing
Prof.Muhammad Ilyas ,std.Muhammad Saeed 26
The first step of survey design is to clearly define that
why we are going to conduct the survey.
Example: The basic aim of survey is to collect updated, accurate yet
relevant data in order to answer a research problem
1: Objectives of Survey
Prof.Muhammad Ilyas ,std.Muhammad Saeed 27
After setting objectives of survey we develop the plan
(design) of survey deciding that:
Whom to survey (Sample Selection)
Where to survey (Site Selection)
How to survey (Method)
What to survey (Questions for required information)
2: Survey Design
Prof.Muhammad Ilyas ,std.Muhammad Saeed 28
How to develop Questionnaire? 1. Decide what information is required.
2. Draft some questions on each variable to elicit the information
3. Put them into a meaningful order and format
4. Pre-test the questionnaire
5. Go back to Step 1, and continue until the questionnaire is perfect.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 29
It is process of checking/assessing the accuracy of the
wording sequence and ability to understand the
question by conducting survey from one or two
respondent as a trail in order to refine questionnaire.
3: Pilot Test
Prof.Muhammad Ilyas ,std.Muhammad Saeed 30
It is a process of collecting data actually from the
target sample. It can be done in following ways:
Self administered survey
Postal survey
Online survey
4: Fieldwork/conduct a survey
Prof.Muhammad Ilyas ,std.Muhammad Saeed 31
After getting your survey completed and knowing the interface
of the SPSS the next step is to prepare the data for analysis.
This process involves four steps.
1. Coding the questionnaire.
2. Defining the variables in SPSS variable view.
3. Entering the data in SPSS data view.
5: Data Preparation
Prof.Muhammad Ilyas ,std.Muhammad Saeed 32
It is a process of summarizing, organizing and transforming data with
the goal of highlight the useful information, suggesting conclusions in
order to answer the research question and support good decision
making.
Data can be analyzed in two ways:
Descriptive Analysis
Inferential Analysis
6: Data Analysis and Interpretation
Prof.Muhammad Ilyas ,std.Muhammad Saeed 33
Interpretation is a process of making sense of results
by explaining and assigning meaning to them.
Interpretation
Prof.Muhammad Ilyas ,std.Muhammad Saeed 34
Clarity of thoughts
Complete and self explanatory
Comprehensive and compact
Accurate in all aspects
Support facts
Suitable format for readers
Proper date and signature
reference
Reliable sources
Logical manner
Report writing
Prof.Muhammad Ilyas ,std.Muhammad Saeed 35
1. Discussion:( same result like previous result or change discussion phase like discussion with proper background
2. Conclusion: (what is our result of your study and what we achieve)
7: Discussion and Conclusion
Prof.Muhammad Ilyas ,std.Muhammad Saeed 36
8: Report Writing
Prof.Muhammad Ilyas ,std.Muhammad Saeed 37
Sources that are used to collect secondary data can be:
Documentary
Government survey
Academic survey
Company’s financial statements
Bank reports
HOW TO COLLECT SECONDARY DATA
Prof.Muhammad Ilyas ,std.Muhammad Saeed 38
www.wdi.com
www.pwt.com
www.ifs.com
www.fbs.com
www.sbp.com
Important links of Secondary Data
Prof.Muhammad Ilyas ,std.Muhammad Saeed 39
Step: 1 Search WDI
Prof.Muhammad Ilyas ,std.Muhammad Saeed 40
Step 2: Click Data Bank on the WDI Web Page
Prof.Muhammad Ilyas ,std.Muhammad Saeed 41
Step 3: Select your Desired Country for the Extraction of Data
Prof.Muhammad Ilyas ,std.Muhammad Saeed 42
Step 4: Select Required Variables from the list
Prof.Muhammad Ilyas ,std.Muhammad Saeed 43
Step 5: Select Years
Prof.Muhammad Ilyas ,std.Muhammad Saeed 44
Step 6: View Data
Prof.Muhammad Ilyas ,std.Muhammad Saeed 45
Step 7: Click on Excel to Export Data
Prof.Muhammad Ilyas ,std.Muhammad Saeed 46
Lecture #3
Prof.Muhammad Ilyas ,std.Muhammad Saeed 47
In the name of Allah Kareem, Most Beneficent, Most Gracious, the Most Merciful !
Prof.Muhammad Ilyas ,std.Muhammad Saeed 48
Prof.Muhammad Ilyas ,std.Muhammad Saeed 49
50
Before further processing of the data we should get to know about SPSS software first .
SPSS SPSS stands for “statistical package for social sciences”. It is basically used for the analysis of quantitative data .
How to open SPSS
Introduction to SPSS
Start Menu
Programs SPSS Inc. SPSS 16.0
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Prof.Muhammad Ilyas ,std.Muhammad Saeed 51
52
Welcome window SPSS 16.0
Prof.Muhammad Ilyas ,std.Muhammad Saeed
53
SPSS Interface
Title Bar
Menu Bar
Tool Bar
Variable definition criteria
Serial Number / Cases
Work sheet
SPSS Views
Prof.Muhammad Ilyas ,std.Muhammad Saeed
54 Prof.Muhammad Ilyas ,std.Muhammad Saeed
55
After defining the variables enter the data in data view for each
case (row wise) against each variable (column wise)
Data Entry
Prof.Muhammad Ilyas ,std.Muhammad Saeed
56 Prof.Muhammad Ilyas ,std.Muhammad Saeed
57
After collecting the data, data processing is started that involves
1. Data coding 2. Defining the variables 3. Data entry in the software 4.Checking for error
Data Processing
Prof.Muhammad Ilyas ,std.Muhammad Saeed
58
Please circle or supply your answer ID_________
SD SA
1. I would recommend this course to other students 1 2 3 4 5
2. I worked very hard in this course 1 2 3 4 5
3. My college is : Arts & sciences___ _ Business____ Engineering____
4. My gender is M F
5. My GPA is _____________
6. For this class, I did: (Check all that apply
The reading
The homework
Extra credit
SAMPLE QUESTIONNAIRE
Prof.Muhammad Ilyas ,std.Muhammad Saeed
59
Coding is the process of assigning numbers to the values or
levels of each variable.
Rules of Coding
1. All data should be numeric.
2. Each variable for each case or participant must occupy the same
column.
3. All values (codes) for a variable must be mutually exclusive.
4. Each variable should be coded to give maximum information.
5. For each participant, there must be a code or value for each
variable.
6. Apply any coding rule consistently for all participants.
7. Use high numbers (values or codes) for the “agree,” “good,” or
“positive” end of a variable that is ordered
Coding
Prof.Muhammad Ilyas ,std.Muhammad Saeed
60
In SPSS first of all the variables are defined in variable view
This includes
1. Name of the variable (Short without space)
2. Type (Numeric, String)
3. Width (8, 10, etc)
4. Decimals (2, 3, 5 etc)
5. Label (Full name of the variable)
6. Values (answer categories with codes)
7. Missing (blank, multiple, wrong answers)
8. Columns (6, 8, 10 etc)
9. Align (Left, right, centre)
10. Measure (Nominal, ordinal, scale)
Defining the variables
Prof.Muhammad Ilyas ,std.Muhammad Saeed
61
SUPERIOR GROUP OF COLLEGES 61
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Lecture 4
Prof.Muhammad Ilyas ,std.Muhammad Saeed 62
Data File Management
Prof.Muhammad Ilyas ,std.Muhammad Saeed 63
After this lecture you would: Learn four useful data transformation techniques:
Count
Recode (Revise and Reverse)
Compute a new variable
Prof.Muhammad Ilyas ,std.Muhammad Saeed 64
Problem 5.1: Count Math Courses Taken
How many math courses (algebra1, algebra2, geometry, trigonometry and calculus) did each of the 75
participants take in high school? Label your new variable
Prof.Muhammad Ilyas ,std.Muhammad Saeed 65
Problem 5.2: Recode and Relabel Mother’s and Father’s Education
Recode mother’s and father’s education so that those with no postsecondary education have a value of 1,
those with some posts secondary have a value of 2, and those with a bachelor’s degree or more have a value of
3. Label the new variables and values
Prof.Muhammad Ilyas ,std.Muhammad Saeed 66
Problem5.3: Recode and Compute Pleasure Scale Score
Compute the average pleasure scale from item02, item06, item10 and item14 after reversing (use the
Recode command) item06 and item10. Name the new computed variable pleasure and label its highest and
lowest values.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 67
Lecture #5
Prof.Muhammad Ilyas ,std.Muhammad Saeed 68
Prof.Muhammad Ilyas ,std.Muhammad Saeed 69
70
After this session the students will be able to analyze the collected data using descriptive statistics by
• Producing summaries of data in both tabular and graphical forms
• Calculating the central tendencies using mean median and modes
• Calculate the dispersion of data using range, IQR and Standard Deviation
• Checking if the data is normally distributed using Normal curve phenomenon
Session Objectives
Prof.Muhammad Ilyas ,std.Muhammad Saeed
71 SUPERIOR GROUP OF COLLEGES
Analyzing Data
“The process of breaking down the complex
data to gain better understanding of it.”
There are two types of statistics
Descriptive statistics
Inferential statistics
In this session we will work on descriptive statistics
Prof.Muhammad Ilyas ,std.Muhammad Saeed
72 SUPERIOR GROUP OF COLLEGES
Descriptive statistics are used to Describe, Summarize,
Organize, and Simplify data in quantitative terms. We will
cover
1. Summarizing Numerical Data
2. Measures of Central Tendency
3. Measurement of Dispersion
4. Checking Data Normality
Descriptive statistics
Prof.Muhammad Ilyas ,std.Muhammad Saeed
73
1. Summarizing
Variable
Categorical
Frequency Distribution Table
Bar chart
Numerical
Five Figure Summary
Box Plot / Histograms
Prof.Muhammad Ilyas ,std.Muhammad Saeed
74
A frequency distribution is a tally or count of the number of times each score on a
single variable occurs
Frequency Distribution.
Summarizing categorical data
Analyze Descriptive Statistics Frequencies move religion to the variable box OK (make sure that the Display frequency tables box is checked)
Frequency table for religion religion
Frequency Percent Valid Percent Cumulative
Percent
Valid Muslims 30 40.0 44.8 44.8
Christians 23 30.7 34.3 79.1
Hindus 14 18.7 20.9 100.0
Total 67 89.3 100.0
Missing other religion 4 5.3
blank 4 5.3
Total 8 10.7
Total 75 100.0
Prof.Muhammad Ilyas ,std.Muhammad Saeed
75
Interpretation:
In this example, there is a Frequency column that shows the
numbers of students who marked each type of religion (e.g., 30
said Muslims, 23 Christians, 14 Hindus, 4 is missing and 4 left
it blank).Notice that there are a total of (67) for the three
responses considered Valid and a total (8) for the two types of
responses considered to be Missing as well as an overall total
(75). The Percent column indicates that 40.0% are Muslims ,
30.7% are Christians , 18.7% are Hindus, 5.3% had one of several
other religions, and 5.3% left the question blank. The Valid
Percentage column excludes the eight missing cases and is often
the column that you would use. Given this data set, it would be
accurate to say that of those not coded as missing, 44.8% were
Muslims and 34.3% Christians and 20.9% were Hindus.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
76
With Nominal data, it is better to make a bar graph or chart of the frequency distribution of variables like religion, ethnic group, or other nominal variables; the points that happen to be adjacent in your frequency distribution are not by necessarily adjacent. To get a bar chart select
Graphs legacy dialogues interactive bar chart move variable to the box OK
Bar Charts Summarizing categorical data
Prof.Muhammad Ilyas ,std.Muhammad Saeed
77
Five Figure Summary
It is used to summarize the Numerical data. Five figures include
locating the following values in data
1. Minimum value
2. Maximum Value
3. Median
4. Lower Quartile 5. Upper Quartile
Summarizing Numerical data
Prof.Muhammad Ilyas ,std.Muhammad Saeed
78
Exercise: Calculate Five Figure Summary
2 1 3 2 1 4 3 5 8 8 7 7 4 5 6 2 6 6 6 6
Department B: 20 employees
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 1 2 2 2 3 3 4 4 5 5 6 6 6 6 6 7
18 19 20
7 8 8
Min = 1
Max = 8
Median = 5
Lower Quartile = 2.5
Upper Quartile = 6
Prof.Muhammad Ilyas ,std.Muhammad Saeed
79
Exercise: Calculate Five Figure Summary
Department B: 30 employees
2 1 3 6 3 8 14 5 16 5 6 6 7 7 7
7 9 8 4 6 8 7 8 3 8 10 6 8 18
12
Prof.Muhammad Ilyas ,std.Muhammad Saeed
80
box & whisker plot
For ordinal and normal data, the box and whiskers plot is useful The box and whisker
plot is a graphical representation of distribution of scores and is helpful in distinguishing
between ordinal and normally distributed data
Graphs legacy dialogues interactive box plot move gender to the x-axis and move SAT math to y-axis OK
Prof.Muhammad Ilyas ,std.Muhammad Saeed
81
Interpretation
The case processing summary table shows the valid N=75,
with no missing values for total sample of 75 for the variable
math achievement. The plot shows a box plot for math
achievement. The box represents the middle 50% of the
cases (M=13), lower end of the box shows lower quartile
(Q1=7.67), and upper end of the quartile shows upper
quartile (17.00). The whiskers indicate the expected range
(25.33) of scores from minimum (Min=-1.67) to Maximum
(Max=23.67). Scores outside of this range are considered
unusually high or low, such scores are called outliers. There
are no outliers for in this case.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
82
Histogram Histograms are just like bar graph but there is no space between the boxes, indicating that there is a continuous variable theoretically underlying the scores. Histograms can be used even if data, as measured, are not continuous, if the underlying is conceptualized as continuous.
To draw a histogram select:
Graphs legacy dialogues interactive histogram move variable to the box OK
Prof.Muhammad Ilyas ,std.Muhammad Saeed
83
Interpretation In this frequencies (number of students), shown by
the bars are for a range of points (in this case SPSS
selected a range of 50: 250-299, 300-349, 350-399,
etc). Notice that the largest number of students
(about 20) had scores in the middle two bars of the
range (450-499 and 500-549). Similar small
numbers of students have very low and very high
scores. The bars in the histogram form a distribution
(pattern or curve) that is similar to the normal, bell
shaped curve. Thus, the frequency distribution of
the SAT math scores is said to be approximately
normal.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
84
Mean. The arithmetic average or mean takes into account all of the available information in computing the central tendency of a frequency distribution.
Median. The middle score or median is the appropriate measure of central tendency for ordinal level raw data.
Mode. The most common category, or mode can be used with any kind of data generally provides the least precise information about central tendency
MEASUREMENT OF CENTRAL TENDENCY
Prof.Muhammad Ilyas ,std.Muhammad Saeed
85
Measure of Central Tendency
Exercise
Analyze Descriptive statistics Frequencies put SAT Math into variable box click on statistics mark mean, median and mode click continue Ok
Statistics
scholastic aptitude test - math
N Valid 75
Missing 0
Mean 490.53
Median 490.00
Mode 500 Prof.Muhammad Ilyas ,std.Muhammad Saeed
86
Measures of Variability
Range—The range (highest minus lowest score) is the crudest measure of variability but does give an indication of the spread in scores if they are ordered.
Inter quartile range (IQR)-
IQR=Q3-Q1
Standard Deviation—The standard deviation is based on the deviation (x) of each score from the mean of all scores.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
87
Analyze Descriptive statistics Frequencies put SAT Math into variable box click on statistics mark Range and std deviation click continue Ok
Prof.Muhammad Ilyas ,std.Muhammad Saeed
88
Descriptive Statistics
The Normal Curve The frequency distributions of many of the variables used in the behavioral
sciences are distributed approximately as a normal curve when N is large.
Properties of Normal Curve
1. The mean, median and mode are equal.
2. It has one “hump” and this hump is in the middle of the distribution.
3. The curve is symmetric. If you fold the normal curve in half, the right side would
fit perfectly with the left side; that is, it is not skewed.
4. The range is infinite.
5. The curve is neither too peaked nor too flat and its tails are neither too short nor
too long.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
89
Nominal Dichotomous Ordinal Normal
Frequency Distribution Yes Yes Yes Ok
Bar Chart Yes Yes Yes OK
Histogram No No OK Yes
Frequency Polygon No No OK Yes
Box &Whisker Plot No No Yes Yes
Central Tendency
Mean No OK OK Yes
Median No OK Yes OK
Mode Yes Yes OK OK
Variability
Range No Always 1 Yes Yes
Standard Deviation No No OK Yes
Interquartile Range No No OK OK
How many categories Yes Always 2 OK No
Shape
Skewness No No Yes Yes
Prof.Muhammad Ilyas ,std.Muhammad Saeed
90
SUPERIOR GROUP OF COLLEGES 90
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Lecture #6
Prof.Muhammad Ilyas ,std.Muhammad Saeed 91
In the name of Allah Kareem, Most Beneficent, Most Gracious, the Most Merciful !
Prof.Muhammad Ilyas ,std.Muhammad Saeed 92
Prof.Muhammad Ilyas ,std.Muhammad Saeed 93
94
After studying this session you would be able to
• Understand and infer results from data in order to answer the associational and differential research questions using different parametric and non parametric tests.
• understand implement and interpret the chi-square, phi and cramer’s V
• understand, implement and interpret the correlation statistics
• understand, implement and interpret the regression statistics
• understand, implement and interpret the T-test statistics
Lesson Objectives
Prof.Muhammad Ilyas ,std.Muhammad Saeed
95
1.Non parametric test. 1.Chi square /Fisher exact 2.Phi and cramer’s v
3.Kendall tau-b
2.Parametric test 1.Correlation
1.Pearson correlation
2.Spearman correlation
2.Regression
1.Simple regression
2.Multiple regression
3.T-Test 1.One-sample T-test 2.Independent sample T-test 3.Paired sample T-test
Lesson Outline
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Inferential Statistics
“Inferential statistics are used to make inferences (conclusions) about a population from a sample based on the statistical relationships or differences between two or more variables using statistical tests with the assumption that sampling is random in order to generalize or make predictions about the future.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 96
Inferential Statistics
Inferential statistics are used
To test some hypothesis either to check relationship between variables (two/more) or to compare two groups to measure the differences among them.
To generalize the results about a population from a sample
To make predictions about the future.
To make conclusions
Prof.Muhammad Ilyas ,std.Muhammad Saeed 97
Some basics about inferential statistics!
Statistical significance (The p value)
Statistical significance test is the test of a null hypothesis Ho which is a hypothesis that we attempt to reject or nullify. i.e.
Ho =There is no relationship /Difference between variable 1 and variable 2
p value > 0.05 Ho is accepted and H1 is rejected.
p value < 0.05 Ho is rejected and H1 is accepted.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 98
99
• Confidence Interval Confidence interval is a range of values constructed for a variable of interest so that this range has a specified probability of including the true value of the variable. The specified probability is called the confidence level, and the end points of the confidence interval are called the confidence limits’. It is one of the alternatives to null hypothesis significance testing
(NHST).
Prof.Muhammad Ilyas ,std.Muhammad Saeed
100
•The effect size (weak, moderate or strong) Effect size is the strength of the relationship between the independent variable and the dependent variable, and/or the magnitude of the difference between levels of the independent variable with respect to the dependent variable.
0 No effect No relationship
>0 – 0.33 Small effect Weak relationship
>0.33 – 0.70 Medium/typical effect Moderate relationship
>0.70 – <1 Large effect Strong relationship
1 Maximum effect Perfect relationship
Prof.Muhammad Ilyas ,std.Muhammad Saeed
101
1. Relate why a test is applied
2. Discuss for which variable the test is applied
3. Elaborate whether the null hypothesis is rejected
or accepted p value
4. State what is the direction of the effect
5. Conclude the results
Steps in interpreting inferential statistics
Prof.Muhammad Ilyas ,std.Muhammad Saeed
102
Inferential statistics include a wide variety of tests to infer
the results. This variety of tests can be classified in two
broader categories that are
1. Non parametric tests
2. Parametric tests
Types of test used in Inferential Statistics
Prof.Muhammad Ilyas ,std.Muhammad Saeed
103
Non parametric tests are the statistical tests that are
used • When the level of measurement is nominal or ordinal. E.g. chi-square test or
Kendall’s tau-b.
• When assumptions about normal distribution in the population is not met
e.g. spearman correlation
http://www.cliffsnotes.com/WileyCDA/Section/Statistics-Glossary.id-305499,articleId-
30041.html#ixzz0c38lKKZC retrieval data: 07/01/10
Non parametric tests involve • Chi-Square test
• Kendall’s tau-b
• Spearman correlation (will be discussed in correlation section)
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Non parametric test
Chi-Square Statistics
Chi-Squared test is the most commonly used non-parametric test to check the association between two nominal variables in order to accept or reject the null hypothesis. It is used to check
The association between two nominal variables
Hypothesis for Chi-Square Test Ho = there is no association between gender and geometry in h.s.
H1 = There is association between gender and geometry in h.s.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 104
Chi-Squared Test
Assumptions and Conditions for the Chi-Squared test
The data of the variables must be independent.
Both the variables should be nominal.
All the expected counts are greater than 1 for chi-square.
At least 80% of the expected frequencies should be greater than or equal to 5.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 105
geometry in h.s. * gender Crosstabulation
gender
Total male female
geometry in h.s. not taken Count 10 29 39
Expected Count 17.7 21.3 39.0
% of Total 13.3% 38.7% 52.0%
Taken Count 24 12 36
Expected Count 16.3 19.7 36.0
% of Total 32.0% 16.0% 48.0%
Total Count 34 41 75
Expected Count 34.0 41.0 75.0
% of Total 45.3% 54.7% 100.0%
Chi-Squared Test
Checking Assumptions and Conditions for the Chi-Squared test
Prof.Muhammad Ilyas ,std.Muhammad Saeed 106
Non parametric test
Cases
Valid Missing Total
N Percent N Percent N Percent geometry in h.s. * gender 75 100.0% 0 .0% 75 100.0%
Case Processing Summary
Value df
Asymp. Sig. (2-
sided) Exact Sig. (2-sided) Exact Sig. (1-sided)
Pearson Chi-Square 12.714a 1 .000
Continuity Correctionb 11.112 1 .001
Likelihood Ratio 13.086 1 .000
Fisher's Exact Test .000 .000
Linear-by-Linear Association 12.544 1 .000
N of Valid Casesb 75
Chi-Square Tests
Prof.Muhammad Ilyas ,std.Muhammad Saeed 107
Value Approx. Sig. Nominal by Nominal Phi
-.412 .000
Cramer's V .412 .000
N of Valid Cases 75
Symmetric Measures
Prof.Muhammad Ilyas ,std.Muhammad Saeed 108
109
Interpretation: To check the association between gender and geometry in h.s. chi-square test is conducted. The
case processing summary table indicates that there is no participant with missing value. The
assumptions are checked through crosstabs. The Crosstabulation table includes the Counts and
Expected Counts, and their relative percentages within gender. The result shows that there are 24
males who had taken geometry which is 71% of total 34 male students. On the other hand, 12 of 41
females took geometry; that is only 29% of the females. It looks like a higher percentage of males
took geometry than female students. The Ch-Square Test table tell us whether we can be confident
that this apparent difference is not due to chance.
Note, in the Cross Tabulation table, that the Expected Count of the number of male students who
didn’t take geometry is 17.7 and the observed or actual Count is 10. Thus, there are 7.7 fewer
males who didn’t take geometry than would be expected by chance, given the Totals shown in the
Table. There are also the same discrepancies between observed and expected counts in the other
three cells of the table. A question answered by the chi-square test is whether these discrepancies
between observed and expected counts are bigger than one might expect by chance.
The Chi-Square Tests table is used to determine if there is a statistically significant relationship
between two dichotomous or nominal variables. It tells you whether the relationship is statistically
significant but does not indicate the strength of the relationship, like phi or a correlation does. In
output, we use the Pearson Chi-Square or (for small samples) the Fisher’s exact test to interpret
the results of the test. They are statistically significant (p < .001), which indicates that we can be
quite certain that males and females are different on whether they take geometry.
Phi is -.412, and like the chi-square, it is statistically significant. Phi is also a measure of effect size
for an associational statistic and, in this case, effect size is moderate according to Cohen (1988)
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Other Nonparametric Associational Statistics
KENDALL’S TAU-B If the variables are ordered (i.e. ordinal), you have several other choices. We will use Kendall’s tau-b in this problem.
Example: What is the relationship or association between father’s education and mother’s education?
Prof.Muhammad Ilyas ,std.Muhammad Saeed 110
Other Nonparametric Associational Statistics Case Processing Summary
Cases
Valid Missing Total
N Percent N Percent N Percent
mother education revised * father
education revised 73 97.3% 2 2.7% 75 100.0%
mother education revised * father education revised Cross tabulation
father education revised
Total 1 2 3
mother education revised 1 Count 43 8 2 53
Expected Count 35.6 13.1 4.4 53.0
% of Total 58.9% 11.0% 2.7% 72.6%
2 Count 6 10 2 18
Expected Count 12.1 4.4 1.5 18.0
% of Total 8.2% 13.7% 2.7% 24.7%
3 Count 0 0 2 2
Expected Count 1.3 .5 .2 2.0
% of Total .0% .0% 2.7% 2.7%
Total Count 49 18 6 73
Expected Count 49.0 18.0 6.0 73.0
% of Total 67.1% 24.7% 8.2% 100.0%
Prof.Muhammad Ilyas ,std.Muhammad Saeed 111
Other Nonparametric Associational Statistics
Symmetric Measures
Value Asymp. Std. Errora Approx. Tb Approx. Sig.
Ordinal by Ordinal Kendall's tau-b .494 .108 3.846 .000
N of Valid Cases 73
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 112
113
Interpretation:
To investigate the relationship between father’s education
and mother’s education, Kendall’s tau-b was used. The
analysis indicated a significant positive association between
father’s education and mother’s education, tau =.572,
p<.001. This means that more highly educated fathers were
married to more highly educated mothers and less educated
fathers were married to less educated mothers. This tau is
considered to be a large effect size (Cohen, 1988).
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Other Nonparametric Associational Statistics
Prof.Muhammad Ilyas ,std.Muhammad Saeed 114
115
Interpretation
Eta was used to investigate the strength of the association between gender and number of math courses taken (eta=.33). This is a weak to medium effect size (Cohen, 1988). Males were more likely to take several or all the math courses than females.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
116
SUPERIOR GROUP OF COLLEGES 116
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Lecture#8 Quantitative Technique
Prof.Muhammad Ilyas ,std.Muhammad Saeed 117
Descriptive Statistics
Prof.Muhammad Ilyas ,std.Muhammad Saeed 118
119 Prof.Muhammad Ilyas ,std.Muhammad Saeed
120
After studying this session you would be able to
• Understand and infer results from data in order to answer the associational and differential research questions using different parametric and non parametric tests.
• understand implement and interpret the chi-square, phi and cramer’s V
• understand, implement and interpret the correlation statistics
• understand, implement and interpret the regression statistics
• understand, implement and interpret the T-test statistics
Lesson Objectives
Prof.Muhammad Ilyas ,std.Muhammad Saeed
121
1.Non parametric test. 1.Chi square /Fisher exact 2.Phi and cramer’s v
3.Kendall tau-b
4.Eta
2.Parametric test 1.Correlation
1.Pearson correlation
2.Spearman correlation
2.Regression
1.Simple regression
2.Multiple regression
3.T-Test 1.One-sample T-test 2.Independent sample T-test 3.Paired sample T-test
Lesson Outline
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Correlation Correlation is a statistical process that determines the mutual (reciprocal)
relationship between two (or more) variables which are thought to be mutually related in a way that systematic changes in the value of one variable are accompanied by systematic changes in the other and vice versa.
It is used to determine The existence of mutual relationship that is defined by the significance (p)
value.
The direction of relationship that is defined by the sign (+,-) of the test value
The strength of relationship that is defined by the test value
Correlation Coefficient (r)
The correlation coefficient measures the strength of linear relationship between two or more numerical variables. The value of correlation coefficient can vary from -1.0 (a perfect negative correlation or association) through 0.0 (no correlation) to +1.0 (a perfect positive correlation). Note that +1 and -1 are equally high or strong
Prof.Muhammad Ilyas ,std.Muhammad Saeed 122
Correlation
Assumptions and conditions for Pearson
The two variables have a linear relationship.
Scores on one variable are normally distributed for each value of the other variable and vice versa.
Outliers (i.e. extreme scores) can have a big effect on the correlation.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 123
Correlation Checking the assumptions for Pearson Correlation
The assumptions for correlation test are checked through normal curve (normality assumption) and the scatter plot (linearity assumption)
Statistics math
achievement test
scholastic aptitude test -
math N Valid 75 75
Missing 0 0 Skewness .044 .128 Std. Error of Skewness .277 .277
Prof.Muhammad Ilyas ,std.Muhammad Saeed 124
Correlation
Correlations math
achievement test
scholastic aptitude test
- math math achievement test Pearson
Correlation 1 .788**
Sig. (2-tailed) .000
N 75 75 scholastic aptitude test - math
Pearson Correlation .788** 1
Sig. (2-tailed) .000
N 75 75 **. Correlation is significant at the 0.01 level (2-tailed).
Interpretation
To investigate if there was a statistically significant association between Scholastic aptitude test and math achievement, a correlation was computed. Both the variables were approximately normal there is linear relationship between them hence fulfilling the assumptions for Pearson's correlation. Thus, the Pearson’s r is calculated, r= 0.79, p = .000 relating that there is highly significant relationship between the variables. The positive sign of the Pearson's test value shows that there is positive relationship, which means that students who have high scores in math achievement test do have high scores in scholastic aptitude test and vice versa. Using Cohen’s (1988) guidelines’ the effect size is large relating
that there is strong relationship between math achievement and scholastic aptitude test. Prof.Muhammad Ilyas ,std.Muhammad Saeed 125
Correlation
Spearman Correlation: If the assumptions for Pearson correlation are not fulfilled then consider the Spearman correlation with the assumption that the Relationship between two variables is monotonically non linear
Example: what is the association between mother’s education and math achievement
Prof.Muhammad Ilyas ,std.Muhammad Saeed 126
Correlation Correlationsa
mother's education
math achieveme
nt test Spearman's rho mother's
education Correlation Coefficient 1.000 3.15**
Sig. (2-tailed) . .006 math achievement test
Correlation Coefficient .315** 1.000
Sig. (2-tailed) .006 . **. Correlation is significant at the 0.01 level (2-tailed).
Interpretation
To investigate if there was a statistically significant association between mother’s education and math achievement, a correlation was computed. Mother’s education was skewed (skewness=1.13), which violated the assumption of normality. Thus, the spearman rho statistic was calculated, r, (73) = .32, p = .006. The direction of the correlation was positive, which means that students who have highly educated mothers tend to have higher math achievement test scores and vice versa. Using Cohen’s (1988) guidelines’ the effect size is medium for studies in his area. The r2 indicates that approximately 10% of the variance in
math achievement test score can be predicted from mother’s education. Prof.Muhammad Ilyas ,std.Muhammad Saeed 127
REGRESSION ANALYSIS Regression analysis is used to measure the relationship between two or
more variables. One variable is called dependent (response, or outcome) variable and the other is called Independent (explanatory or predictor) variables.
Regression Equation
Y = a + bx Y = a + bx1 + cx2 + dx3 + ex4
Y = dependent variable
a = Constant
b, c, d, e, = slope coefficients
x1, x2, x3, x4 = Independent variables
Types of regression analysis Simple Regression
Multiple regression
Prof.Muhammad Ilyas ,std.Muhammad Saeed 128
REGRESSION ANALYSIS
Simple Regression
Simple regression is used to check the contribution of independent variable(s) in the dependent variable if the independent variable is one.
Assumptions and conditions of simple regression
Dependent variable should be scale
The relationship of variables should be liner
Data should be independent
Example: Can we predict math achievement from grades in high school
Prof.Muhammad Ilyas ,std.Muhammad Saeed 129
Commands Analyze Regression Linear
REGRESSION ANALYSIS
Prof.Muhammad Ilyas ,std.Muhammad Saeed 130
Coefficientsa
Model
Unstandardized Coefficients
Standardized Coefficients
t Sig. B Std. Error Beta 1 (Constant) .397 2.530 .157 .876
grades in h.s. 2.142 .430 .504 4.987 .000
a. Dependent Variable: math achievement test
REGRESSION ANALYSIS
Interpretation
Simple regression was conducted to investigate how well grades in highschool predict math achievement scores. The results were statistically significant F (1, 73 ) = 24.87, p<.001. The indentified equation to understand this relationship was math achievement = .40 + 2.14* (grades in high school). The adjusted R2 value was .244. This indicates that 24% of the variance in math achievement was explained by the grades in high school. According to Cohen (1988), this is a large effect.
Regression equation is Y = 0.40 + 2.14X
Prof.Muhammad Ilyas ,std.Muhammad Saeed 131
Multiple Regression
Multiple regressions is used to check the contribution of independent variable(s) in the dependent variable if the independent variables are more than one.
Assumptions and conditions of Multiple regression Dependent variables should be scale.
Example: How well can you predict math achievement from a combination of four variables: grades in high school, father’s education, mother education and gender
REGRESSION ANALYSIS
Prof.Muhammad Ilyas ,std.Muhammad Saeed 132
Commands
Analyze Regression Linear
REGRESSION ANALYSIS
Prof.Muhammad Ilyas ,std.Muhammad Saeed 133
Model
Unstandardized Coefficients
Standardized Coefficients
T Sig. B Std. Error Beta 1 (Constant) 1.047 2.526 .415 .680
grades in h.s. 1.946 .427 .465 4.560 .000 father's education .191 .313 .083 .610 .544 mother's education .406 .375 .141 1.084 .282 gender -3.759 1.321 -.290 -2.846 .006
Coefficient
Interpretation Simultaneously multiple regression was conducted to investigate the best predictors of math achievement test scores. The means, standard deviation, and inter correlations can be found in table. The combination of variables to predict math achievement from grades in high school, father’s education, mother’s education and gender was statistically significant, F = 10.40, p <0.05. The beta coefficients are presented in last table. Note that high grades and male gender significantly predict math achievement when all four variables are included. The adjusted R2 value was 0.343. This indicates that 34 % of the variance in math achievement was explained by the model according
to Cohen (1988), this is a large effect. Prof.Muhammad Ilyas ,std.Muhammad Saeed 134
REGRESSION ANALYSIS Regression analysis is used to measure the relationship between two or
more variables. One variable is called dependent (response, or outcome) variable and the other is called Independent (explanatory or predictor) variables.
Regression Equation
Y = a + bx Y = a + bx1 + cx2 + dx3 + ex4
Y = dependent variable
a = Constant
b, c, d, e, = slope coefficients
x1, x2, x3, x4 = Independent variables
Types of regression analysis Simple Regression
Multiple regression
Prof.Muhammad Ilyas ,std.Muhammad Saeed 135
REGRESSION ANALYSIS
Simple Regression
Simple regression is used to check the contribution of independent variable(s) in the dependent variable if the independent variable is one.
Assumptions and conditions of simple regression
Dependent variable should be scale
The relationship of variables should be linear
Data should be independent
Example: Can we predict math achievement from grades in high school
Prof.Muhammad Ilyas ,std.Muhammad Saeed 136
Commands Analyze Regression Linear
REGRESSION ANALYSIS
Prof.Muhammad Ilyas ,std.Muhammad Saeed 137
Coefficientsa
Model
Unstandardized Coefficients
Standardized Coefficients
t Sig. B Std. Error Beta 1 (Constant) .397 2.530 .157 .876
grades in h.s. 2.142 .430 .504 4.987 .000
a. Dependent Variable: math achievement test
REGRESSION ANALYSIS
Interpretation
Simple regression was conducted to investigate how well grades in high school predict math achievement scores. The results were statistically significant F (1, 73 ) = 24.87, p<.001. The indentified equation to understand this relationship was math achievement = .40 + 2.14* (grades in high school). The adjusted R2 value was .244. This indicates that 24% of the variance in math achievement was explained by the grades in high school. According to Cohen (1988), this is a large effect.
Regression equation is Y = 0.40 + 2.14X
Prof.Muhammad Ilyas ,std.Muhammad Saeed 138
Multiple Regression
Multiple regressions is used to check the contribution of independent variable(s) in the dependent variable if the independent variables are more than one.
Assumptions and conditions of Multiple regression Dependent variables should be scale.
Example: How well can you predict math achievement from a combination of four variables: grades in high school, father’s education, mother education and gender
REGRESSION ANALYSIS
Prof.Muhammad Ilyas ,std.Muhammad Saeed 139
Commands
Analyze Regression Linear
REGRESSION ANALYSIS
Prof.Muhammad Ilyas ,std.Muhammad Saeed 140
Model
Unstandardized Coefficients
Standardized Coefficients
T Sig. B Std. Error Beta 1 (Constant) 1.047 2.526 .415 .680
grades in h.s. 1.946 .427 .465 4.560 .000 father's education .191 .313 .083 .610 .544 mother's education .406 .375 .141 1.084 .282 gender -3.759 1.321 -.290 -2.846 .006
Coefficient
Interpretation Simultaneously multiple regression was conducted to investigate the best predictors of math achievement test scores. The means, standard deviation, and inter correlations can be found in table. The combination of variables to predict math achievement from grades in high school, father’s education, mother’s education and gender was statistically significant, F = 10.40, p <0.05. The beta coefficients are presented in last table. Note that high grades and male gender significantly predict math achievement when all four variables are included. The adjusted R2 value was 0.343. This indicates that 34 % of the variance in math achievement was explained by the model according
to Cohen (1988), this is a large effect.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 141
142
SUPERIOR GROUP OF COLLEGES 142
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Last lecture #11
Prof.Muhammad Ilyas ,std.Muhammad Saeed 143
Prof.Muhammad Ilyas ,std.Muhammad Saeed 144
T-TEST Statistics
The t test is used to compare to groups to answer the differential research questions. Its values determines the difference by comparing means
Hypothesis for T-test
HO: there is no Difference
H1: There is difference
Types of T-test
There are three types of T-test
One sample t-test
Independent sample t-test
Paired sample t-test
Prof.Muhammad Ilyas ,std.Muhammad Saeed 145
T-TEST Statistics
One sample t-test
One sample t-test is used to determine if there is difference between
population mean (Test value) and the sample mean (X)
Assumptions and conditions of 1 sample t-test The dependent variable should be normally distributed within the
population
The data are independent.(scores of one participant are not depend on scores of the other :participant are independent of one another )
Example: is the mean SAT-Math score in the modified HSB data set significantly different from the presumed population mean of 500?
Prof.Muhammad Ilyas ,std.Muhammad Saeed 146
T-TEST Statistics
One-Sample Statistics
N Mean Std. Deviation Std. Error Mean scholastic aptitude test - math 75 490.53 94.553 10.918
One-Sample Test
Test Value = 500
t Df Sig. (2-tailed)
Mean Difference
95% Confidence Interval of the Difference
Lower Upper scholastic aptitude test - math -.867 74 .389 -9.467 -31.22 12.29
Prof.Muhammad Ilyas ,std.Muhammad Saeed 147
148
Interpretation: To investigate the difference between population and the sample, one-sample
t-test is conducted. The One-Sample Statistics table provides basic
descriptive statistics for the variable under consideration. The Mean AT-Math
for the students in the sample will be compared to the hypothesize population
mean, displayed as the Test Value in the One-Sample Test table. On the
bottom line of this table are the t value, df, and the two-tailed sig. (p) value,
which are circled. Note that p=.389 so we can say that the sample mean
(490.53) is not significantly different from the population mean of 500. The
table also provides the difference (-9.47) between the sample and population
mean and the 95% Confidence Interval. The difference between the sample
and the population mean is likely to be between +12.29 and -31.22 points.
Notice that this range includes the value of zero, so it is possible that there is
no difference. Thus, the difference is not statistically significant.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
T-TEST Statistics
Independent sample t-test
Independent sample T-test is used to compare two independent groups (Male and Female)with respect to there effect on same dependent variable.
Assumptions and conditions of Independent T-test Variance of the dependent variable for two categories of the
independent variable should be equal to each other
Dependent variable should be scale
Data on dependent variable should be independent.
Example: Do male and female students differ significantly in regard to their average math achievement scores
Prof.Muhammad Ilyas ,std.Muhammad Saeed 149
T-TEST Statistics
The first table, Group Statistics, shows descriptive statistics for the two groups (males and females) separately. Note that the means within each of the three pairs look somewhat different. This might be due to chance, so we will check the t test in the next table. The second table, Independent Sample Test, provides two statistical tests. The left two columns of numbers are the Levene’s test for the assumption that the variances of the two groups are equal. This is not the t test; it only assesses an assumption! If this F test is not significant (as in the case of math achievement and grades in high school), the assumption is not violated, and one uses the Equal variances assumed line for the t test and related statistics. However, if Levene’s F is statistically significant (Sig. <.05), as is true for visualization, then variances are significantly different and the assumption of equal variances is violated. In that case, the Equal variances not assumed line used; and SSPS adjusts t, df, and Sig. The appropriate lines are circled.
Prof.Muhammad Ilyas ,std.Muhammad Saeed 150
151
Thus, for visualization, the appropriate t=2.39, degree of freedom (df) = 57.15, p=.020. This t is statistically significant so, based on examining the means, we can say that boys have higher visualization scores than girls. We used visualization to provide an example where the assumption of equal variances was violated (Levene’s test was significant). Note that for grades in high school, the t is not statistically significant (p=.369) so we conclude that there is no evidence of a systematic difference between boys and girls on grades. On the other hand, math achievement is statistically significant because p<.05; males have higher means. The 95% Confidence Interval of the Difference is shown in the two right-hand column of the output. The confidence interval tells us if we repeated the study 100 times, 95 of the times the true (population) difference would fall within the confidence interval, which for math achievement is between 1.05 points and 6.97 points. Note that if the Upper and Lower bounds have the same sign (either + and + or – and -), we know that the difference is statistically significant because this means that the null finding of zero difference lies outside of the confident interval. On the other hand, if zero lies between the upper or lower limits, there could be no difference, as is the case of grades in h.s. The lower limit of the confidence interval on math achievement tells us that the difference between males and females could be as small as 1.05 points out 25, which are the maximum possible scores. Effects size measures for t tests are not provided in the printout but can be estimated relatively easily. For math achievement, the difference between the means (4.01) would be divided by about 6.4, an estimate of the pooled (weighted average) standard deviation. Thus, d would be approximately .60, which is, according to Cohen (1988), a medium to large sized “effect.” Because you need means and standard deviations to compute the effect size, you should include a table with means and standard deviations in your results section for a full interpretation of t tests.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
T-TEST Statistics
Paired sample t-test Paired sample T-test is used to compare two paired groups (e.g. Mothers and fathers) with respect to there effect on same dependent variable.
Assumptions and conditions of Paired sample T-test
The independent variable is dichotomous and its levels (or groups) are paired, or matched, in some way (husband-wife, pre-post etc)
The dependent variable is normally distributed in the two conditions
Example: Do students’ fathers or mothers have more education?
Prof.Muhammad Ilyas ,std.Muhammad Saeed 152
153
Paired Samples Statistics
Mean N Std. Deviation Std. Error Mean
Pair 1 father's education 4.73 73 2.830 .331
mother's education 4.14 73 2.263 .265
Paired Samples Correlations
N Correlation Sig.
Pair 1 father's education & mother's education
73 .681 .000
Prof.Muhammad Ilyas ,std.Muhammad Saeed
154
The first table shows the descriptive statistics used to compare mother’s and father’s education levels. The second table Paired Samples Correlations, provides correlations between the two paired scores. The correlation (r=.68) between mother’s and father’s education indicates that highly educate men tend to marry highly educated women and vice versa. It doesn’t tell you whether men or women have more education. That is what t in the third table tells you. The last table shows the Paired Samples t Test. The Sig. for the comparison of the average education level of the students’ mothers and fathers was p=.019. Thus, the difference in educational level is statistically significant, and we can tell from the means in the first table that fathers have more education; however, the effect size is small (d=.28), which is computed by dividing the mean of the paired differences (.59) by the standard deviation (2.1) of the paired differences. Also, we can tell from the confidence interval that the difference in the means could be as small as .10 of a point or as large as 1.08 points on the 2 to 10 scale.
Prof.Muhammad Ilyas ,std.Muhammad Saeed
Thank you!
Prof.Muhammad Ilyas ,std.Muhammad Saeed 155