Day 1 - The University of Tennessee at Chattanooga

Lecture 1 – Basic Descriptive Techniques

Describing Data – Howell Chapters 1, 2, and 3, although he covers more than I will here. (So be sure to read his chapters.)

The cross that psychologists bear

A rocket crashes in the cemetery. A rock and a person survive the crash. A physicist examines the rock. A psychologist examines the person.

The rock. Every day it is the same. Everything notable about it is unchanged. Inside the rock is a vein of copper. Its effect on the rock’s weight, density, and mass is unchanging.

The person. Every day it is different. Inside the person is a “vein” of depression. But the effect of that depression on the person’s behavior changes from one time to the next – sometimes the person fights it and appears happy. Other times, the person succumbs and appears sad. Every day the manifestation of depression differs.

That’s the cross that psychology bears – the behaviors we exhibit vary from time to time, regardless of the fact that our inner core of characteristics is essentially unchanged across time.

Statistics to the rescue. Statistics is a collection of procedures for enabling us to see underlying constancy through the variability of behavior.

Three Descriptive Techniques

1. Tables.

Regular Frequency Distribution, Grouped Frequency Distribution

2. Graphs/plots

Bar Graph, Histogram, Dot Plot, Stem and Leaf Display, Box and Whisker plot, Scatterplot

3. Numeric Summaries

Mean, median, standard deviation, variance, Pearson correlation coefficient

Our tools

Computer programs

Excel; SPSS; R

SAS; Stata; Minitab

Basic Descriptive Techniques - 1 5/8/2023

Some major considerations in describing data

1. The data type - important for choice of table or graph.

Categorical: Variables for which differences between people are qualitative, not quantitative. Examples: Gender; Graduate program (RM vs. I/O).

Quantitative: Variables for which differences between people are quantitative. Examples: Height; weight; Extraversion.

2. The way the data are distributed - important for choice of numeric summary.

Two basic shapes of distributions: Symmetric and Skewed

3. Outliers.

Data values, persons, or variables which do not belong in the sample.

Most problematic when they are extreme – far from the other values, cases, or variables.

Data point outlier: A single value which was not created from the process that created all the other values.

Most problematic when the value is quite different from the other values in the sample.

Age = 193. Hmm.

Person (case) outlier: A person (often called a case), whose values on one or more variables are unusual.

A person whose pattern or profile of values on a collection of variables is not like others in the same population.

A friendly, affable person with low performance scores in a customer service job.

Variable Outlier: A variable whose values aren't as they were expected to be.

E.g., an aberrant question in a set of questions forming a scale. When we form summated scales, we expect all of the items in the scale to be positively correlated. An aberrant item is one whose correlation with the other items is negative rather than the expected positive.

Study outlier: A study whose results were extreme relative to expectations.

Example: A study that reported a negative relationship of teacher evaluations to learning when all other studies showed positive correlations.

You should not analyze or present data until you are satisfied that all outliers have been removed.

Tables – Howell Chapter 3

Regular or Ungrouped Frequency Distribution

Definition: An ordered list of all possible score values from the largest observed value down to the smallest observed value and the frequency/percentage of each value. Interior values that could have occurred but didn’t are represented.

Best used for: Categorical data.

Example: Responses of employees to the question

My job involves risky, serious health hazards

Response                 Frequency   Percent
7: Strongly agree            34        17.5
6: Moderately agree          27        13.9
5: Slightly agree            33        17.0
4: Neither A nor D           20        10.3
3: Slightly Disagree         27        13.9
2: Moderately disagree       15         7.7
1: Strongly disagree         38        19.6

Rules for creating Ungrouped Frequency Distributions

1. Large scores should go at the top. Note that SPSS’s default table above does not follow that rule.

2. Interior score values with 0 frequency must be represented in the table.

3. When comparing groups, put the tables side by side, not one-on-top-the-other.

ATV Accidents – Comparing Death Rates of those wearing helmets vs those not wearing helmets

Wearing a Helmet            Not Wearing a Helmet
Death   Freq    Pct         Death   Freq    Pct
Yes        9    2.6         Yes        2    3.2
No       335   97.4         No        61   96.8

A major problem with a regular frequency distribution occurs when there are more than about 10 values. In those cases, the tables become unwieldy.

The creation of frequency distributions is what many persons call “statistical analysis”.
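The rules for an ungrouped frequency distribution can be sketched in a few lines of Python. This is only an illustration, assuming integer scores; the function name `regular_frequency_distribution` and the sample responses are made up for the example and are not course data.

```python
from collections import Counter

def regular_frequency_distribution(scores):
    """Return (value, frequency, percent) rows, largest value first.

    Interior values with zero frequency are kept in the table.
    """
    counts = Counter(scores)
    n = len(scores)
    rows = []
    for value in range(max(scores), min(scores) - 1, -1):
        freq = counts.get(value, 0)
        rows.append((value, freq, round(100 * freq / n, 1)))
    return rows

# Hypothetical responses on a 1-7 agreement scale (nobody chose 3).
responses = [7, 7, 6, 5, 5, 5, 4, 2, 2, 1]
table = regular_frequency_distribution(responses)
for value, freq, pct in table:
    print(f"{value:>5} {freq:>9} {pct:>7.1f}")
```

The row for the value 3 appears with a frequency of 0, which is what rule 2 asks for.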

Grouped Frequency Distribution Howell p. 40

Definition: An ordered list of score groups, from a group containing the largest observed value down to a group containing the smallest observed value, and the frequency of occurrence in each.

Best used for: Continuous variables or variables with many values, e.g., Length, IQ, Job Satisfaction, Organizational Commitment.

Alas, SPSS has no procedure which produces grouped frequency distributions such as those presented in the text. I am not aware of any easy-to-use procedures in R or Excel for grouped frequency distributions. If you want one, you’ll have to make it yourself.
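For anyone who does want to make one, here is a minimal sketch in Python (not one of the course tools). It assumes non-negative integer scores and equal-width intervals; the function name `grouped_frequency_distribution`, the `width` parameter, and the scores are hypothetical.

```python
from collections import Counter

def grouped_frequency_distribution(scores, width=10):
    """Return (low, high, frequency, percent) rows, highest interval first."""
    counts = Counter(score // width for score in scores)  # interval index
    n = len(scores)
    top, bottom = max(counts), min(counts)
    rows = []
    for k in range(top, bottom - 1, -1):
        freq = counts.get(k, 0)  # empty interior intervals stay in the table
        rows.append((k * width, k * width + width - 1, freq,
                     round(100 * freq / n, 1)))
    return rows

# Hypothetical scores, not the PPVT data.
scores = [131, 118, 112, 104, 97, 95, 88, 83, 81, 76]
rows = grouped_frequency_distribution(scores)
for low, high, freq, pct in rows:
    print(f"{low}-{high:<5} {freq:>3} {pct:>6.1f}")
```

The 120-129 interval is printed with a frequency of 0 because it lies between occupied intervals.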

Examples: Initial Peabody Picture Vocabulary Scores of Early Success Students

Table 1. Grouped Frequency Distribution of initial PPVT scores of Control and Experimental Groups.

        Control Group                     Experimental Group
PPVT Interval     f   Percent       PPVT Interval     f   Percent
130-139           1     0.5         130-139           3     0.8
120-129           0     0.0         120-129           7     1.8
110-119          17    10.1         110-119          32     8.3
100-109          22    13.1         100-109          69    17.9
90-99            30    17.9         90-99            96    24.9
80-89            42    25.0         80-89            90    23.4
70-79            31    18.5         70-79            50    13.0
60-69            20    11.9         60-69            31     8.1
50-59             3     1.8         50-59             5     1.3
40-49             1     0.5         40-49             2     0.5

Total           167   100           Total           385   100

Rules for creating frequency distributions including those comparing groups:

1. Largest scores are at the top of the table. Smallest values are at the bottom.

2. Put tables side-by-side when comparing groups. Identical intervals on the same line.

3. When comparing groups, scales for both groups must be identical.

40-49, 50-59, . . . 130-139 in both of the above tables.

4. Percent column must be included if group sizes are different.

5. Interior intervals with 0 frequency must be represented.

Stem & Leaf Display Howell p. 41

Definition: An ordered representation of scores in which rows represent score intervals and numbers within rows represent individual values. The rows are called stems and the numbers within rows are called leaves.

Popularized by John Tukey. A great paper-and-pencil way of organizing data.

The most straightforward such table is one representing two-digit scores. In this case, rows correspond to the first digit of each number. Within each row, each number is represented by its last digit.

For example, consider the following two-digit values . . .

24 29 40 58 42 9 15 20 78 90 96 26 10 16 38 46 29 65 82 71 81 45 52 68 49 94

These would be represented in a stem & leaf display as follows . . .

Stems   Leaves
0       9
1       5 0 6
2       4 9 0 6 9
3       8
4       0 2 6 5 9
5       2 8
6       5 8
7       8 1
8       2 1
9       0 6 4

Usually, the leaves are ordered from smallest to largest within stems . . .

Stems   Ordered Leaves
0       9
1       0 5 6
2       0 4 6 9 9
3       8
4       0 2 5 6 9
5       2 8
6       5 8
7       1 8
8       1 2
9       0 4 6
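The same paper-and-pencil procedure can be sketched in Python. This is an illustration only, assuming non-negative integers below 100 so that the stem is the tens digit and the leaf is the units digit; the function name `stem_and_leaf` is made up for the example. The values are the 26 scores listed above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Include stems with no leaves so gaps in the data remain visible.
    return {stem: leaves.get(stem, [])
            for stem in range(min(leaves), max(leaves) + 1)}

values = [24, 29, 40, 58, 42, 9, 15, 20, 78, 90, 96, 26, 10,
          16, 38, 46, 29, 65, 82, 71, 81, 45, 52, 68, 49, 94]
display = stem_and_leaf(values)
for stem, row in display.items():
    print(stem, " ".join(str(leaf) for leaf in row))
```

Sorting the values first is what puts the leaves in order within each stem.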

Compare with a grouped frequency distribution . . .

90-99   3
80-89   2
70-79   2
60-69   2

Graphs

Bar Graph or Bar Chart Howell p. 46

Definition: Columns whose location represents value and whose length represents frequency.

Used for: Categorical data. Graphical equivalent of the Regular Frequency Distribution.

Creation: Produced by SPSS Frequencies procedure (Analyze -> Descriptive Statistics -> Frequencies) or by SPSS Graphs -> Bar…

Example: Types of jobs held by respondents in a manufacturing facility.

A categorical variable.

Histogram Howell p. 39

Definition: Columns whose location represents value and whose area represents frequency. Columns are usually contiguous and of equal width so height represents frequency.

Used for: Continuous or quantitative data, usually grouped.

Creation: SPSS Frequencies procedure (Analyze -> Descriptive Statistics -> Frequencies) or by SPSS Graphs -> Histogram…

Example 1: Distribution of salaries of employees at a bank in the 1970s

Rules:

1. If you’re comparing groups, put one histogram on top of the other.

2. Make x-axis scales of histograms being compared identical. (See later page for how to do this.)

FYI the distributions are positively skewed.

Comparison of the histograms suggests that male salaries were higher than female salaries in this organization.

This ellipse is not officially part of the histogram. Included to show that there were no females with salaries in the $70,000+ range.

Box and Whisker Plot

A single group

A representation of the following values:

Maximum
3rd Quartile (75th Percentile point)
Median (50th Percentile point)
1st Quartile (25th Percentile point)
Minimum

Used for continuous or ordered response and other "many valued" variables.
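The five values behind a box and whisker plot can be computed directly. The sketch below uses Python's standard `statistics.quantiles` with the "inclusive" method; that choice is an assumption, since packages define quartiles in slightly different ways and SPSS may report slightly different values for the same data. The function name `five_number_summary` and the ages are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Hypothetical ages, not the data plotted in the lecture.
ages = [22, 25, 27, 30, 31, 35, 38, 41, 47]
summary = five_number_summary(ages)
print(summary)
```

A top whisker (maximum minus Q3) that is longer than the bottom whisker (Q1 minus minimum) is the numeric counterpart of the positive skew seen in a plot.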

Produced by SPSS Examine procedure (Analyze -> Descriptive Statistics -> Explore) or by SPSS Graphs -> Boxplot…

Comparing Groups

The distributions are slightly positively skewed – the top whisker is slightly longer than the bottom whisker.

It appears that females are a little older than males in this organization – the median for females is slightly higher than that for males.

[Figure: box and whisker plots of Age of Subject (vertical axis, 10 to 70) for each Sex of Subject (female, male).]

Dot Plot

A plot in which

1. Each score is represented by a symbol – a dot, usually, and
2. Location of the symbols represents values, and
3. Piles of symbols are used to indicate multiple values at the same place.

Example . . .

Dot plot of Hexaco eXtraversion scale scores of 1900+ UTC students . . .

Grades in PSY 3120 course . . . (Symbols are vertical lines, not dots) . . .
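The three defining features of a dot plot can be seen in a rough text version. This is only a sketch, assuming small integer scores (each axis label must fit in three characters); the function name `text_dot_plot` and the scores are hypothetical.

```python
from collections import Counter

def text_dot_plot(scores):
    """Return a list of text lines: stacked symbols above a value axis."""
    counts = Counter(scores)
    values = list(range(min(scores), max(scores) + 1))
    lines = []
    for level in range(max(counts.values()), 0, -1):
        # Place a symbol where the value occurs at least `level` times.
        row = "".join(" * " if counts[v] >= level else "   " for v in values)
        lines.append(row.rstrip())
    lines.append("".join(f"{v:^3}" for v in values).rstrip())
    return lines

# Hypothetical single-digit scores.
plot = text_dot_plot([3, 4, 4, 5, 5, 5, 6, 6, 8])
print("\n".join(plot))
```

Each score contributes one symbol, its horizontal position gives its value, and repeated values pile up vertically.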

Description Using SPSS, Rcmdr, and Excel

Start here on 8/29/17

First, using SPSS . . .

Run SPSS.

Enter the data: File -> Open -> Data…

Choose an SPSS Procedure for the analysis

This example will use data from a project involving victims of ATV accidents.

The data (‘G:MDBT\InClassDatasets\ATVDataForClass050906.sav’)

The SPSS Data Editor showing the numbers that were actually entered.

Same data with View -> Value Labels chosen – value labels associated with the numbers are displayed.

The variable of interest for this example is a measure of injury severity, appropriately named the Injury Severity Score (ISS).

The ISS is a quantity that is computed for hospital patients based on examination of trauma on several parts of the body. Each part is assigned a value. The ISS is a composite of those individual body-area values. The larger the number, the more severely injured the patient. The value, 0, means essentially no injury.

The FREQUENCIES procedure – the most often used procedure in SPSS.

1. What FREQUENCIES provides.

Menu Sequence to access the FREQUENCIES procedure: Analyze -> Descriptive Statistics -> Frequencies

Dialog with FREQUENCIES PROCEDURE

Specifying which variables to analyze

Choosing specifics

Find the variable’s name in the leftmost field, highlight it, and click on the arrow between the fields to move it to the rightmost field.

To create a graph, click on the Charts button.

ISS is a continuous variable, so a Histogram will be used. The normal curve overlay will be used to give a visual indication of how nearly normal the distribution is.

The FREQUENCIES output.

Frequencies

A dot plot of the same data. Graphs -> Legacy Dialogs -> Scatter/Dot -> Simple Dot

The histogram has been reduced in size to make it fit on the page.

The FREQUENCIES procedure should be used for every variable in every data set you analyze.

Argh!!! See last page of lecture for a note about dot plots in SPSS.

[Figure: Histogram of iss. Horizontal axis: iss, 0 to 80. Vertical axis: Frequency, 0 to 200. Mean = 10.21, Std. Dev. = 7.91, N = 500.]

iss

Value  Frequency  Percent  Valid Percent  Cumulative Percent
    1         33      6.6            6.6                 6.6
    2          9      1.8            1.8                 8.4
    3          1      0.2            0.2                 8.6
    4         72     14.4           14.4                23.0
    5         79     15.8           15.8                38.8
    6         13      2.6            2.6                41.4
    8         22      4.4            4.4                45.8
    9         67     13.4           13.4                59.2
   10         36      7.2            7.2                66.4
   11          5      1.0            1.0                67.4
   12          6      1.2            1.2                68.6
   13         30      6.0            6.0                74.6
   14         26      5.2            5.2                79.8
   16          9      1.8            1.8                81.6
   17         25      5.0            5.0                86.6
   18          4      0.8            0.8                87.4
   19          6      1.2            1.2                88.6
   20          2      0.4            0.4                89.0
   21         13      2.6            2.6                91.6
   22          7      1.4            1.4                93.0
   24          5      1.0            1.0                94.0
   25          8      1.6            1.6                95.6
   26          7      1.4            1.4                97.0
   27          2      0.4            0.4                97.4
   29          5      1.0            1.0                98.4
   30          1      0.2            0.2                98.6
   33          1      0.2            0.2                98.8
   34          1      0.2            0.2                99.0
   35          1      0.2            0.2                99.2
   43          1      0.2            0.2                99.4
   45          1      0.2            0.2                99.6
   50          1      0.2            0.2                99.8
   75          1      0.2            0.2               100.0
Total        500    100.0          100.0
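The Percent and Cumulative Percent columns of a FREQUENCIES table follow directly from the frequencies. Below is a small Python sketch of that arithmetic; the function name `percent_columns` is hypothetical. The first four frequencies are the first four in the ISS output (33, 9, 1, 72), and the last entry lumps the remaining 385 of the 500 cases together to keep the example short.

```python
from itertools import accumulate

def percent_columns(frequencies):
    """Return (percent, cumulative percent) lists for a list of frequencies."""
    n = sum(frequencies)
    percent = [round(100 * f / n, 1) for f in frequencies]
    # Accumulate raw counts first so rounding error does not build up.
    cumulative = [round(100 * c / n, 1) for c in accumulate(frequencies)]
    return percent, cumulative

# First four ISS frequencies, then all remaining cases lumped together.
freqs = [33, 9, 1, 72, 385]
percent, cumulative = percent_columns(freqs)
print(percent)
print(cumulative)
```

The cumulative percent for a row is the percentage of cases at or below that value, so the last row always reaches 100.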

Description using Rcmdr

Run R -> Load Package -> Rcmdr

In Rcmdr . . .

Data -> Import Data -> from text file, clipboard, or URL . . . (I chose to import from a Comma Separated Values (CSV) file which is a form of text file.)

The data were originally entered into SPSS.

I saved the data from SPSS in the CSV file format.

I’m now opening that CSV file with Rcmdr.

Be sure to check the [Commas] checkbox.

For your information, you can import data saved in SPSS’s file format into Rcmdr. I’m demonstrating use of CSV since many different statistical packages can read CSV files.

You should see something like this in Rcmdr’s output window . . .

View data set

Graphs -> Histogram -> ISS

Graphs -> Dot plot -> ISS

Graphical output from Rcmdr will appear in the R window.

The same process in Excel

Run Excel.

Import the data from an Excel or CSV file

Or

A portion of the data . . .

Data -> Data Analysis

More elaborate specifications can be provided. I won’t demonstrate them and won’t require them for this class. If you’re interested . . .

There is a text that shows how to create basic tabular and graphical displays in Excel.

Pace, L. A. (2011). Statistical Analysis Using Excel 2007. Boston, MA: Prentice Hall.

Issues with Dot Plots in SPSS

SPSS sometimes does not create dot plots that make sense.

Problem 1. HUGE Dots

Sometimes the program creates dot plots with dots that are far too big.

Solution: Double-click on the plot. Double-click on a dot. In the dialog box that opens, choose the size you want. Click [Apply].

Problem 2. X axis values are not systematic and increasing.

The program needs to be told that the variable is a Scale.

Solution: Click on the [Variable View] tab in the Data Editor window. Click on the name of the variable you wish to plot. Scroll to the right and click on the box under Measure. Choose [Scale]. Click on the [Data View] tab and save the file. Try the dot plot again. It should now appear correctly.
