LSP 121 Week 2 Intro to Statistics and SPSS/PASW

  • Slide 1
  • LSP 121 Week 2 Intro to Statistics and SPSS/PASW
  • Slide 2
  • Descriptive Statistics: Mean, Median, Percentile, Range
    Mean: the arithmetic average (the sum of the data values divided by the number of values).
    Median: the middle score, i.e. the score with an equal number of data points above and below. If there is an even number of data points, take the average of the middle two.
    Percent rank: the position of a data point within a data set. More precisely, it tells you approximately what percentage of the data is less than that data point. For example, the 86th percentile means that 86 percent of the data points (people, etc.) were below that number.
    Range: the difference between the maximum and minimum values in the data set.
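As a rough illustration of these four definitions, here is a minimal Python sketch. The `scores` list and the helper name `percent_below` are invented for this example and are not part of the course materials; the course itself uses SPSS for these calculations.

```python
from statistics import mean, median

def percent_below(data, value):
    """Percentage of data points strictly less than `value` (a simple percent rank)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical exam scores, invented for this illustration.
scores = [55, 62, 70, 71, 78, 80, 85, 90, 93, 96]

score_mean = mean(scores)                # sum of the values / number of values
score_median = median(scores)            # even count, so the middle two are averaged
score_range = max(scores) - min(scores)  # maximum minus minimum
rank_of_90 = percent_below(scores, 90)   # share of scores below 90

print("mean:", score_mean)
print("median:", score_median)
print("range:", score_range)
print("percent of scores below 90:", rank_of_90)
```

With ten scores, the median is the average of the fifth and sixth values (78 and 80), and seven of the ten scores fall below 90, so 90 sits at roughly the 70th percentile.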
  • Slide 3
  • Median
    Bank 1: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
    Bank 2: 6.6 6.7 6.7 6.9 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
    Median for Bank 1: the middle value of the 11 data points (the sixth value, 7.2).
    Median for Bank 2: with an even number of data points there is no middle value, so take the average of the two middle values.
  • Slide 4
  • Descriptive Statistics: Quartiles
    Lower quartile (aka first quartile): the median of the data values in the lower half of a data set (do not include the median).
    Middle quartile (aka second quartile): this is the overall median.
    Upper quartile (aka third quartile): the median of the data values in the upper half of a data set (do not include the median).
    Note: Some statistical software packages use the 25th, 50th, and 75th percentiles as their quartiles (instead of median values). SPSS determines quartiles in this way. On an exam, you would use the medians.
  • Slide 5
  • Quartiles
    For example (bank waiting times):
    Bank 1: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
    Bank 2: 6.6 6.7 6.7 6.9 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
    Bank 2, using the 12 values listed:
    median = (7.1 + 7.2) / 2 = 7.15
    lower quartile = median of the lower six values = (6.7 + 6.9) / 2 = 6.8
    upper quartile = median of the upper six values = (7.4 + 7.7) / 2 = 7.55
    range = 7.8 - 6.6 = 1.2
  • Slide 6
  • Descriptive Statistics: The Five-Number Summary
    The five-number summary consists of:
    The minimum value
    The lower quartile (first quartile)
    The median (second quartile)
    The upper quartile (third quartile)
    The maximum value
    As mentioned earlier, SPSS determines quartiles using percentiles: the first quartile is the 25th percentile, the second quartile is the 50th percentile, and the third quartile is the 75th percentile.
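The five-number summary can be sketched in Python using the median-of-halves rule from the quartiles slide. The function name `five_number_summary` is made up for this example, and the data are the Bank 1 waiting times from the earlier slides. For comparison, the last lines call `statistics.quantiles` from Python's standard library as one example of a percentile-style method; whether it reproduces SPSS's exact percentile rule is an assumption that is not verified here.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]  # values below the median position
    # With an odd count, the overall median is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Bank 1 waiting times from the earlier slides.
bank1 = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

summary = five_number_summary(bank1)
print("min, Q1, median, Q3, max:", summary)

# Percentile-style quartiles from the standard library, for comparison only.
percentile_quartiles = quantiles(bank1, n=4)
print("percentile-style quartiles:", percentile_quartiles)
```

For this particular data set the two approaches agree, but in general the medians and the percentiles can give slightly different quartiles, which is why the slides say to use the medians on an exam.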
  • Slide 7
  • Standard Deviation
    Quartiles are OK for characterizing data, but standard deviation is preferred by statisticians.
    It is a measure of how far data values are spread around the mean of a data set.
    Formula: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$, that is, the square root of (the sum of the squared deviations from the mean, divided by the total number of data values minus 1).
    You don't need to know this formula! Don't calculate by hand; use statistical software such as SPSS (which we'll do in a few minutes).
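For anyone curious what the software is doing, the formula can be written out in a few lines of Python. This is only a sketch: the function name `sample_std_dev` is invented here, the data are the Bank 1 waiting times from the earlier slides, and the result is checked against `statistics.stdev` from the standard library, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

def sample_std_dev(values):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(values)
    avg = sum(values) / n
    squared_deviations = sum((x - avg) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))

# Bank 1 waiting times from the earlier slides.
bank1 = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

by_formula = sample_std_dev(bank1)
by_library = stdev(bank1)  # standard-library version of the same calculation

print("by formula:", round(by_formula, 2))
print("by library:", round(by_library, 2))
```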
  • Slide 8
  • Standard Deviation - Guesstimate
    A simple way to estimate the standard deviation is the range estimate: divide the range by 4.
    Don't rely on estimation; use it only to get a very quick and general idea of the value of the standard deviation.
    Watch for outliers. They can ruin your range estimate.
    What is an outlier? A value two or more standard deviations from the mean (above OR below).
  • Slide 9
  • Standard Deviation
    Go back to the Big Bank / Best Bank example:
    Big Bank: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
    Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
    Big Bank: range = 6.9, so the estimate is 6.9 / 4 ≈ 1.7. The actual standard deviation is 1.96.
    Best Bank: range = 1.2, so the estimate is 1.2 / 4 = 0.3. The actual standard deviation is 0.44.
    Any outliers? For the values listed here, both means are 7.2 (each list sums to 79.2 over 11 values).
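The range estimate and the outlier question can both be checked with a short Python sketch. The helper names `range_estimate` and `outliers` are invented for this example; the data are the two lists on this slide, and an outlier is taken to be any value two or more standard deviations from the mean, as defined on the previous slide.

```python
from statistics import mean, stdev

def range_estimate(values):
    """Quick guess at the standard deviation: range divided by 4."""
    return (max(values) - min(values)) / 4

def outliers(values):
    """Values that lie two or more standard deviations from the mean."""
    avg, sd = mean(values), stdev(values)
    return [x for x in values if abs(x - avg) >= 2 * sd]

banks = {
    "Big Bank": [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0],
    "Best Bank": [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8],
}

for name, times in banks.items():
    print(name)
    print("  range estimate:", round(range_estimate(times), 2))
    print("  actual std dev:", round(stdev(times), 2))
    print("  outliers:", outliers(times))
```

By this definition neither bank has an outlier: even the 11.0-minute wait at Big Bank is 3.8 minutes above the mean, just short of two standard deviations (about 3.92).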
  • Slide 10
  • Histograms
    A nice way to view a data set.
    A histogram is a chart created by defining a set of bins and counting how many data points lie in each bin. Bars are drawn with height proportional to the number of data points in each bin.
    Note: The histogram does not keep track of the value of each data point; it only keeps track of which bin a data point is contained in.
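The binning step can be sketched in Python as follows. The function name `histogram_counts` and the choice of bins (width 2, starting at 4 minutes) are assumptions made for this illustration; the data are the Bank 1 waiting times from the earlier slides.

```python
def histogram_counts(values, start, width, bin_count):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * bin_count
    for x in values:
        index = int((x - start) // width)
        if 0 <= index < bin_count:  # values outside the bins are ignored
            counts[index] += 1
    return counts

# Bank 1 waiting times from the earlier slides.
waiting_times = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

start, width = 4, 2  # bin edges chosen for this example: 4-6, 6-8, 8-10, 10-12
counts = histogram_counts(waiting_times, start, width, bin_count=4)

# Draw each bar as a row of # characters, one per data point in the bin.
for i, count in enumerate(counts):
    low = start + i * width
    print(f"{low:>2}-{low + width:<2} {'#' * count} ({count})")
```

Once the counts are computed, the individual values are gone: the chart shows that five waits fell between 6 and 8 minutes, but not what those five values were.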
  • Slide 11
  • Example Histogram: Salaries of 26 Men's Basketball Coaches
    What is the most common salary according to this graph? Between $50,000 and $100,000.
    How many coaches make this amount? Most of the coaches (15).
    How many coaches make less than $50,000? Only 1.
    How many make more than $100,000? About 10.
    These would make for good exam questions.
  • Slide 12
  • Statistics and SPSS/PASW
    While Excel can do some basic statistics, it is not considered a serious statistics tool.
    You really should use something like SPSS/PASW or SAS.
    We'll use SPSS/PASW since DePaul has a site license.
  • Slide 13
  • Let's Try An Example
    Copy the dataset grades.xls (on the QRC web page, under Excel Files -> Older Data) to My Documents and start SPSS, or try the file IncomeGaps.xls.
    Open the Grades.xls spreadsheet.
    Note: SPSS looks for files with an extension of .sav. However, Excel files have an .xls extension. You must use the Files of Type dropdown to tell SPSS to search for XLS (i.e. Excel) files.
    Change the variable names and make sure the data is numeric, not text:
    Click on the Variable View tab at the bottom.
    For each of the two rows, click the cell under Type and choose Numeric.
    Then click back to Data View.
    Click on Analyze -> Descriptive Statistics -> Frequencies.
    Copy any variables that you want to analyze (i.e. exam 1 and exam 2) into the box on the right.
  • Slide 14
  • Let's Try An Example
    Be careful! If the numeric fields in the dataset contain any $, % or # characters, SPSS will have difficulty converting them to numeric.
    In particular, if the data has dollar signs, have SPSS first convert the field to Dollar, then convert it to Numeric (IncomeGaps.xls).
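To see why such fields cause trouble, note that a value like "$45,200" is text, not a number, until the extra symbols are removed. The Python sketch below shows the idea only; it is not how SPSS performs the conversion. The function name `to_number` and the sample values are invented for this illustration, and treating commas as thousands separators is an additional assumption.

```python
def to_number(text):
    """Strip $, %, # and thousands-separator commas, then convert to a float."""
    cleaned = text.strip()
    for symbol in ("$", "%", "#", ","):
        cleaned = cleaned.replace(symbol, "")
    return float(cleaned)

# Hypothetical raw cell contents, invented for this illustration.
raw_values = ["$45,200", "12%", "#7", "38.5"]

numbers = [to_number(value) for value in raw_values]
print(numbers)
```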
  • Slide 15
  • Let's Try An Example
    Using the grades for Exam 2, find:
    The five-number summary (minimum, 1st quartile, median, 3rd quartile, maximum). See this link for instructions.
    The mean
    The range
    The standard deviation
  • Slide 16
  • Listing Z-Values
    A good stats package will make it easy to determine z-values.
    Click on Analyze -> Descriptive Statistics -> Descriptives.
    Choose the variable; let's use Exam2.
    Be sure to check "Save standardized values as variables" at the bottom.
    When you return to the Data View you will see that a new column has appeared, giving you the z-score for every value in the Exam2 data set.
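A z-score is the number of standard deviations a value lies above or below the mean: z = (value - mean) / standard deviation. The Python sketch below shows the calculation behind the new column. Since the Exam2 grades are not reproduced in these slides, the Bank 1 waiting times stand in for them, and the function name `z_scores` is invented here. Using the sample standard deviation (dividing by n - 1) is an assumption about what the software does.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    avg, sd = mean(values), stdev(values)
    return [(x - avg) / sd for x in values]

# Stand-in data: Bank 1 waiting times from the earlier slides (not the Exam2 grades).
values = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

standardized = z_scores(values)
for value, z in zip(values, standardized):
    print(f"{value:>5}  z = {z:+.2f}")
```

A value equal to the mean gets a z-score of 0, and by the earlier definition any value with a z-score of 2 or more (or -2 or less) would count as an outlier.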
  • Slide 17
  • Pivot Tables
    Let's say you have just performed a survey. One of the questions you ask is: "What type of home computer Internet connection do you have?"
    Answers can be: None, Dial-up, DSL, Cable, Other, Not Sure.
  • Slide 18
  • Pivot Tables
    Here are some of your results:
    Respondent ID   Cable Type
    11111           no
    11112           ds
    11113           cm
    11114           dk
    11115           du
    11116           du
    where no = none; ds = DSL; cm = cable modem; du = dial-up; dk = don't know; ot = other
  • Slide 19
  • Pivot Tables
    You can use SPSS to count the occurrences of data items, just like a pivot table.
    Open a new file: File -> New.
    Enter your data into SPSS (you can leave out the IDs for now).
    Click on Analyze -> Descriptive Statistics -> Frequencies.
    Move the variable that you want to count from the left box to the right box.
    Make sure "Display frequency tables" is checked.
    Run it (click OK).
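The counting that the Frequencies procedure performs can be sketched in Python with `collections.Counter`. The six responses are the ones from the results table on the previous slide; the `labels` dictionary simply spells out the codes listed there. This illustrates the idea and does not reproduce SPSS's output format.

```python
from collections import Counter

# Codes and their meanings, as listed with the survey results.
labels = {
    "no": "none", "ds": "DSL", "cm": "cable modem",
    "du": "dial-up", "dk": "don't know", "ot": "other",
}

# The six responses from the results table (respondent IDs left out).
responses = ["no", "ds", "cm", "dk", "du", "du"]

frequencies = Counter(responses)

# One row per answer: label, count, and percent of all responses.
for code, count in frequencies.most_common():
    percent = 100 * count / len(responses)
    print(f"{labels[code]:<12} {count:>3} {percent:6.1f}%")
```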
  • Slide 20
  • Crosstabulations (Crosstabs)
    Crosstabs are an extension of pivot tables.
    Let's say you have asked a number of students: "How many schools did you apply to?"
    You get results something like the following (in a spreadsheet):
  • Slide 21
  • Crosstabs
    Respondent ID   Sex   # of schools
    1               F     6
    2               M     2
    3               F     7
    4               M     4
    5               F     9
    6               F     10
    7               M     3
    8               M     2
    9               F     7
                    F     5
  • Slide 22
  • Crosstabs
    Now open the data in SPSS.
    Then pull down the Analyze menu and click on Descriptive Statistics, then Crosstabs.
    What variable do you want in the row? The column?
    We are probably interested in examining how many schools females apply to relative to males.
    When ready, click OK to perform the crosstab.
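To show what the crosstab produces, here is a minimal Python sketch that counts every (sex, number of schools) combination in the survey table above, with sex in the rows and number of schools in the columns. Respondent IDs are left out, and the variable names are chosen for this example only.

```python
from collections import Counter

# (sex, number of schools applied to) for the ten survey rows; IDs omitted.
survey = [
    ("F", 6), ("M", 2), ("F", 7), ("M", 4), ("F", 9),
    ("F", 10), ("M", 3), ("M", 2), ("F", 7), ("F", 5),
]

cells = Counter(survey)  # how many respondents fall in each (sex, schools) cell
sexes = sorted({sex for sex, _ in survey})
school_counts = sorted({schools for _, schools in survey})

# Header row, then one row per sex with a count per column and a row total.
print("Sex " + "".join(f"{n:>4}" for n in school_counts) + "  Total")
for sex in sexes:
    row = [cells[(sex, n)] for n in school_counts]
    print(f"{sex:<4}" + "".join(f"{c:>4}" for c in row) + f"{sum(row):>7}")
```

In the printed table, all of the counts for males sit in the columns for 2 to 4 schools, while all of the counts for females sit in the columns for 5 to 10 schools.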