What is a Z Score and Hypothesis Testing

  • 8/2/2019 What is a Z Score and Hypothesis Testing

    What is a Z score? What is a p-value?

    Most statistical tests begin by identifying a null hypothesis. The null hypothesis for pattern analysis tools essentially states that there is no spatial pattern among the features, or among the values associated with the features, in the study area. Said another way: the expected pattern is just one of the many possible versions of complete spatial randomness. The Z score is a test of statistical significance that helps you decide whether or not to reject the null hypothesis. The p-value is the probability of seeing a pattern at least as extreme as the observed one when the null hypothesis is true, so a small p-value means there is little risk of falsely rejecting the null hypothesis.

    Z scores are measures of standard deviation. For example, if a tool returns a Z score of +2.5 it is interpreted as "+2.5 standard deviations away from the mean". P-values are probabilities. Both statistics are associated with the standard normal distribution. This distribution relates standard deviations with probabilities and allows significance and confidence to be attached to Z scores and p-values.
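
The link between a Z score and a p-value through the standard normal distribution can be sketched in a few lines of Python. This is a minimal illustration, not the toolbox's own code: the function name `two_tailed_p_value` is hypothetical, and the two-tailed convention is an assumption consistent with the article treating both very high and very low Z scores as significant.

```python
import math


def two_tailed_p_value(z: float) -> float:
    """Return the two-tailed p-value for a Z score under the standard normal."""
    # For a standard normal variable Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Z scores near 0 give large p-values; Z scores in the tails give small ones.
    for z in (0.19, -1.2, 1.96, 2.5):
        print(f"z = {z:+.2f} -> p = {two_tailed_p_value(z):.4f}")
```

Because the calculation uses the absolute value of the Z score, +2.5 and -2.5 produce the same p-value.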

    Very high or very low (negative) Z scores, associated with very small p-values, are found in the tails of the normal distribution. When you perform a feature pattern analysis and it yields small p-values and either a very high or a very low (negative) Z score, this indicates it is very UNLIKELY that the observed pattern is some version of the theoretical spatial random pattern represented by your null hypothesis.

    In order to reject the null hypothesis, you must make a subjective judgment regarding the degree of risk you are willing to accept for being wrong. This degree of risk is often given in terms of critical values and/or confidence levels.

    To give an example: the critical Z score values when using a 95% confidence level are -1.96 and +1.96 standard deviations. The p-value associated with a 95% confidence level is 0.05. If your Z score is between -1.96 and +1.96, your p-value will be larger than 0.05, and you cannot reject your null hypothesis; the pattern exhibited is a pattern that could very likely be one version of a random pattern. If the Z score falls outside that range (for example -2.5 or +5.4), the pattern exhibited is probably too unusual to be just another version of random chance and the p-value will be small to reflect this. In this case, it is possible to reject the null hypothesis and proceed with figuring out what might be causing the statistically significant spatial pattern.

    A key idea here is that the values in the middle of the normal distribution (Z scores like 0.19 or -1.2, for example) represent the expected outcome (the norm ... generally uninteresting). When the absolute value of the Z score is large (in the tails of the normal distribution) and the probabilities are small, you are seeing something unusual and generally very interesting. For the Hot Spot Analysis tool, for example, "unusual" means either a statistically significant hot spot or a statistically significant cold spot.
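
The decision rule described above (compare the Z score with the critical values for a chosen confidence level) might look like the following sketch. The helper names `critical_z` and `reject_null` are hypothetical, and the code assumes a two-tailed test, which matches the -1.96 and +1.96 critical values quoted for the 95% confidence level.

```python
from statistics import NormalDist


def critical_z(confidence_level: float) -> float:
    """Return the positive two-tailed critical Z value for a confidence level."""
    alpha = 1.0 - confidence_level
    # Split alpha evenly between the two tails of the standard normal curve.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_null(z: float, confidence_level: float = 0.95) -> bool:
    """Return True when z falls outside the +/- critical range."""
    return abs(z) > critical_z(confidence_level)


if __name__ == "__main__":
    print(f"critical Z at 95%: +/-{critical_z(0.95):.2f}")
    # Example Z scores taken from the text above.
    for z in (0.19, -1.2, -2.5, 5.4):
        verdict = "reject" if reject_null(z) else "cannot reject"
        print(f"z = {z:+.2f}: {verdict} the null hypothesis")
```

Raising the confidence level (to 0.99, say) widens the range of Z scores for which the null hypothesis cannot be rejected, which is the "degree of risk" trade-off the text refers to.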

    The Null Hypothesis

    Many of the statistics in the spatial statistics toolbox are inferential spatial pattern analysis techniques (e.g., Global Moran's I, Local Moran's I, Gi*). Inferential statistics are grounded in probability theory. Probability is a measure of chance, and underlying all statistical tests (either directly or indirectly) are probability calculations that assess the role of chance on the outcome of your analysis. Typically, with traditional (non-spatial) statistics, you work with a random sample and try to determine the probability that your sample data is a good representation (is reflective) of the population at large. As an example, you might ask: "What are the chances that the results from my exit poll (showing candidate A will beat candidate B by a slim margin, perhaps) will reflect final election results?" But with many spatial statistics, including the spatial autocorrelation type statistics listed above, very often you are dealing with all available data for the study area (all crimes, all disease cases, attributes for every census block, and so on). When you compute a statistic (the mean, for example) for the entire population, you no longer have an estimate at all. You have a fact. Consequently, it makes no sense to talk about "likelihood" or "probabilities" any more. So what can you do in the case where you have all data values for a study area? You can only assess probabilities by postulating, via the null hypothesis, that your spatial data are, in fact, part of some larger population.

    Where appropriate, the tools in the spatial statistics toolbox use the randomization null hypothesis as the basis for statistical significance testing. The randomization null hypothesis postulates that the observed spatial pattern of your data represents one of many (n!) possible spatial arrangements. If you could pick up your data values and throw them down onto the features in your study area, you would have one possible spatial arrangement. The randomization null hypothesis states that if you could do this exercise (pick them up, throw them down) an infinite number of times, most of the time you would produce a pattern that would not be markedly different from the observed pattern (your real data). Once in a while you might accidentally throw all of the highest values into the same corner of your study area, but the probabilities of doing that are small. The randomization null hypothesis states that your data is one of many, many, many possible versions of complete spatial randomness. The data values are fixed; only their spatial arrangement could vary.

    A common alternative null hypothesis, not implemented for the spatial statistics toolbox, is the normalization null hypothesis. The normalization null hypothesis postulates that the observed values are derived from an infinitely large, normally distributed population of values through some random sampling process. With a different sample you would get different values, but you would still expect those values to be representative of the larger distribution. The normalization null hypothesis states that the values represent one of many possible samples of values. If you could fit your observed data to a normal curve and then randomly select values to toss onto your study area, most of the time you would produce a pattern and distribution of values that would not be markedly different from the observed pattern/distribution (your real data). The normalization null hypothesis states that your data and their arrangement are one of many, many, many possible random samples. Neither the data values nor their spatial arrangement are fixed. The normalization null hypothesis is only appropriate when the data values are normally distributed.
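
The "pick them up, throw them down" thought experiment behind the randomization null hypothesis can be imitated with a small permutation sketch. Everything here is an assumption made for illustration: the features are imagined to lie along a line so that each one neighbours the next, `neighbor_similarity` is a made-up clustering statistic rather than Moran's I or Gi*, and the article does not say that the toolbox computes its Z scores by simulation. The sketch keeps the data values fixed, shuffles only their arrangement, and reports how unusual the observed arrangement is.

```python
import random
import statistics


def neighbor_similarity(values):
    """Sum of products of mean-centred values over adjacent pairs.

    Hypothetical statistic: feature i is assumed to neighbour feature i + 1.
    Large positive results mean similar values sit next to each other.
    """
    mean = statistics.fmean(values)
    return sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))


def randomization_test(values, permutations=999, seed=0):
    """Return (pseudo Z score, pseudo p-value) for the observed arrangement.

    Assumes the values are not all identical (otherwise the spread is zero).
    """
    rng = random.Random(seed)
    observed = neighbor_similarity(values)
    shuffled = list(values)
    simulated = []
    for _ in range(permutations):
        rng.shuffle(shuffled)  # pick the values up and throw them down again
        simulated.append(neighbor_similarity(shuffled))
    centre = statistics.fmean(simulated)
    z = (observed - centre) / statistics.stdev(simulated)
    # Share of arrangements (counting the observed one) at least as far
    # from the centre of the shuffled results as the observed pattern.
    extreme = sum(abs(s - centre) >= abs(observed - centre) for s in simulated)
    p = (extreme + 1) / (permutations + 1)
    return z, p


if __name__ == "__main__":
    clustered = [1, 1, 2, 2, 3, 8, 9, 9, 10, 10]  # low values left, high right
    z, p = randomization_test(clustered)
    print(f"pseudo Z = {z:.2f}, pseudo p = {p:.3f}")
```

A strongly clustered arrangement such as the one in the usage example should land far out in the tail of the shuffled results, while most shuffles of the same values land near the middle.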
