15
Neighbourhood effects on health: Does it matter where you draw the boundaries? Robin Flowerdew a, * , David J. Manley a , Clive E. Sabel b a University of St Andrews, School of Geography and Geosciences, Irvine Building, North Street, St Andrews, Fife, Scotland KY169AL, United Kingdom b Department of Epidemiology and Public Health, Imperial College London, St Mary’s Campus, Norfolk Place, London W2 1PG, United Kingdom Available online 5 January 2008 Abstract There has been considerable discussion in health geography and related areas of neighbourhood effects on health: the idea that people’s health in one geographical area may be influenced not only by the composition of that area’s population, but also by the area’s geographical context. Hence, the healthiness or otherwise of the neighbourhood may have an important effect on local peo- ple’s health. Although neighbourhoods and their boundaries are sometimes obvious to local residents, it is more common to find considerable disagreement on the size and contents of a neighbourhood. In this paper, we use British census Enumeration Districts as building blocks to construct alternative zonal systems, and experiment to see if neighbourhoods defined in different ways have similar implications for health. The well known modifiable areal unit problem shows that analytical conclusions may differ substantially according to how data are aggregated. Boundaries can be drawn to maximize equality of size, compactness of shape, homogeneity in social composition, accordance with ‘natural’ boundaries, and probably many other factors; which of these criteria are more effective in defining zones relevant to health? One conclusion is that the effect of neighbourhood conditions should be looked at using several different ways to define neighbourhoods, and that the size and composition of these neighbourhoods may be different in different parts of a study area. Ó 2007 Elsevier Ltd. All rights reserved. Keywords: UK; Modifiable areal unit problem; Zone design; Neighbourhood; Context and composition Introduction Geographical variations in health exist at many scales from the global to the local (Boyle, Gatrell, & Duke-Williams, 2004). Analysis of data sources including census data, hospital admission records, and disease registration shows considerable fluctuation, with a general tendency for health to be worst in areas characterised by poverty and deprivation. In the United Kingdom, macro-scale analyses of census health data have been published by Charlton, Wallace, and White (1994) for 1991 data and by Dorling and Thomas (2004) for 2001 data. Among other studies of local- level variation, a theme issue of Geographical and Environmental Modelling was devoted to a set of papers * Corresponding author. Tel.: þ44 (0)1334 463853. E-mail addresses: r.fl[email protected] (R. Flowerdew), [email protected] (D.J. Manley), [email protected] (C.E. Sabel). 0277-9536/$ - see front matter Ó 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.socscimed.2007.11.042 Social Science & Medicine 66 (2008) 1241e1255 www.elsevier.com/locate/socscimed

Neighbourhood effects on health: Does it matter where you draw the boundaries?

Embed Size (px)

Citation preview

Page 1: Neighbourhood effects on health: Does it matter where you draw the boundaries?

Social Science & Medicine 66 (2008) 1241e1255www.elsevier.com/locate/socscimed

Neighbourhood effects on health: Does it matterwhere you draw the boundaries?

Robin Flowerdew a,*, David J. Manley a, Clive E. Sabel b

a University of St Andrews, School of Geography and Geosciences, Irvine Building,

North Street, St Andrews, Fife, Scotland KY169AL, United Kingdomb Department of Epidemiology and Public Health, Imperial College London, St Mary’s Campus, Norfolk Place,

London W2 1PG, United Kingdom

Available online 5 January 2008

Abstract

There has been considerable discussion in health geography and related areas of neighbourhood effects on health: the idea thatpeople’s health in one geographical area may be influenced not only by the composition of that area’s population, but also by thearea’s geographical context. Hence, the healthiness or otherwise of the neighbourhood may have an important effect on local peo-ple’s health. Although neighbourhoods and their boundaries are sometimes obvious to local residents, it is more common to findconsiderable disagreement on the size and contents of a neighbourhood. In this paper, we use British census Enumeration Districtsas building blocks to construct alternative zonal systems, and experiment to see if neighbourhoods defined in different ways havesimilar implications for health. The well known modifiable areal unit problem shows that analytical conclusions may differsubstantially according to how data are aggregated. Boundaries can be drawn to maximize equality of size, compactness of shape,homogeneity in social composition, accordance with ‘natural’ boundaries, and probably many other factors; which of these criteriaare more effective in defining zones relevant to health? One conclusion is that the effect of neighbourhood conditions should belooked at using several different ways to define neighbourhoods, and that the size and composition of these neighbourhoodsmay be different in different parts of a study area.� 2007 Elsevier Ltd. All rights reserved.

Keywords: UK; Modifiable areal unit problem; Zone design; Neighbourhood; Context and composition

Introduction

Geographical variations in health exist at manyscales from the global to the local (Boyle, Gatrell, &Duke-Williams, 2004). Analysis of data sources

* Corresponding author. Tel.: þ44 (0)1334 463853.

E-mail addresses: [email protected] (R. Flowerdew),

[email protected] (D.J. Manley), [email protected]

(C.E. Sabel).

0277-9536/$ - see front matter � 2007 Elsevier Ltd. All rights reserved.

doi:10.1016/j.socscimed.2007.11.042

including census data, hospital admission records, anddisease registration shows considerable fluctuation,with a general tendency for health to be worst in areascharacterised by poverty and deprivation. In the UnitedKingdom, macro-scale analyses of census health datahave been published by Charlton, Wallace, and White(1994) for 1991 data and by Dorling and Thomas(2004) for 2001 data. Among other studies of local-level variation, a theme issue of Geographical andEnvironmental Modelling was devoted to a set of papers

Page 2: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1242 R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

analyzing the same local-scale health data set using dif-ferent techniques (Fotheringham, 1999). Another bodyof work has focused on the question of whether theremay be area differences in health which are independentof characteristics of individual residents. Kawachi andBerkman (2003) for example, have edited a book onNeighborhoods and Health containing review papers(Diez-Roux, 2003; Macintyre & Ellaway, 2003) andstudies of different aspects of local-scale health varia-tion. Pickett and Pearl (2001) have also reviewed thismaterial.

The notion of neighbourhoods needs to be examinedmore critically. We set out a methodology to demon-strate the variable presence of significant ‘neighbour-hood effects’ in area data, depending on how areas aredelimited. Studies trying to identify neighbourhoodeffects have tended to use readily available geographi-cal data units such as wards in the United Kingdomand census tracts in the United States (see table 1.1,Kawachi & Berkman, 2003 p. 4e5). These units maynot coincide with the neighbourhoods that have aneffect on health. The main objective of the paper wasto determine how much conclusions about the existenceand size of the neighbourhood effect on health are de-pendent on how neighbourhood boundaries are defined.The paper begins with a review of ideas about howneighbourhood characteristics might affect health andof the ways in which neighbourhoods might be defined.The sensitivity of analytical results to how areal unitsare defined is related to the modifiable areal unit prob-lem, which is also discussed in the paper, followed byan account of zone design software which facilitatesthe analysis of the effects of zone design. An analysisof the neighbourhood effect using the standard systemof areal units is presented, using Northamptonshireand Swindon (England) as case studies. Then, thesame data are re-aggregated according to 50 differentsets of ward-level zonal systems and the analysis of theneighbourhood effect is repeated for each set. This allowsan evaluation of the robustness of the original results, andhence an answer to the question posed in the title.

Neighbourhood effects on health

In exploring neighbourhood effects on health, oneimportant question has concerned the roles of contextand composition in accounting for local variations(Macintyre, MacIver, & Sooman, 1993; Pickett & Pearl,2001). There is little doubt that health statistics in anarea are influenced by the composition of the local pop-ulation. Age, education, employment, ethnicity, hous-ing, social class and other factors may all influence

individuals’ health. But are these compositional factorssufficient to account for geographical variations, or isthere a contextual effect, some sort of local influencewhich makes an area’s health better (or worse) thanwhat would be expected from population composition?

Such contextual effects could be of several differenttypes. First, the natural environment may be conduciveto good or bad health (see for instance Gatrell, 2002 orMeade & Earickson, 2000). This is most obviously thecase with water or atmospheric pollution, or pollen count,but there may also be health effects from temperature andprecipitation regimes. A noisy environment may havenegative health effects. It can even be argued that topog-raphy has an effect on health, in that residents of a hillyarea may be healthier than plains dwellers because ofthe aerobic activity involved in walking around the area.

Second, the local availability of health care is a con-textual effect; good access to a general practitioner orother medical service providers may have a direct influ-ence on health. Even if the medical facilities are notlocated in the area itself, good transport links may stillgive the residents a health advantage. In a similar way,the notion of food deserts has highlighted the fact thatmany poorer neighbourhoods have worse access to re-tailers offering fresh fruits and vegetables through thechoice of others in the neighbourhood not to consumesuch items, providing another example of the contexteffect (Kawachi & Berkman, 2003).

Third, different areas with similar population com-position may differ in the social and cultural normsand values that are dominant. Some of these factors,relating to community spirit, participation and trust,are grouped together in the concept of social capital(Putnam, 2000). If the area has a strong and neighbourlycommunity, residents will talk to each other and per-haps influence each other’s behaviour, with implica-tions, positive or negative, for health. For example,certain areas may have high proportions of smokers ordrug users, making it more likely that individuals willadopt these unhealthy habits.

Fourth, and perhaps less obviously, it may be that thehealth effects of deprivation apply to relative rather thanabsolute deprivation (Wilkinson, 1996). Somebody liv-ing in a neighbourhood of relative prosperity mayexperience worse health than somebody with similarcharacteristics living in a more deprived area. Alterna-tively, a more deprived person may have better healthif living in a prosperous area, perhaps for the socio-cultural reasons highlighted above (see for instanceCheshire, 2007).

There has been plenty of discussion of the relativeimportance of context and composition in affecting

Page 3: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1243R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

neighbourhood health (see for example Macintyre et al.,1993; McCulloch, 2001; Shaw, Dorling, & Mitchell,2002 and Sloggett & Joshi, 1998). Attempts to quantifythe relative effects of context and composition havegenerally concluded that compositional effects aredominant, while there is disagreement about whethercontextual factors have a significant effect. Of course,results of empirical analyses on this subject may dependon the measures of health used and the way in whichneighbourhoods are defined.

How should neighbourhoods be defined?

Neighbourhoods can be defined in many differentways, but there are several problems in defining a setof neighbourhoods consistently across a whole country.Often the definition of neighbourhoods is determined bypractical considerations, such as the assumption thata ward (or a census tract in the USA) is a sensible oper-ational definition of a neighbourhood. This is conve-nient because data are available at the ward level, butthere is little reason to expect neighbourhoods to followward boundaries, and even less reason to expect diseaseto respect these same administrative boundaries. Expe-rience of attempting to construct a set of neighbour-hoods in Scotland (Flowerdew, Feng, & Manley,2007) showed that identification of ‘neighbourhoods’throughout Scotland was problematic. For example, inrural areas a village with 200 inhabitants may bestrongly resistant to being regarded as part of thesame neighbourhood as the next village, while in bigcities the term may be used for an area with 10,000people.

Another problem is that the names of certain neigh-bourhoods acquire strong connotations. If these conno-tations are positive, people may construct theneighbourhood to include their home, while denyingthat they live in a stigmatized neighbourhood (Clark,2004). It is also the case that some neighbourhoodshave a strong and clear identity while areas in betweendo not; this is particularly so where residential propin-quity has a diminishing effect on friendship patterns,which may be much more closely linked to sharedworkplaces and leisure-time activities.

Even where neighbourhoods have clear identities,research in behavioural geography has shown that indi-vidual people may have very different ideas of wherethe neighbourhood boundary is located (Lee, 1976;Wong, 2006). Exceptions may occur where there areclear natural or man-made boundaries such as rivers,industrial estates or railway lines, or sometimes wherea neighbourhood organization has explicitly defined

its area of interest, but there is often little consensusas to how boundaries should run in areas of continuoushousing development.

The British government has recently taken an initia-tive which is intended to promote local democracy bypublishing an array of data at the neighbourhood level.It is clear from the guidelines that these should bebigger than census output areas, but smaller than wards.These were conceived by government as neighbour-hoods, a concept which is generally constructed ashighly positive. However, attempts to define a networkof neighbourhoods across the country ran into the prob-lems described above, and the resulting areas weregiven the more neutral name of ‘Super Output Areas’(England and Wales) or ‘Data Zones’ (Scotland). Insome cases, these units may approximate neighbour-hoods as understood by the local residents, but gener-ally they are designed to cover the country with unitsof roughly similar size, reasonably compact shape,and a degree of social homogeneity. There is no expec-tation that they would be suitable for testing the ideasdiscussed above about contextual effects on health.

The conclusion from this discussion is that neigh-bourhoods can be defined in multiple ways. If there isa contextual effect on health, it is not reasonable toexpect that it will be identifiable regardless of how theneighbourhood is defined. Dividing a study area intoneighbourhood zones may be done in many ways,some of which might approximate the geography ofthe contextual effect, whether identified in terms ofenvironmental, cultural or health care factors. Otherways of dividing the study area might cut across thesegeographies and be very inefficient in replicating thecontextual effect. The assumption that ward boundariesdefine the appropriate neighbourhoods may greatly un-derestimate the true contextual effect, as highlighted byManley, Flowerdew, and Steel (2006). Accordingly, thispaper constitutes an investigation of the stability of theneighbourhood effect across a range of different sets ofareal units to see how consistent and stable the resultsare.

The modifiable areal unit problem

If we want to determine whether there is a neighbour-hood effect, how should ‘neighbourhood’ be defined? Itcan be defined in multiple ways e but they will notnecessarily give the same answers. This is an exampleof the modifiable areal unit problem (MAUP) e a classicproblem in statistical analysis of geographical data.

The MAUP was first identified by Gehlke and Biehl(1934). Its essence is that analytical results for the same

Page 4: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1244 R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

data in the same study area can be different, in somecases wildly different, if aggregated in different ways.The MAUP affects many statistics, including variancesand standard deviations, correlation and regressionanalysis.

The MAUP is often described as having two aspects ethe scale effect and the zonation effect. According to thescale effect, there may be major analytical differencesdepending on the size of units used. Usually, but notalways, correlations are more pronounced (whether pos-itive or negative) for bigger units. The zonation effect,sometimes called the aggregation effect (Openshaw,1984), shows that there may be major differences de-pending on how the study area is divided up, even atthe same scale.

Openshaw and Taylor (1979) found that correlationsin Iowa between Republican voting and percentage ofold people could vary from �0.97 to þ0.99 dependingon how counties were aggregated. However, they usedshapes that had convoluted boundaries and a high vari-ability in size. There is much less variation for morerealistic zonal schemes, but there are examples whereit matters (e.g. Heasman, Kemp, Urquhart, & Black,1986 on childhood leukaemia near Dounreay nuclearpower station; Boyle & Alvanides, 2004 on EuropeanUnion funding qualifications; Rezaeian, Dunn, StLeger, & Appleby, 2006 on correlates of suicide).Work has been published by Fotheringham and Wong(1991), Openshaw (1984), Tranmer and Steel (2001)among others, on how and why the MAUP exists, andwhat can be done about it.

In an attempt to evaluate the extent of the problem,Manley (2006) investigated the number of times therewere statistically significant differences between corre-lations calculated at the ward level and the EnumerationDistrict (ED) level for pairs of census variables. Hisstudy shows statistically significant differences inalmost all cases.

Criteria for designing zones

In the 1991 British census data used in this project,the smallest data unit is the Enumeration District, andhence EDs form the basic building block for designingalternative zoning systems. Because previous literaturehas assumed that wards are an appropriate size for iden-tifying contextual effects, this paper makes a similar as-sumption, but tests a variety of different zonal systemsat the same scale. In each case, EDs are aggregated toform ‘pseudo-wards’, and the analysis is concernedwith how the relationship of health to compositionaland contextual variables varies between different sets

of pseudo-wards. In terms of the MAUP, therefore, theemphasis is on the zonation effect rather than the scaleeffect.

There are many different ways of grouping the EDsinto pseudo-wards, and the analysis was focused onwhat could be regarded as reasonably sensible ways.Unlike Openshaw and Rao (1995), we were not con-cerned to illustrate how large a variation in correlationand regression results could be achieved, rather tolook at the variation occurring in reasonably realisticsets of zones. Accordingly, suitable criteria for zonedefinition are discussed.

The first criterion used was that pseudo-wardsshould be internally contiguous, in other words thateach ED in a pseudo-ward should be accessible to everyother ED within that pseudo-ward without leaving thepseudo-ward, and that no pseudo-ward should havedetached portions. This requirement is problematic forareas which contain islands, which is why the studyareas used were entirely inland. It would be possibleto include islands in an analysis of this type by definingan island ED as contiguous to the nearest mainland EDor to the port from which the island was usuallyreached, but this was unnecessary for either Northamp-tonshire or Swindon. Similar problems occur whenareas are divided by river channels. We note howeverthat rivers tend to form natural boundaries within anarea, and there is thus good reason to suggest that neigh-bourhoods would be unlikely to extend across a riverchannel.

A second criterion used in the construction ofpseudo-wards was that ‘doughnut’ shapes should beavoided, in other words that no pseudo-ward shouldbe entirely surrounded by another. The case for thisrestriction is not indisputable, indeed it might be sensi-ble to construct separate pseudo-wards to represent anurban area and a surrounding doughnut-shaped ruralarea, on the grounds that there might be greater internalhomogeneity than there would be in a system with twozones both including a mixture of rural and urban EDs.

Another factor which might be considered importantin deriving appropriate zonal systems is the number ofzones to be created. As it was intended to comparethe sets of pseudo-zones generated with the ward geog-raphy, the desired number of zones was the same as, orclose to, the number of wards in the study area.

It was also considered important for pseudo-wardpopulations to be of roughly the same size. Although,for historical and other reasons, actual zonal systemsdo not always have this characteristic (think of thediversity in population of the Canadian provinces orthe American states), such a system seems both more

Page 5: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1245R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

egalitarian and more efficient than one whose compo-nents vary greatly in population size.

Shape was also a major concern. Neighbourhoodsare likely to be relatively compact in shape, with partialexceptions along a coastal strip or valley. Pseudo-wardswith long ‘peninsulas’ and narrow ‘isthmuses’ seemless realistic than ones with more regular shapes(though the process of electoral redistricting in theUnited States sometimes produces very irregularzones). In fact, pseudo-ward boundaries may necessar-ily have irregular shapes because the boundaries of thecomponent EDs are themselves irregular. Nevertheless,regularity of shape was considered an important attri-bute for a zonal system.

Internal homogeneity could also be regarded as animportant feature of pseudo-wards. If a contextualeffect exists, it may well be strongest in more homoge-neous neighbourhoods. However, there may be manyrespects in which pseudo-wards may be more or lesshomogeneous. It is not clear which would be the bestvariable or variables on which to assess homogeneityin our attempts to identify contextual effects on health.Good candidates, all available from the census, mightinclude housing tenure, as used in the England andWales 2001 Census Output Areas (see Martin, Nolan,& Tranmer, 2001), employment characteristics, agestructure and ethnicity.

Zone design software

Investigation of the MAUP has been greatly aided inthe recent past by the development of software whichcan be used in zone design (see Alvanides, Openshaw,& Rees, 2002). Relevant packages include SAGE(Wise, Haining, & Ma, 1997), ZDES (Alvanides, Open-shaw, & Macgill, 2001) and A2Z (Daras & Alvanides,2005). In this project, we used the AZTool system (for-merly AZM), originally developed by Martin et al. forthe construction of Output Areas for the 2001 census(Martin et al., 2001). AZTool allows the constructionof zonal systems which can incorporate all the criteriadiscussed above.

The package requires two input files, correspondingto ‘.pat’ and ‘.aat’ files used in the ArcInfo geographicalinformation system. The ‘.pat’ file includes the popula-tion and other variables for the EDs and the ‘.aat’ filecontains information on which pairs of EDs are contig-uous to each other. Users can specify a minimum pop-ulation threshold which all pseudo-zones should reach,or a population target which pseudo-ward populationsshould approach as closely as possible, subject to theother constraints in the model. The population target

controls the approximate number of pseudo-zones gen-erated. Users can also specify whether ‘doughnuts’ arepermitted, whether shape should be considered, andwhether homogeneity should be considered and, if so,how. The user also specifies the relative weighting oftarget population, shape and homogeneity in evaluatingzonal systems as they are produced.

The shape measure used in AZTool is defined as theperimeter of the pseudo-zone (squared) divided by itsarea, summed for each pseudo-zone. Homogeneitycan be defined on the basis of the number of people orhouseholds in each of several categories. For example,homogeneity in housing tenure can be evaluated interms of numbers who are home owners, private sectorrenters and social renters (more or fewer categories canbe used). Its extent is calculated using intra-area corre-lation (IAC). It is possible to create the zonal system us-ing homogeneity on more than one set of categoryvariables, the user being able to set the relative weights.

The program starts by choosing random numbers (sothat each replication is different) to identify initial seedEDs, to which the other EDs are allocated. The overallzonal system is then scored on the basis of, first, prox-imity of each pseudo-ward population to the target pop-ulation, second, the shape measure and, third, the IAC.Then, EDs are selected and reallocated to a differentpseudo-ward (contiguous to the ED) and the score isrecalculated. If the score is improved, the ED remainsin its new pseudo-ward; if not, it reverts to its originalpseudo-ward. This process continues until no furtherimprovements are recorded, and the score is recordedfor that iteration. To reduce the likelihood of gettingstuck at a local optimum, the user can opt to use themethod of simulated annealing (Openshaw & Open-shaw, 1997). Then a new set of seed EDs is selected,and a new iteration begins. A user-selected number ofiterations is performed, and the best score overall isfound. Details of the allocation of EDs to pseudo-wardsare output and can be used for mapping or statisticalanalysis. For any given weighting of the input criteria,it is possible to generate a large number of differentzonal systems. For a large spatial system, very few ofthem are likely to be duplicates.

Estimating the neighbourhood effect in Britishcensus data

This study uses data from the 1991 British census tosee whether there is an identifiable neighbourhoodeffect on people’s health and, if so, to measure howbig the effect is. The 1991 data were chosen becausethe alternative 2001 boundaries were expressly

Page 6: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1246 R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

designed to homogenise neighbourhood effects. This isspecifically the effect under investigation in this paperand thus the 2001 data would be inappropriate to adoptfor the study. The data refer to self-reported health, andare based on answers to the question on limiting long-term illness (LLTI), defined as any ‘long-term illness,health problem, or handicap which limits the daily ac-tivities or the work that a person can do’. In 1991, about12% of residents in households in Great Britain saidthey suffered from such a condition (Charlton et al.,1994). We acknowledge that any measure of self-reportedhealth, such as LLTI, may be subject to some impreci-sion due to reporting biases. However, self-reportedmeasures are widely used as indicators of populationhealth in the UK, and they are correlated with otherstandard health indicators such as mortality and lifeexpectancy.

The choice of study area within Great Britain wasbased on constraints set by the zone design software ethe analysis to be described here was carried out for thecounty of Northamptonshire in the English Midlands(population just over 500,000) and for the district ofSwindon (known at the time as Thamesdown) in central

Fig. 1. (i) Limiting long-term illness 1991, Northamptonshire, ward level,

southern England (population 170,000). Both areascontain a mix of urban and rural environments witha reasonable mix of different social classes.

Data from the 1991 Census were made available atseveral spatial scales, of which the two most relevantare the two smallest, the ward, with a population ofa few thousand, and the Enumeration District (ED),with a population of a few hundred. Northamptonshirehas 1268 EDs with an average population of 440,grouped into 148 wards with an average population of3816. Wards are used as electoral units in local govern-ment but EDs generally have no role other than ascensus reporting units. Fig. 1(i) shows the distributionof LLTI at the ward level in Northamptonshire, andFig. 1(ii) shows the distribution of LLTI at the ED level.Swindon has 350 EDs, average population 488, groupedinto 21 wards with an average population of 8136.Fig. 2(i) and (ii) are, respectively, ward-level and ED-level maps of the distribution of LLTI in Swindon.

If we analyze data at the ED scale, we can identifycorrelates of LLTI as compositional effects, but anyward-level effect can be identified as a ‘neighbourhoodeffect’. For the analysis presented in this paper we use

and (ii) limiting long-term illness 1991, Northamptonshire, ED level.

Page 7: Neighbourhood effects on health: Does it matter where you draw the boundaries?

Fig. 2. (i) Limiting long-term illness 1991, Swindon, ward level, and (ii) limiting long-term illness 1991, Swindon, ED level.

1247R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

the term ‘neighbourhood effect’ to describe an ecolog-ical relationship at the ward geographic scale, that ispresent after we have controlled for ecological relation-ships at the lower geographical scale of the ED. Thisanalysis structure was chosen because of an explicitinterest in neighbourhood effects, which leads to mod-elling at the area level (after Cockings & Martin, 2005).The nature of the exercise in this study is to explore thepresence or otherwise of variable spatial structures, andnot work within the rigid levels that a multi-level modelcan impose. Accordingly, an Ordinary Least Squareslinear regression model was constructed for Northamp-tonshire at the ED level, in which the percentage of peo-ple with LLTI (pcllti) was regressed on the percentageof people of pensionable age (pcpens), the percentageof non-whites (pcnonw), and the percentage of male un-employment (pcunem). Because of the wide divergencein ED size, the regression was weighted by population.This gave the following result:

pcllti¼ 1:136þ 0:385pcpensþ 0:058pcnonw

þ 0:262pcunem ð1Þ

The R2 value was 0.733. All explanatory variableswere statistically significant.

Similarly, pcllti in Swindon was regressed on thepercentage of white people (pcwhite), the percentageof children younger than 5 years (pckids), the percent-age of people of pensionable age (pcpens), the percent-age of male unemployment (pcunem), the percentage inowner-occupied housing (pcoo) and the percentage inhousing rented from the local authority (pcla). Again,the regression was weighted by population. Here, thebest model was

pcllti¼ 9:391� 0:075pcwhiteþ 0:374pcpens

þ 0:363pcunemþ 0:179pcla ð2Þ

The R2 value was 0.847; the variables pckids andpcoo did not have significant regression coefficients.

The next phase of the method was to calculatea neighbourhood variable. This was done by calculatingthe ward-level value of pcllti and inserting it as an addi-tional explanatory variable into the ED-level model. InNorthamptonshire, the ward-level value was used

Page 8: Neighbourhood effects on health: Does it matter where you draw the boundaries?

Table 1

The experimental conditions

Population

threshold

Population

target

Population

weight

Shape

weight

Homogeneity

A 2651 8136 1 1 d

B 2651 8136 10 1 d

C 2651 8136 1 10 dD 2651 8136 1 1 Employment

E 2651 8136 1 1 Tenure

1248 R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

without alteration (pclltiwd) but in Swindon a value wascomputed for the ward excluding the ED in question(pclltinbd) in order to test whether this affects theresults. This difference does not seem to be important:in Swindon the two measures have a correlation of0.994. The best model for Northamptonshire (againweighting by ED population) was

pcllti¼�0:465þ 0:351pcpensþ 0:037pcnonw

þ 0:228pcunemþ 0:225pclltiward ð3Þ

The R2 value increased from 0.733 to 0.747, a smallbut statistically significant increase. All explanatoryvariables were significant. The conclusion from thisanalysis is that there is a significant ‘neighbourhoodeffect’.

In Swindon, however, incorporating the neighbour-hood variable had less impact:

pcllti¼ 9:225� 0:078pcwhiteþ 0:368pcpens

þ 0:298pcunemþ 0:179pcla

þ 0:067pclltinbd ð4Þ

The R2 value however remains at 0.847, and thepclltinbd variable has a standard error of 0.049, andhence is not statistically significant. The conclusionfrom this analysis is that there is no evidence for a neigh-bourhood or contextual effect in Swindon. On the basisof these results, one might conclude that there is a signif-icant neighbourhood effect in Northamptonshire but notin Swindon. However, are these results affected by howward boundaries are drawn? The remainder of the paperexplores this question.

Modelling the impact of neighbourhood definition

In order to evaluate the impact of neighbourhoodcontext on health, sets of pseudo-wards were created,using different combinations of criteria. This wasdone using AZTool. The results are evaluated first ofall in terms of the properties of the zonal systems cre-ated, then by the correlations observed between thevariables considered, and finally by the size of theneighbourhood effect identified.

Sets of pseudo-wards were generated for Swindonusing various combinations of the user-defined parame-ters in AZTool, namely population threshold, popula-tion target, shape and homogeneity. It was intended tocreate a zonal system which would be similar to theward system in terms of the number of zones, whichwas 21, with an average population of 8136. All zonalsystems were produced with a population target of

8136 and a population threshold of 2651, the populationof the smallest ward in the district. Ten replicationswere created for each of five experimental conditions:in all cases, simulated annealing was used. The aggre-gation process was run for 50 iterations in each case,with the final AZTools output being chosen for theaggregation outcome. The experimental conditions forward boundary systems labelled AeE were as shownin Table 1.

Properties of the zonal systems

Although the use of the population target of 8136 isintended to ensure that the number of pseudo-wards iscomparable to the number of real wards, there is noguarantee that there will be exactly 21 pseudo-wards.In fact, there were 18 cases (out of 50) where the num-ber was 21, there were 17 cases out of 20 pseudo-wards;the minimum being 16 and the maximum being 22.Condition C tended to have lower numbers than theother conditions.

The zonal systems can be evaluated according to thestatistics used in developing the system. These are thepopulation target (PT), the perimeter squared overarea measure of shape (P2A) and the intra-area correla-tion (IAC) to measure homogeneity. These represent thecriteria of equality of size, regularity of shape, and inter-nal homogeneity.

The statistics used are as follows:

PT¼Xðpk � p�Þ ð5Þ

where pk is the population of zone k and p* is the targetpopulation;

P2A¼X

q2k=Ak ð6Þ

where qk is the perimeter of zone k and Ak is its area. Inorder to calculate IAC, it is useful to start by calculatingdk which is the contribution of category k (e.g. tenure orage categories) to the IAC:

Page 9: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1249R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

vk ¼1

M�1

PMg¼1 Nk

�Pgk�Pk

�2

ðN ��1ÞPg

�1�Pg

� � 1

ðN ��1Þð7Þ

where M is the number of areal units, Nk is the numberof people in category k, N� is the mean population size(slightly adjusted; see Tranmer & Steel, 1998), Pgk isthe proportion of people in zone g and category k, Pg

is the proportion in zone g, and Pk is the proportion ofpeople in category k. The d values are then combinedas follows to yield the IAC value:

IAC¼ 1

k� 1

XK

k¼1

ð1�PkÞvk: ð8Þ

Table 2 shows how the sets of pseudo-wards com-pare with the first two of these indicators. It can beseen that condition C (where shape is emphasised) be-haves very differently from the others, with very highPT values and very low P2A values. An increase inthe PT value will indicate a zonal system that hasa higher degree of variability in the population size ofthe zones. Thus, lower PT values are desirable to ensurethat the zones are of comparable size. Conversely, lowP2A values show that the zones are relatively compact,longer boundaries and smaller areas suggest highly sin-uous areas, which are undesirable for most purposes.Homogeneity is only used in conditions D (unemploy-ment) and E (tenure). In both cases, there are no majordeclines in PT compared with condition B (emphasisingpopulation equality), and an improved shape measurecompared to A and B.

Figs. 3e6 provide examples of the different zonationsystems produced through the aggregation process.They clearly show the effect of the emphasis on shapein experimental condition C. Otherwise, they show thewell known result that the composition of the zones ina choropleth map can lead to maps that look very differ-ent and give very different impressions of a spatial dis-tribution. Compare Figs. 3 and 4, which show jaggedboundaries between irregularly shaped zones, withFigs. 5 and 6, which show more compact and more

Table 2

Real ward PTand P2Awith best and worst population target and shape

measure statistics for five experimental conditions, Swindon, 1991

Best PT Worst PT Best P2A Worst P2A

Real 43,169 e 830.7 eA 5,049,721 19,875,682 1178.6 1501.8

B 9,679,461 22,192,863 1248.1 1807.9

C 65,978,299 203,775,243 657.2 807.2

D 8,679,809 26,893,147 1040.6 1418.6

E 8,567,193 29,220,093 1010.4 1369.1

regularly bounded shapes. There are also differencesbetween Figs. 4 and 5, which seem to suggest an impor-tant EasteWest division, and Figs. 3 and 6, whichappear to suggest an extension of high LLTI areas tothe north-east (but at a slightly different angle).Remember that all four figures are based on exactlythe same ED-level data, aggregated in different ways.

For each set of pseudo-wards, correlation coeffi-cients were calculated between pairs of the variablesused in the study (see Table 3). In the real ward dataset (bolded at the top of the table), pcllti has stronglysignificant positive relationships with pcunem, pclaand pcpens, an insignificant negative correlation withpcwhite, and small positive correlations with pckidsand pcoo. We also examined the correlation of thetwo variables most strongly correlated with pcllti,namely pcunem and pcla, which was also stronglysignificant.

The scale effect can be assessed by comparing thecorrelations of pcllti with the other variables betweenward-level and ED-level data. Two of the variables(pckids and pcpens) have weaker correlations at theward level and the other variables have stronger corre-lations. Two (pcwhite and pckids) change sign, andthe three most highly correlated with pcllti change theirorder of importance. The ED-level correlation of pclltiand pcpens is 0.822 compared to 0.711 at the ward level,while the correlations of pcllti with pcunem and pcla,respectively, drop from 0.844 to 0.402 and from 0.736to 0.697. It appears from these results that the age groupeffects on limiting long-term illness are best identifiedat the ED level, whereas unemployment and to a lesserextent local authority housing have effects more clearlyshown for larger areas.

Investigation of the zonation effect can proceed bycomparing results from the 50 sets of pseudo-wards.The correlation coefficients between pcllti and pcwhiterange from þ0.077 to �0.165; it is noteworthy that thereal ward system has a bigger negative correlation(�0.201) than any of the 50 pseudo-ward systems.There is no obvious difference between the five exper-imental conditions.

The correlation of pcllti and pckids was þ0.023,indicating a generally statistically insignificant relation-ship. However, all 50 sets of pseudo-wards have a nega-tive relationship, reaching statistical significance at the0.05 level in 10 cases, with a maximum negative valueof �0.562 and a minimum negative value of �0.186.Again, the real ward system appears very unusual inthe correlation results.

The correlation of pcllti and pcpens, not surprisingly,is strongly positive. It ranges from 0.774 to 0.910.

Page 10: Neighbourhood effects on health: Does it matter where you draw the boundaries?

Figs. 3e6. Limiting long-term illness 1991, Swindon, pseudo-ward level, replication A4 (Fig. 3), replication B3 (Fig. 4), replication C3 (Fig. 5),

replication C4 (Fig. 6).

1250 R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

Page 11: Neighbourhood effects on health: Does it matter where you draw the boundaries?

Table 3

Correlations between variables, pseudo-ward systems with comparison to the real ward and ED results emphasised at the top of the table (*signif. at

0.05; **signif. at 0.01)

Replication pcllti/pcwhite pcllti/pckids pcllti/pcpens pcllti/pcunem pcllti/pcoo pcllti/pcla pcunem/pcla

Real wards L0.201 0.023 0.711** 0.844** 0.193 0.736** 0.802**

Real EDs 0.057 L0.309 0.822** 0.402** 0.093 0.697** 0.560**

A1 �0.129 �0.413 0.847** 0.847** 0.509* 0.681** 0.780**

A2 0.028 �0.258 0.807** 0.769** 0.450* 0.643** 0.718**

A3 0.052 �0.375 0.774** 0.846** 0.222 0.692** 0.757**

A4 �0.045 �0.345 0.830** 0.788** 0.447* 0.674** 0.800**

A5 �0.118 �0.412 0.896** 0.811** 0.442 0.744** 0.707**

A6 0.076 �0.502* 0.785** 0.855** 0.482* 0.566** 0.663**

A7 �0.083 �0.360 0.884** 0.881** 0.536* 0.744** 0.704**

A8 �0.133 �0.300 0.823** 0.893** 0.348 0.675** 0.804**

A9 0.008 �0.449* 0.850** 0.805** 0.414 0.701** 0.744**

A10 �0.044 �0.335 0.908** 0.835** 0.468* 0.760** 0.785**

B1 0.000 �0.397 0.823** 0.800** 0.494* 0.671** 0.753**

B2 �0.109 �0.531* 0.856** 0.806** 0.453* 0.705** 0.703**

B3 �0.109 �0.245 0.817** 0.817** 0.351* 0.681** 0.825**

B4 0.023 �0.419 0.825** 0.812** 0.426 0.678** 0.785**

B5 �0.165 �0.402 0.865** 0.874** 0.575** 0.649** 0.692**

B6 0.016 �0.261 0.865** 0.816** 0.406 0.708** 0.773**

B7 �0.028 �0.391 0.843** 0.818** 0.359 0.692** 0.722**

B8 0.052 �0.325 0.778** 0.841** 0.286 0.704** 0.767**

B9 �0.055 �0.361 0.872** 0.870** 0.409 0.679** 0.743**

B10 �0.061 �0.381 0.878** 0.859** 0.497* 0.730** 0.748**

C1 �0.039 �0.329 0.820** 0.886** 0.364 0.735** 0.796**

C2 �0.025 �0.438 0.806** 0.791** 0.391 0.637** 0.730**

C3 0.042 �0.497* 0.890** 0.863** 0.415 0.769** 0.748**

C4 0.014 �0.482 0.874** 0.818** 0.644** 0.588* 0.578*

C5 0.003 �0.186 0.814** 0.828** 0.285 0.651** 0.820**

C6 0.042 �0.386 0.792** 0.853** 0.365 0.627** 0.789**

C7 �0.101 �0.317 0.875** 0.861** 0.360 0.733** 0.796**

C8 0.059 �0.562* 0.865** 0.746** 0.523* 0.535* 0.726**

C9 �0.062 �0.340 0.833** 0.876** 0.223 0.783** 0.782**

C10 �0.066 �0.391 0.862** 0.773** 0.546* 0.659** 0.771**

D1 0.044 �0.495* 0.910** 0.862** 0.507* 0.728** 0.690**

D2 0.008 �0.370 0.783** 0.782** 0.369 0.640** 0.754**

D3 �0.070 �0.449* 0.894** 0.822** 0.567** 0.711** 0.722**

D4 �0.106 �0.304 0.820** 0.852** 0.515* 0.663** 0.807**

D5 �0.151 �0.443 0.842** 0.767** 0.465* 0.642** 0.710**

D6 �0.063 �0.364 0.849** 0.873** 0.457* 0.738** 0.793**

D7 0.075 �0.505* 0.822** 0.783** 0.555* 0.591** 0.589**

D8 �0.153 �0.416 0.864** 0.851** 0.472* 0.642** 0.725**

D9 0.077 �0.410 0.866** 0.867** 0.501* 0.745** 0.788**

D10 �0.078 �0.379 0.828** 0.847** 0.409 0.741** 0.819**

E1 �0.106 �0.315 0.818** 0.879** 0.316 0.748** 0.830**

E2 �0.051 �0.444* 0.850** 0.769** 0.485* 0.648** 0.754**

E3 0.007 �0.428 0.813** 0.789** 0.365 0.696** 0.681**

E4 �0.027 �0.440 0.819** 0.807** 0.361 0.678** 0.759**

E5 0.013 �0.548* 0.888** 0.714** 0.857** 0.538* 0.646**

E6 0.003 �0.310 0.780** 0.829** 0.240 0.672** 0.738**

E7 0.048 �0.380 0.854** 0.838** 0.310 0.705** 0.821**

E8 �0.098 �0.381 0.855** 0.866** 0.379 0.761** 0.794**

E9 0.002 �0.279 0.822** 0.810** 0.283 0.696** 0.680**

E10 0.038 �0.264 0.778** 0.805** 0.267 0.675** 0.789**

1251R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

Page 12: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1252 R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

Again, the real ward system is outside this range(0.711), suggesting that the ward boundaries do notaccord with areas of concentration of older people.

The correlation of pcllti and pcunem is also stronglypositive and always statistically significant at the 0.01level. Correlations vary from 0.714 to 0.893. Althoughsome sets of pseudo-wards (D1eD10) were based inpart on homogeneity with respect to unemployment,there does not appear to be any systematic tendencyfor these systems to have higher or lower correlationsthan the others.

The correlation of pcllti and pcoo is positive, rangingfrom 0.222 to 0.857. Roughly, half the replications werestatistically significant at the 0.05 level, and in fourcases at the 0.01 level. Again, the real ward correlationis lower at 0.193 than any of the 50 replications. Onereplication (E5) is particularly unusual, having easilythe highest value for this correlation, and unusuallylow values for the correlations of pcllti with pcunemand pcla. With the exception of E5, the other replica-tions in series E tend to have lower values. These aggre-gations are the ones which are based on homogeneitywith respect to housing tenure.

The correlation of pcllti and pcla is also stronglypositive, ranging from 0.535 to 0.783. All cases butthree are significant at the 0.01 level, and those threeare significant at the 0.05 level. This time the realward system is well within these bounds. There seemsto be no specific effect from the use of housing tenureas a criterion for grouping (condition E).

Further analysis of the pseudo-ward data set took theform of multiple regression analysis using weightedleast squares. The variable pcllti was the response vari-able and pcwhite, pckids, pcpens, pcunem, pcoo andpcla were used as explanatory variables. Separatemodels were fitted to each of the 50 sets of pseudo-wards. This was done at the ED level, with the neigh-bourhood effect variable pclltinbd used as an additionalexplanatory variable. Again, there is interest in how thecoefficients of the explanatory variables vary betweenreplications, but the primary concern is whether thereis a significant neighbourhood effect, and how depen-dent this is on the set of areal units employed. Becausethe EDs are the same in all replications, all the variablesare identical except pclltinbd, whose value depends onwhich EDs are in the same pseudo-ward.

The results of the regression analyses are shown inTable 4 with the real Wards and EDs results bolded atthe top of the table. It is perhaps reassuring that thereis not a great deal of variation between the replications.The same variables are significant in almost all cases,with fairly similar coefficients. The neighbourhood

effect pclltinbd is significant (and positive) in the major-ity of cases (31 out of 50).

The 19 cases where the neighbourhood effect is notsignificant are found in all five experimental conditions,being particularly frequent in condition B. The regres-sion analyses where pclltinbd is not significant havea R2 statistic of 0.847; while those in which pclltinbdis significant are only marginally higher at around0.850 (the best model, E3, has a R2 value of 0.854).

The constant term varies from 7.639 to 9.391; thecoefficient for pcwhite varies from �0.067 to �0.084and the coefficient for pcpens from 0.354 to 0.384.The regression coefficients for pcunem vary from0.247 to 0.363, for pcla from 0.082 to 0.186, and forpclltinbd from 0.101 to 0.186. None of these valuesare dramatically altered, though there are sufficient dif-ferences to give subtly different interpretations.

Conclusion e it does matter where you draw theboundaries!

The analysis reported above showed that for North-amptonshire there was a small but significant neigh-bourhood effect, but for Swindon there was no sucheffect. However, the experiments conducted withzone design showed that this result is dependent onthe way the ward boundaries are drawn. Given that theward boundaries are essentially arbitrary as far as thedistribution of limiting long-term illness is concerned,the results suggest that the initial conclusion, that thereis no neighbourhood effect, is not statistically safe. For31 out of 50 zonal systems covering the Swindon re-gion, there is a significant neighbourhood effect. Theactual ward system is not typical of the other (at leastas reasonable) possible zonal configurations. From theanalysis it appears that, in many cases, the pseudo-wards created here give better definitions of the neigh-bourhood than the published set of wards. In Swindonat least, therefore, it does matter where you draw theboundaries.

The analysis has demonstrated once again the exis-tence of the modifiable areal unit problem. It showsthat, for sets of pseudo-wards that make sense in termsof population equality and shape, the zonation effect isreal. Although the scale effect tends to be larger andmore dramatic, the results show that the zonation effectcan be sufficiently large to affect the result in a substan-tively important context, the existence of a neighbour-hood effect on self-reported illness. Previous experience(Flowerdew et al., 2007) has shown that the conceptof a neighbourhood is contentious, and the local knowl-edge can be important in creating the boundary

Page 13: Neighbourhood effects on health: Does it matter where you draw the boundaries?

Table 4

Regression analysis results for the 50 pseudo-ward systems with comparison to the real ward and ED results emphasised at the top of the table

Replication Constant pcwhite pcpens pcunem pcla pclltiwd s.e. R2

Real wards L24.289 0.266 0.263 1.645 d d d 0.958

Real EDs 9.225 L0.078 0.368 0.298 0.179 0.067 0.049 0.847

A1 8.233 �0.073 0.359 0.258 0.185 0.146 0.030 0.850

A2 9.263 �0.084 0.364 0.276 0.182 0.135 0.051 0.850

A3 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

A4 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

A5 8.187 �0.070 0.361 0.299 0.179 0.111 0.047 0.849

A6 9.313 �0.083 0.366 0.292 0.183 0.118 0.055 0.849

A7 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

A8 8.710 �0.075 0.364 0.269 0.180 0.114 0.049 0.849

A9 8.910 �0.079 0.365 0.295 0.181 0.115 0.052 0.849

A10 8.557 �0.076 0.360 0.289 0.178 0.140 0.046 0.851

B1 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

B2 7.639 �0.067 0.361 0.283 0.181 0.140 0.049 0.850

B3 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

B4 8.930 �0.078 0.363 0.288 0.182 0.115 0.049 0.849

B5 8.386 �0.073 0.361 0.280 0.183 0.120 0.050 0.849

B6 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

B7 8.972 �0.079 0.359 0.266 0.182 0.134 0.048 0.850

B8 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

B9 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

B10 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

C1 8.720 �0.078 0.363 0.256 0.182 0.138 0.051 0.850

C2 8.915 �0.077 0.364 0.295 0.182 0.101 0.051 0.850

C3 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

C4 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

C5 9.309 �0.082 0.363 0.272 0.183 0.113 0.050 0.849

C6 9.187 �0.081 0.363 0.274 0.182 0.115 0.050 0.849

C7 8.458 �0.075 0.361 0.269 0.181 0.138 0.049 0.850

C8 9.018 �0.080 0.362 0.291 0.184 0.117 0.052 0.849

C9 8.915 �0.080 0.358 0.269 0.184 0.140 0.049 0.850

C10 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

D1 9.076 �0.081 0.360 0.272 0.181 0.134 0.048 0.850

D2 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

D3 8.662 �0.074 0.362 0.292 0.082 0.102 0.049 0.849

D4 8.521 �0.075 0.363 0.271 0.183 0.127 0.051 0.849

D5 8.068 �0.072 0.359 0.262 0.185 0.152 0.050 0.851

D6 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

D7 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

D8 8.368 �0.075 0.383 0.272 0.183 0.140 0.062 0.850

D9 9.765 �0.091 0.355 0.217 0.186 0.181 0.048 0.853

D10 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

E1 8.556 �0.074 0.366 0.282 0.179 0.114 0.050 0.849

E2 8.567 �0.074 0.362 0.307 0.179 0.103 0.047 0.849

E3 8.923 �0.083 0.354 0.237 0.184 0.186 0.046 0.854

E4 8.629 �0.077 0.360 0.274 0.179 0.140 0.047 0.851

E5 9.076 �0.080 0.363 0.295 0.185 0.116 0.054 0.849

E6 9.136 �0.081 0.364 0.280 0.178 0.126 0.048 0.850

E7 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

E8 8.484 �0.073 0.364 0.296 0.178 0.105 0.047 0.849

E9 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

E10 9.391 �0.075 0.374 0.363 0.178 Insignificant 0.847

1253R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

definitions. The concept of local ground truthing is notconsidered practical at a national scale, but would berecommended for applications searching to definea specific neighbourhood.

The existence of good software for zone design, suchas the AZTool system used in this study, makes itpossible to assess the size of the modifiable areal unitproblem by experimenting with different ways of

Page 14: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1254 R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

aggregating the data. It is now possible to run zonedesign software on regular desktop computers, withoutrecourse to heavy-duty large scale computing as in thepast. Indeed, iterations used in this study were com-pleted within minutes. Thus, it is now realistic to followwhat may be good practice and experiment with newaggregations before making a final decision about thenature of any relationship. If there is a relationshipbetween two variables, the nature of the zonal systemmay either emphasise or obscure it. If there is a relation-ship, it would be unfortunate if an inappropriate set ofboundaries prevents it being detected.

The pseudo-wards used in the zone design workwere based on five different sets of criteria. The resultsindicate that the zonal systems emphasising the shapecriterion are considerably different from the othersets. Zonal systems using other sets of criteria remainfairly similar. In particular, there seems to be little effectfrom the use of homogeneity criteria.

A further point that emerges from the results is thatthe actual ward system in some respects seems out ofline with the pseudo-wards. This may reflect the waythe real ward system has developed, but does suggestthat the use of the published ward system may give par-ticularly misleading results compared to pseudo-wardsystems which better embody the population equalityand shape criteria used in the zone design.

These results have major implications for social sci-entists studying health with the aid of data for arealunits, and indeed for anybody using ecological data.The following guidelines are proposed for the use ofaggregated data. The zonal system used should not betaken for granted, especially when the topic under dis-cussion is related to spatial processes, as in the caseof the neighbourhood effect on health. It is importantto consider whether the chosen zonal system is thebest way of representing the processes creating thedata. The ability to experiment with different ways ofaggregating the same data represents an effective wayto investigate these issues in terms of both the size effectand the zonation effect. It is not practical, nor recom-mended, that analysts using aggregated spatial datashould explore the full gamut of zonal systems priorto statistical analysis, despite the existence of goodzonation software. However, what this paper does sup-port is the notion that the geography should be consid-ered prior to analysis.

There are three general techniques that we suggestcould be used to aid the analyst in seeking out more re-liable statistics. Firstly, previous work (Manley et al.,2006) has highlighted that the processes that may leadto neighbourhood effects are not always consistent

with the boundaries as drawn. When this inconsistencyexists, there may be a case for examining alternativezonal geographies, to produce better defined zonations.Using the ‘wrong’ zonal system may mask real effectsor identify spurious ones. Secondly, even a small num-ber of reaggregations can highlight statistical instabil-ity, as shown here. Where this occurs, we suggest thatanalysts should proceed more cautiously with claimsof significance, or not, in results. Thirdly, and finally,work in preparation (Manley, Steel, & Flowerdew,2007) demonstrates a methodology of assessing thelikely stability of results, such as the correlation coeffi-cients, using a technique known as the spatial jackknife.From this, a summary statistic is produced which re-views the variation around an estimate. Together, thesethree techniques can assist the analysts in developinga more complete understanding of the system withinwhich they are working. Perhaps, the variation in resultsfor different aggregations should encourage us to treatthe MAUP not as a problem but as a phenomenon,helping us to understand the location and scale of thegeographically located processes generating the data.

Acknowledgments

Robin Flowerdew acknowledges receipt of anErskine Visiting Fellowship at the Department of Geog-raphy, University of Canterbury, Christchurch, NewZealand. David Martin (University of Southampton)gave us permission to use his zoning program AZTool.Jamie Pearce (University of Canterbury) provided use-ful insights and helped with data manipulation. Thecensus data used in this paper are Crown Copyrightand were bought by ESRC and JISC for use by theacademic community. We also used digitized boundarydata purchased by ESRC for the academic community.Access was obtained via the MIMAS system at theManchester Computing Centre. An earlier version ofthis paper was presented at the Chicago meeting ofthe Association of American Geographers.

References

Alvanides, S., Openshaw, S., & Macgill, J. (2001). Zone design as

a spatial analysis tool. In N. J. Tate, & P. M. Atkinson (Eds.),

Modelling scale in geographical information science (pp.

141e157). New York: Wiley.

Alvanides, S., Openshaw, S., & Rees, P. (2002). Designing your own

geographies. In P. Rees, D. Martin, & P. Williamson (Eds.), The

census data system (pp. 47e65). Chichester: Wiley.

Boyle, P. J., & Alvanides, S. (2004). Assessing deprivation in English

inner city areas: making the case for EC funding for Leeds City.

In G. Clarke, & J. Stillwell (Eds.), Applied GIS and spatial

analysis (pp. 111e136). New York: Wiley.

Page 15: Neighbourhood effects on health: Does it matter where you draw the boundaries?

1255R. Flowerdew et al. / Social Science & Medicine 66 (2008) 1241e1255

Boyle, P. J., Gatrell, A. C., & Duke-Williams, O. (2004). Limiting

long-term illness and locality deprivation in England and Wales:

acknowledging the ‘socio-spatial context’. In P. J. Boyle,

S. E. Curtis, E. F. Graham, & E. G. Moore (Eds.), The geography

of health inequalities in the developed world: Views from Britain

and North America (pp. 293e308). Aldershot: Ashgate.

Charlton, J., Wallace, M., & White, I. (1994). Long-term Illness:

results from the 1991 Census. Population Trends, 75, 18e25.

Cheshire, P. (2007). Segregated neighbourhoods and mixed commu-

nities: A critical analysis. Joseph Rowntree Foundation.

Clark, A. J. (2004). Wish you were here? Experiences of moving

through stigmatised neighbourhoods in urban Scotland. Unpub-

lished PhD thesis, School of Geography and Geosciences,

University of St Andrews.

Cockings, S., & Martin, D. (2005). Zone design for environmental

and health studies using pre-aggregated data. Social Science

and Medicine, 60, 2729e2742.

Daras, K., & Alvanides, S. (2005). Zone design in public health

policy. In M. Campagna (Ed.), GIS for sustainable development

(pp. 247e267). CRC.

Diez-Roux, A. V. (2003). The examination of neighbourhood effects

on health: conceptual and methodological issues related to the

presence of multiple levels of organisation. In I. Kawachi, &

L. Berkman (Eds.), Neighbourhoods and health (pp. 45e64).

New York: Oxford University Press.

Dorling, D., & Thomas, B. (2004). People and places: A 2001 census

atlas of the UK. Bristol: Policy Press.

Flowerdew, R., Feng, Z., & Manley, D. J. (2007). Constructing data

zones for Scottish Neighbourhood Statistics. Computers,Environment and Urban Systems, 31, 76e90.

Fotheringham, A. S. (1999). Guest editorial: local modelling.

Geographical and Environmental Modelling, 3, 5e7.

Fotheringham, A. S., & Wong, D. W. S. (1991). The modifiable areal

unit problem in multivariate statistical analysis. Environment and

Planning A, 23, 1025e1044.

Gatrell, A. (2002). Geographies of health. Oxford, UK: Blackwell.

Gehlke, C. E., & Biehl, K. (1934). Certain effects of grouping upon the

size of the correlation in census tract material. Journal of the Amer-

ican Statistical Association, 29(Special Supplement), 169e170.

Heasman, M. A., Kemp, I. W., Urquhart, J. D., & Black, R. (1986).

Childhood leukaemia in northern Scotland. Lancet, 1986(i), 266.

Kawachi, I., & Berkman, L. (2003). Introduction. In I. Kawachi, &

L. Berkman (Eds.), Neighborhoods and health. New York:

Oxford University Press.

Lee, T. R. (1976). Psychology and the environment. London:

Methuen.

Macintyre, S., & Ellaway, A. (2003). Neighbourhoods and health: an

overview. In I. Kawachi, & L. Berkman (Eds.), Neighborhoodsand health. New York: Oxford University Press.

Macintyre, S., MacIver, S., & Sooman, A. (1993). Area, class and

health: should we be focusing on places or people? Journal ofSocial Policy, 22, 213e234.

Manley, D., Flowerdew, R., & Steel, D. (2006). Scales, levels and

processes: studying spatial patterns of British census variables.

Computers, Environment and Urban Systems, 30(2), 143e160.

Manley, D. J. (2006). The modifiable areal unit phenomenon: an

investigation into the scale effect using UK census data.

Unpublished PhD thesis, School of Geography and Geosciences,

University of St Andrews.

Manley, D., Steel, D., & Flowerdew, R. (2007). Why we should still

worry about the Modifiable Areal Unit Problem. British Society

of Population Studies, Annual Conference, St Andrews, 11e13

September 2007.

Martin, D., Nolan, A., & Tranmer, M. (2001). The application of

zone design methodology to the 2001 UK census. Environment

and Planning A, 33, 1949e1962.

McCulloch, A. (2001). Ward-level deprivation and individual social

and economic outcomes in the British Household Panel Study.

Environment and Planning A, 33, 667e684.

Meade, M., & Earickson, R. (2000). Medical geography. New York:

Guilford Press.

Openshaw, S. (1984)The modifiable areal unit problem (concepts and

techniques in modern geography), 38. Norwich: GeoBooks.

Openshaw, S., & Openshaw, C. (1997). Artificial intelligence in

geography. Chichester: Wiley.

Openshaw, S., & Rao, L. (1995). Algorithms for reengineering 1991

census geography. Environment and Planning A, 27, 425e446.

Openshaw, S., & Taylor, P. J. (1979). A million or so correlation

coefficients: three experiments on the modifiable areal unit prob-

lem. In N. Wrigley (Ed.), Statistical applications in the spatial

sciences (pp. 127e144). London: Pion.

Pickett, K. E., & Pearl, M. (2001). Multilevel analyses of neighbour-

hood socioeconomic context and health outcomes: a critical

review. Journal of Epidemiology and Community Health, 55,

111e122.

Putnam, R. (2000). Bowling alone: The collapse and revival of Amer-ican community. New York: Simon and Schuster.

Rezaeian, M., Dunn, G., St Leger, S., & Appleby, L. (2006). Ecolog-

ical association between suicide rates and indices of deprivation

in the north west region of England: the importance of the size of

the administrative unit. Journal of Epidemiology and Community

Health, 60, 956e961.

Shaw, M., Dorling, D., & Mitchell, R. (2002). Health, place andsociety. Bristol: Policy Press.

Sloggett, A., & Joshi, H. (1998). Deprivation indicators as predictors

of life events 1981e1992 based on the UK ONS longitudinal

study. Journal of Epidemiology and Community Health, 52,

228e233.

Tranmer, M., & Steel, D. G. (1998). Using census data to investigate

the causes of the ecological fallacy. Environment and Planning A,

30, 817e831.

Tranmer, M., & Steel, D. G. (2001). Using local census data to inves-

tigate scale effects. In N. J. Tate, & P. M. Atkinson (Eds.), Mod-

elling scale in geographical information science (pp. 105e122).

New York: Wiley.

Wilkinson, R. (1996). Unhealthy societies. London: Routledge.

Wise, S. M., Haining, R. P., & Ma, J. (1997). Regionalization tools

for the exploratory spatial analysis of health data. In

M. Fischer, & A. Getis (Eds.), Recent developments in spatial

analysis: Spatial statistics, behavioural modelling and neuro-

computing (pp. 83e100). London: Springer-Verlag.

Wong, P. (2006). Representing suburbs as vague spatial entities:

a Web-based PPGIS approach. Unpublished MSc thesis, Depart-

ment of Geography, University of Auckland, New Zealand.