
Executive Summary - Defra, UK - Science Searchrandd.defra.gov.uk/Document.aspx?Document=12365_CB0304... · Web viewIf the fit of a Poisson model is poor, power calculations based


Determining and increasing the sensitivity of existing environmental surveillance monitoring networks to detect unanticipated effects that may occur in the environment in response to the cultivation of genetically modified crops.

CB0304: Final Report

Executive Summary

Environmental Surveillance Networks (ESNs) provide long-term (>20 years) time series of counts at multiple sites for many taxa in the UK, and estimates of changing abundance of individual species can act as indicators of changes in biodiversity and ecosystems more broadly. Such estimates have acquired significant policy-making influence. Counts are usually undertaken by volunteer surveyors, under the leadership of NGOs specialising in the various groups. Differences in available resources and species’ ecology mean that field protocols vary, but a common underlying, broad design is a site × year ‘matrix’ of annual counts (or series of counts within each year) obtained at a large number of sites. This consistency in design has also led to a degree of convergence in analytical techniques, namely Poisson-based Generalized Linear Models with multi-level factors representing spatial and temporal variation between the counts. In addition to the taxon-specific ESNs that concentrate on species counts, there are ESNs that collect data on multiple biophysical measures. For schemes collecting such varied data, common models and protocols are difficult to establish; however, with a suitably flexible modelling framework, such as Generalized Linear Mixed Models, comparisons can be made. Such datasets, often more spatially intensive than temporally intensive, offer an opportunity to examine patterns of stock and change across multiple metrics. The potential of existing schemes to assess the ecological impact of changing agricultural regimes or management practices is therefore considerable.

We investigate the statistical power of some of the most frequently adopted models to detect changes over time, and/or between sites differing in some respect, using data from ESNs. Power can depend upon many factors, such as the scale and duration of the survey, the abundance of the organism and magnitude of its population change (and spatial variation in these), the influence of stochastic variation in the data available and the inevitable ‘turnover’ rate of sites. We develop a single, simple linear model to estimate the power to detect change over time inherent in a commonly-used Poisson model as a function of these factors. A similar model was examined to look specifically at the power to detect a difference in two spatially explicit regimes, which is more appropriate for ESNs focused on spatial rather than temporal intensity. The outputs of these models are explored under a range of scenarios. Our examples here focus on the surveillance of GM crops, and vary the degree of uptake of GM crop varieties and the type of GM crop; these factors influence the degree to which the agricultural sites in question are likely to be covered by existing ESNs. We also explore the influence of stochastic variation, magnitude of change and spatial scale at which the growing of GM crops is reported. We illustrate how these models can also be used to estimate the number of sites required to achieve a certain level of power, so providing guidance on how extensions of existing networks would provide additional power and, in turn, informing judgments as to the cost-effectiveness of such extensions.

Finally, we explore two alternative approaches. First of all we use the (count) linear model we have derived above to explore the power of the Breeding Bird Survey (BBS) and the Butterfly Monitoring Scheme (BMS) to detect change between two classes of site over a number of years, and a second model to detect spatial change between two regimes within one year (CS). Secondly we use data from the monitoring of long-term, national scale impacts of management changes introduced under Environmental Stewardship (ES) on birds monitored by the BBS. This second approach is an example where a management change (ES) has had an impact on populations of species of conservation concern. Under the assumption that changes due to GM crops may be of similar magnitude, we apply the same scenarios to explore how these influence the power of detecting these changes. Our alternative approaches represent an illustration and potential validation (through cross-referencing with real data simulations) of the application of aspects of the generic tool and a proof-of-concept with respect to the feasibility of detection of real effect sizes.

The conclusion is that the paired design (comparison of GM and conventional fields of each crop) applied to existing ESN data does provide a means of detecting change within the agricultural environment. It is also possible to explore the limitations of this approach: if change is small, indicator species relatively rare (in terms of number of sites at which they are recorded) or if it is necessary to detect change within a few years, the power of this approach is likely to be low. However, with well chosen indicator species (widespread and sensitive to the management of the farmed environment) and an appropriate ESN (with sufficient monitoring sites in arable land), the probability of detecting change is much higher, especially after five or more years.


EXECUTIVE SUMMARY
BACKGROUND
OBJECTIVES
IDENTIFICATION OF INDICATOR SPECIES, AGRICULTURAL, ECOLOGICAL AND POLICY SCENARIOS TO UNDERLIE SIMULATIONS (OBJ 1)
    BUTTERFLIES AS INDICATORS OF BIODIVERSITY IN AN AGRO-ECOSYSTEM
    COUNTRYSIDE SURVEY INDICATORS
    BBS BIRD INDICATORS
    SCENARIO DEFINITION
ANALYSES OF TIME SERIES
    DERIVATION OF AN EMPIRICAL MODEL FOR POWER CALCULATION IN TIME SERIES MODELS OF ANIMAL COUNTS (OBJS 2 & 3)
EXPLORING THE UTILITY OF EXISTING ESNS USING THE ‘GENERIC EQUATION’ (OBJ 4)
    UK BUTTERFLY MONITORING SCHEME
    BREEDING BIRD SURVEY (BBS)
ANALYSES OF SPATIAL DATA WITHIN YEARS USING COUNTRYSIDE SURVEY
    METHODS
    UNIFIED MODEL
IMPACTS OF ENVIRONMENTAL STEWARDSHIP MEASURES AS A PROXY FOR GM CROP IMPACTS (OBJ 4)
    METHODS
    RESULTS
IMPLICATIONS OF INCREASING POWER THROUGH INCREASING SAMPLE SIZE
    COSTS FOR INCREASED BUTTERFLY SAMPLING UNDER WCBS
    INCREASING THE COUNTRYSIDE SURVEY SAMPLE
    COSTS FOR INCREASED BIRD SAMPLING UNDER BBS
CONCLUSIONS
    THE GENERIC EQUATION FOR COUNT DATA
    ANALYSIS OF SPATIAL DATA WITHIN YEARS
    USING ENVIRONMENTAL STEWARDSHIP MEASURES AS A PROXY FOR GM CROPS
    LIMITATIONS OF GENERAL SURVEILLANCE
REFERENCES


Background

Legislation requires that, prior to commercial marketing in Europe, a genetically modified organism (GMO) must undergo an environmental risk assessment and, if it is authorized, a post-market environmental monitoring (PMEM) plan must be put in place. Part of this PMEM is General Surveillance (GS) to detect any unintended effects of the GMO. Pre-existing Environmental Surveillance Networks (ESNs) are expected to play a key role in GS.

Policy makers therefore need to understand how sensitive existing ESNs are, and hence their potential to detect changes in the environment that may be correlated with the cultivation of GM crops. The same approach applies more broadly to detecting unintended impacts of changes in farm management practices. If greater power to detect change is required, policy makers would need to understand how far extending an existing ESN would increase its power, the assumptions on which any predicted increase is based, and the uncertainties in those predictions.

There is also a need to provide clear guidance to applicants for licenses for GM crops as they prepare their PMEM plans, to enable them to identify which data they will use from ESNs, how they will analyse these data, and the strengths and limitations of such an analysis.

Power analyses, using Monte Carlo simulation techniques, estimate the power to detect change of specified magnitude in given circumstances. This practice is relatively straightforward theoretically, but computationally demanding, and moreover the results are applicable only under the assumptions made in simulating the artificial data used. These assumptions are important because the data need to be produced to mimic real survey results from many hundreds of sites, based on plausible GM crop uptake for each site and plausible species-specific responses, both of which may vary from case to case. Power can also depend upon other factors specific to the organization of the survey scheme, such as the scale and duration of the survey, the abundance of the organism and magnitude of its population change (and spatial variation in these) and the inevitable ‘turnover’ rate of sites. The “true” values of such factors in future surveys clearly cannot be predicted accurately in advance, hence it is essential to consider the power consequent upon a range of such assumptions, rather than a single value, often plotted graphically as a ‘power curve’, or to explore existing or previous survey data to predict (ranges of) plausible values.
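The Monte Carlo idea described above can be sketched in a few lines. This is a deliberately simplified illustration, not the simulation used in this report: it assumes plain (not overdispersed) Poisson counts, a single year, equal numbers of treated and control sites, and a Wald test on the log ratio of total counts. The function names and all numerical settings are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson variate (Knuth's multiplication method; fine for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_power(n_sites, mean_control, effect, n_sims=500, seed=1):
    """Share of simulated surveys in which a two-sided Wald test (5% level) on the
    log ratio of total counts detects a proportional reduction `effect`."""
    rng = random.Random(seed)
    mean_treated = mean_control * (1.0 - effect)
    hits = 0
    for _ in range(n_sims):
        control = sum(poisson(rng, mean_control) for _ in range(n_sites))
        treated = sum(poisson(rng, mean_treated) for _ in range(n_sites))
        if control == 0 or treated == 0:
            continue  # no test possible; counted as a non-detection
        # Approximate variance of log(treated/control) is 1/treated + 1/control.
        z = math.log(treated / control) / math.sqrt(1.0 / treated + 1.0 / control)
        hits += abs(z) > 1.96
    return hits / n_sims

# Estimated power for a range of effect sizes, i.e. points on a 'power curve'.
for effect in (0.05, 0.10, 0.25, 0.50):
    print(effect, simulate_power(n_sites=50, mean_control=5.0, effect=effect))
```

Each call repeats the artificial survey many times and reports the fraction of repeats in which the difference is declared significant, which is why the approach is simple in principle but costly when every scenario needs its own run.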

We develop here a linear equation to predict the power of detecting change between two treatments by monitoring an indicator species or environmental metric. We explore how power is influenced by a range of GM crop uptake scenarios, how existing ESNs may perform in detecting change under these scenarios, and how effective enhancing these networks would be in improving the power to detect change.


Objectives

1) To identify policy scenarios, indicator species and a plausible range of ecological and agricultural scenarios (extent of ‘take-up’) to underlie the simulations upon which a model is to be based.

2) To perform a large number of analyses of data simulated within the range ascertained above.

3) To use the simulation results above to produce and test a simple approximation of ‘power’, in terms of the various factors by which it is determined, in the form of a linear model that can be used to estimate power in a range of circumstances without the need for additional extensive simulation work.

4) To illustrate its use via examples from current ESN bird and butterfly data.

5) To prepare a report/manuscript with a view to publication in the scientific literature.


Identification of indicator species, agricultural, ecological and policy scenarios to underlie simulations (Obj 1)

Butterflies as indicators of biodiversity in an agro-ecosystem

While arable land is in general a species-poor environment for butterflies, there are species which can be found there in good numbers. There are no ‘specialist’ farmland species analogous to those in the Farmland Bird Index (see below), but generalist species such as the Whites, some common Brown species and the Small Tortoiseshell utilise such land, not because it is a preferred habitat but because they alone are able to survive there, possibly in rough patches or amongst nettles where hedges have been removed. As power to detect change depends upon species’ abundance, useful indicator species are those present in some number, and likely to react most to any change in habitat or land-use. Of these, different species may therefore function as optimal ‘indicator’ species, according to the nature of agricultural change: where management results in a loss of rough ground the consequences are most likely to be quickly detected in species like the Small Tortoiseshell, whose larvae are dependent upon nettles. On the other hand, changes in growing practice of, for example, oilseed rape may more quickly manifest themselves through changes in the Large and Small White, whose larvae feed, amongst other things, on this crop.

For the purposes of this report, the Small Tortoiseshell (Aglais urticae) and the Large White (Pieris brassicae) were chosen as indicator species, due to their presence and abundance on agricultural land.

Countryside Survey Indicators

The Countryside Survey (CS) records many metrics that are potential indicators of environmental health. In particular, CS data on habitat connectivity and plant richness in the wider countryside are used in the ‘UK Biodiversity Indicators in your Pocket 2012’. Biodiversity is a clear indicator of farmland health, and as such the biodiversity metrics of plant species richness and nectar plant species richness provide good indicators of the environmental status of farmland. Delivery of key ecosystem services provides a suitable criterion by which to measure farmland health and determine appropriate indicators. As such, cover of arable weeds is one possible metric: not only do arable weeds provide biodiversity (cultural/aesthetic) value in their own right, but they can also provide a vital food source for birds and insects. Some species of common weed that meet these criteria and are fairly abundant, and so are potentially good indicators, are listed in Table 1.


Soils also play a crucial role in delivery of ecosystem services, including carbon sequestration, water quality and nutrient cycling. Soil chemistry and nutrient status are therefore key metrics to ascertain the status of any parcel of land. Soils can also determine many above ground measures and indicators of potential change.

A full list of potential metrics to use as indicators of farmland health is given in Table 1. The common weeds Cirsium arvense (creeping thistle), Galium aparine (goosegrass) and Poa annua (annual meadow grass) were chosen as they are considered to be species indicative of farmland environmental quality and, usefully for the purposes of this report, have different means and variances in abundance. We also explored three measures of soil quality (carbon, nitrogen and pH), and one of water quality. As an indicator of water quality the Average Score per Taxon (ASPT), based on Biological Monitoring Working Party (BMWP) scores, was used. This is deemed the most appropriate and most sensitive single metric for describing water quality.


Table 1: Potential indicators of farmland health

Metric                                      Measured in CS2007   Related to specific arable fields
Total plant species richness                √                    √
Nectar plant richness                       √                    √
Cover of arable weeds
    Capsella bursa-pastoris                 √                    √
    Cerastium fontanum                      √                    √
    Cirsium arvense                         √                    √
    Dactylis glomerata                      √                    √
    Elymus repens                           √                    √
    Elytrigia repens                        √                    √
    Galium aparine                          √                    √
    Holcus lanatus                          √                    √
    Persicaria maculosa                     √                    √
    Poa annua                               √                    √
    Polygonum aviculare                     √                    √
    Rumex obtusifolius                      √                    √
    Stellaria media                         √                    √
    Taraxacum officinale agg.               √                    √
    Trifolium repens                        √                    √
    Urtica dioica                           √                    √
Soil invertebrate richness                  √                    √
Ellenberg fertility                         √                    √
Soil chemistry
    Nitrogen                                √                    √
    Carbon                                  √                    √
    Phosphorus                              √                    √
    pH                                      √                    √
Soil water-holding capacity                 X                    √
Water quality: Average (Biological
    Monitoring Working Party) Score
    per Taxon (ASPT)                        √                    X

BBS bird indicators

Individual bird species use the cropped environment in one or more of three broad ways: (i) nesting in the crop, (ii) feeding in the cropped area in summer and (iii) feeding in the cropped area in winter (perhaps after the crop has been harvested and/or the field ploughed). The extent to which each is important to a given species will determine its potential sensitivity to the changes to the environment likely to be caused by a switch to GM and also, therefore, the value of its abundance as an indicator of GM crop effects. Note, however, that it would be overly simplistic to consider how many of the three ways are relevant to a species as an index of its sensitivity because the latter will be determined by where population-limiting factors are found and on the size of the biological effect in each case. Some species utilize the cropped environment in all three ways, but rarely in the same crop type or field. In general, among small passerines, the evidence suggests that winter foraging is where most species are limited and they will be most vulnerable, whereas it is factors associated with nesting or summer foraging that are most relevant to species with precocial young.

The geographical ranges of each species are important for several reasons. First, clearly, species can only inform about a given crop’s impact on the environment when their distribution overlaps with that of the crop. Second, as well as this spatial limit, species with restricted ranges tend not to be found sufficiently frequently on BBS squares for sample sizes to be sufficiently large to allow statistical detection of relationships with environmental variables (although, conversely, they may also be the most sensitive to change). Third, species move between seasons to different extents. Thus, fully migratory species have no relationship with UK farmland in winter and, in partial migrants (species where a proportion of the UK breeding population winters overseas), the same is true for some of the population, weakening any relationships between winter habitat factors and breeding numbers. To a lesser extent, the same applies to species whose winter populations are augmented by immigrants from further north, because the UK breeders might conceivably be disproportionately affected or unaffected by a given habitat change because of factors such as differential habitat selection or relative social dominance affecting their vulnerability.

Finally, potential indicator species are presented individually in Table 2. Considering the abundance of individual species is the most sensitive way in which to use the information for General Surveillance. Data from these species could potentially be combined in many different ways, such as diversity indices, guild-level abundance or average annual abundance indices (as in the national Farmland Bird Index). However, while combining data, in an appropriate way, for species with similar life histories or habitat or food requirements might increase sensitivity, it is also very likely that some combinations of species will tend to obscure the responses of particularly vulnerable species because other species do not share the specific factors that generate the vulnerability. Combining species for surveillance in this context also incorporates an implicit hypothesis that the factors that the species concerned have in common are those that define sensitivity and response to GM crops (but see footnote 1).

1 http://www.rivm.nl/dsresource?objectid=rivmp:118436&type=org&disposition=inline


Table 2: Potential indicator species, defined as those that use the cropped areas of arable fields in winter or summer (shown by ticks). Ticks in parentheses show cases for which in-field habitats are minor for the species, even within arable regions. Note that some species also have significant proportions of their UK populations in non-arable habitats (e.g. lapwing, woodpigeon), so to maximize their potential as indicators of the health of the cropped environment, the arable portions of each species’ population should be extracted from national data sets where possible. Species with “restricted” ranges are those likely to be too uncommon or range-restricted to be useful indicators, even if their populations are monitored nationally using the BBS.

Species            Summer feeding / Summer nesting / Winter*   Breeding range   Wintering range
Skylark                                                        UK               UK (supplemented by immigrants)
Dunnock            () () ()                                    UK               UK
Yellowhammer                                                   UK               UK
Linnet                                                         UK               Partial/short-distance migrant; 60% winter outside UK
Grey Partridge                                                 UK               UK
Lapwing                                                        UK               UK
Song Thrush                                                    UK               UK
Reed Bunting       () ()                                       UK               UK
Chaffinch                                                      UK               UK (supplemented by immigrants)
Goldfinch          ()                                          UK               Partial/short-distance migrant; 60% winter outside UK
Greenfinch                                                     UK               UK
Woodpigeon                                                     UK               UK
Stock Dove                                                     UK               UK
Yellow Wagtail                                                 Restricted       Africa
Corn Bunting                                                   Restricted       Restricted
Whitethroat        () ()                                       UK               Africa
Tree Sparrow                                                   Restricted       Restricted
House Sparrow      () ()                                       UK               UK

* The tick marks in the Summer (Feeding, Nesting) and Winter columns did not survive extraction. Only the parentheses around bracketed ticks remain, and their column positions cannot be recovered.

Scenario definition

It was decided that the scenarios could be defined fairly simply by a) crop; b) uptake rate; and c) effect size that General Surveillance would need to detect. Crops were chosen to match those in the regulatory pipeline which may be grown in the UK, namely potatoes, sugar beet and maize. Uptake rates were varied between 20% and 80% (20, 40, 60, 80), and effect sizes were varied between 5% and 50% (5, 10, 25, 50).
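As an illustration only, the scenario definition can be written down as a small grid. The sketch below assumes that every crop is crossed with every uptake rate and every effect size (a full factorial design, which the text does not state explicitly); the variable and function names are hypothetical.

```python
from itertools import product

# Values taken from the scenario definition above.
CROPS = ("potatoes", "sugar beet", "maize")
UPTAKE_PERCENT = (20, 40, 60, 80)
EFFECT_PERCENT = (5, 10, 25, 50)

def build_scenarios():
    """Return every crop x uptake x effect-size combination as a dict.

    Assumption: the three factors are fully crossed.
    """
    return [
        {"crop": crop, "uptake": uptake / 100, "effect": effect / 100}
        for crop, uptake, effect in product(CROPS, UPTAKE_PERCENT, EFFECT_PERCENT)
    ]

scenarios = build_scenarios()
print(len(scenarios))  # 3 crops * 4 uptake rates * 4 effect sizes = 48
```

Under this assumption each of the 48 combinations would be passed in turn to whatever power calculation is being used.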


Analyses of Time Series

Derivation of an empirical model for power calculation in time series models of animal counts (Objs 2 & 3)

Background

There are now many national wildlife surveys in which the abundances of species in various taxonomic groups are counted, according to some standardised field protocol, at a number of different sites. Though the precise detail behind the data collection may vary according to the ecology of the taxa, we might consider a generic data set as represented by counts $C_{i,t}$, taken at site $i$ in year $t$ of the survey. Traditionally, these have often been modelled as Poisson random variables with expected value $\mu_{i,t}$.

A model for time-series of count data

Freeman and Newson (2008) developed a model in which population growth, defined as the ratio between two consecutive expected counts at a site, was modelled as a linear function of a site- and time-specific covariate $P_{i,t}$:

$$\log\left(\frac{\mu_{i,t}}{\mu_{i,t-1}}\right) = R_t + \alpha P_{i,t-1} \qquad (1)$$

Thus $R_t$ is the rate of growth (on the log scale) at a site where $P_{i,t-1} = 0$, and $\alpha$ quantifies the effect of the covariate upon growth. If $P_{i,t}$ is a simple binary variable (0/1), growth where $P_{i,t-1} = 1$ is reduced (if $\alpha < 0$) to a fraction $e^{\alpha}$ of that at a site where $P_{i,t-1} = 0$; otherwise $e^{\alpha}$ is the multiplicative effect of a unit change in a continuous variable $P_{i,t}$.
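A short numerical illustration of this interpretation may help. The sketch below simply evaluates $e^{\alpha}$ for a binary covariate; the function names are hypothetical, and the multi-year ratio assumes equal starting abundance and identical baseline growth at the treated and control sites.

```python
import math

def growth_fraction(alpha):
    """Annual growth at a treated site as a fraction of that at a control site."""
    return math.exp(alpha)

def relative_abundance(alpha, years_treated):
    """Expected treated/control abundance ratio after several treated years,
    assuming equal starting abundance and the same baseline growth at both sites."""
    return math.exp(alpha * years_treated)

print(round(growth_fraction(-0.1), 4))        # 0.9048
print(round(relative_abundance(-0.1, 5), 4))  # 0.6065
```

So an annual effect of about 9.5% compounds to a difference of roughly 39% after five treated years, which is one reason survey duration matters so much for power.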

Freeman and Newson (2008) considered the presence/absence of a competing species as their covariate of interest, but the model clearly can be applied to a wider range of circumstances. Here we consider $P_{i,t}$, again a binary variable, to distinguish between sites operating one of two management regimes, and taking the value 1 (‘treated’) or 0 (‘control’).

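Summing equation (1) over successive years, from the first year of the survey up to year $t$, gives:

$$\log(\mu_{i,t}) = \sum_{j=2}^{t} R_j + \alpha \sum_{j=1}^{t-1} P_{i,j} + \log(\mu_{i,1}) \qquad (2)$$

and the model is linear in the unknown parameters $R_j$, $\alpha$ and $\log(\mu_{i,1})$, which can therefore easily be estimated using a Generalized Linear Model (Freeman and Newson, 2008).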
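The step from the year-on-year form of equation (1) to the summed form of equation (2) can be checked numerically. The sketch below is illustrative only: the growth rates, the treatment history and the starting mean are made-up values, and the function names are hypothetical. Growth rates are stored per step (year 1 to 2, year 2 to 3, and so on) so that the check does not depend on how the $R$ terms are labelled.

```python
import math

def mu_recursive(mu1, growth, treated, alpha):
    """Expected counts for one site via the year-on-year recursion of equation (1).

    growth[k]  : baseline log growth rate for the step from year k+1 to year k+2
    treated[k] : covariate P (0 or 1) in year k+1, which acts on that same step
    """
    mu = [mu1]
    for r, p in zip(growth, treated):
        mu.append(mu[-1] * math.exp(r + alpha * p))
    return mu

def mu_summed(mu1, growth, treated, alpha):
    """The same expected counts from the summed (log-linear) form of equation (2)."""
    out = []
    for t in range(len(growth) + 1):
        log_mu = math.log(mu1) + sum(growth[:t]) + alpha * sum(treated[:t])
        out.append(math.exp(log_mu))
    return out

growth = [0.02, -0.01, 0.03, 0.00]   # illustrative values only
treated = [0, 1, 1, 1]               # site adopts the treatment in year 2
a = mu_recursive(10.0, growth, treated, alpha=-0.1)
b = mu_summed(10.0, growth, treated, alpha=-0.1)
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True
```

Because the summed form is linear on the log scale, the accumulated treatment history at each site enters the model as a single covariate, which is what allows standard Generalized Linear Model software to be used.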

A power analysis

Our objective is to characterise the power of the model defined in equation (2) to identify a non-zero value of α (that is, a difference in trend over time between sites operating under the two regimes). Clearly this power may depend upon any of a number of factors in practice, e.g. the scale and duration of the survey and the abundance and range of the animals themselves. We seek here to develop a model to predict power from nine of these key properties, initially as follows:

$$g(\theta) = a_0 + \sum_{i=1}^{9} a_i X_i \qquad (3)$$

where $a_0, \dots, a_9$ are unknown parameters, $\theta$ ($0 < \theta < 1$) is the power (expressed as the probability of rejecting the hypothesis that $\alpha = 0$) and $g$ is an appropriate link function constraining $\theta$ to lie in the valid range (0, 1). This basic model is then readily extended to incorporate interactions between any of the $X_i$.
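To make the role of the link function concrete, the sketch below evaluates a model of the form of equation (3) under a logit link. Both the choice of logit for $g$ and the coefficient values are assumptions made purely for illustration; they are not the fitted values from this report, and only two predictors are used to keep the example short.

```python
import math

def logit(theta):
    """Link function g: maps a probability in (0, 1) to the real line."""
    return math.log(theta / (1.0 - theta))

def inv_logit(eta):
    """Inverse link: maps a linear predictor back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_power(intercept, coefs, x):
    """Power from a linear predictor a0 + sum(a_i * X_i) under a logit link."""
    eta = intercept + sum(a * xi for a, xi in zip(coefs, x))
    return inv_logit(eta)

# Hypothetical coefficients for two predictors (number of sites, duration in years).
power = predicted_power(-3.0, [0.01, 0.1], [150, 10])
print(round(power, 3))  # 0.378
```

Whatever values the linear predictor takes, the inverse link keeps the predicted power strictly between 0 and 1, which is the property required of $g$ in equation (3).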

An adjustment for lack of fit in the Poisson model

Power calculations could be based upon the assumption, following Freeman and Newson (2008), that the data are adequately described by a Poisson distribution. In many cases this may not be so. Model fitting is then generally extended by using the Pearson chi-squared statistic or the residual deviance to adjust standard errors and test statistics (the ‘quasi-Poisson’ option in many statistical packages). If the fit of a Poisson model is poor, power estimates based upon the Poisson assumption are likely to be too high. The data generated for these analyses were therefore overdispersed with respect to a Poisson distribution, and the fit of the models was adjusted accordingly. In summary, our nine predictor variables $X_1, \dots, X_9$ are defined as in Table 3:


Table 3: Definition and range of nine predictor variables

| Variable | Range considered | Definition |
|---|---|---|
| X1: slope, R | -0.1 to -0.1 | The average annual rate of change at an untreated site, Rt (= R, assumed constant here) |
| X2: Nsites | 50 to 200 | The number of sites visited (see footnote 2) |
| X3: treated | 0.01 to 0.9 | The proportion of sites treated (e.g. the proportion of crops grown that are GM) |
| X4: abundance_mean | 0 to 5 | The average log(abundance) at a site in year one of the survey |
| X5: abundance_var | 0.1 to 4 | The variance of the log(abundance) measures at each site in year one of the survey; a measure of inter-site variability |
| X6: missed | 0.2 to 0.5 | The proportion of survey visits missed |
| X7: duration | 5 to 20 | The duration of the survey |
| X8: q | 1 to 10 | Scale parameter: a measure of the excess residual deviance, or overdispersion in the data |
| X9: alpha, α | -0.1 to 0 (i.e. annual reductions of up to 9.5%) | The magnitude of the difference between the two treatments (in this case GM and conventional crops) |
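The quasi-Poisson adjustment behind the scale parameter X8 can be sketched as follows. This is a generic illustration of the Pearson-based estimate, not the report's own code; the counts, fitted means and function names are made up for the example.

```python
import math

def pearson_dispersion(counts, fitted, n_params):
    """Pearson chi-squared statistic divided by residual degrees of freedom."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return chi2 / (len(counts) - n_params)

def quasi_poisson_se(poisson_se, dispersion):
    """Inflate a Poisson-model standard error by sqrt(dispersion)."""
    return poisson_se * math.sqrt(dispersion)

# Made-up counts and fitted means, for illustration only.
counts = [0, 9, 1, 12, 3, 20]
fitted = [4.0, 4.0, 6.0, 6.0, 10.0, 10.0]
q_hat = pearson_dispersion(counts, fitted, n_params=2)
se_adj = quasi_poisson_se(0.05, q_hat)
```

A dispersion estimate well above 1, as here, widens the standard errors and so lowers the power relative to a naive Poisson calculation.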

To illustrate the simulation method, consider for example a survey of five years’ duration (thus fixing X7) in which the consequence of adopting some ‘treatment’ at certain random sites is represented by α = -0.1; that is, growth at a ‘treated’ site is reduced to e^(-0.1) ≈ 90.5% per year of that at an untreated control (fixing X9). Estimation of θ by Monte Carlo simulation then proceeds in two stages:

i) Select the dimensions of an artificial survey by generating values of X2, X3 and X6 at random, from the following uniform distributions: X2~Unif(50,200), X3~Unif(0.1,0.9) and X6~Unif(0.2,0.5).

2 We consider X2 and X3 in the model by replacing them with two transformed variables representing the numbers of treatment and control sites (i.e. No. sites x proportion treated and No. sites x (1-proportion treated) respectively) since power can be expected to increase monotonically with these.

ii) Select parameters reflecting the range and demography of an artificial population by generating values X1, X4 and X5 at random from the following distributions: X1~Unif(-0.1,-0.1), X4~Unif(0,5) and X5~Unif(0.1,4). Note, to put the latter two ranges in perspective, that two of the commonest species in the UK Butterfly Monitoring Scheme, Small Tortoiseshell and Large White, in the latest year available had average log-counts of 3.5 and 2.6 respectively, and variances of 3.5 and 5.7 across the few arable sites currently in the scheme.

iii) Generate X2 initial expected abundances μi,1, one for each site, by generating a series of independent and identically distributed variables from a normal distribution: log(µi,1) ~N(X4,X5), i=1,2,…X2.

iv) Generate expected abundances µi,t, t=2,3,4,5 for each site as the survey progresses, using equation (2).

v) Generate a series of artificial counts Ci,t, where E(Ci,t) = µi,t, and a randomly selected scale parameter X8 is used to produce overdispersion with respect to a Poisson distribution. Fit model (2) to the data arising and, adjusting for overdispersion, test the hypothesis of α=0.

vi) Return to (iii) and repeat, generating 2000 such data sets; count those returning a significant value of α.

vii) Select new values of X7 and X9 at random from the range considered, return to (i) and generate new values X1-X6, repeat the entire process to produce a count of significant results from 2000 replicated simulations based on the new survey dimensions and demographic parameters.

This entire process was repeated 1000 times, yielding 1000 estimates Si of the number of significant results based upon 1000 scenarios defined by the values X1i-X6i, i=1,2,… 1000.

Power θ is then estimated by assuming the Si are binomially distributed, Si ~ Binom(2000,θ) with θ related to X1-X9 via equation (3), and interactions considered to improve the predictive ability of this model.
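As a brief aside on precision (a standard binomial result rather than a statement from the report), each simulated power value based on 2000 replicates carries a Monte Carlo standard error of

$$ \mathrm{SE}(\hat{\theta}) = \sqrt{\frac{\theta(1-\theta)}{2000}} \le \sqrt{\frac{0.25}{2000}} \approx 0.011 , $$

so sampling noise in the simulated values themselves is small relative to the discrepancies between fitted and simulated power discussed later.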

The advantage of the method is that, once the parameters in (3) (or an extension of it – see later) are estimated, they can be used quickly to derive an approximation of the power for any chosen combination of survey scale / demographic variables X, and the relationship between power and any of these covariates is easily explored, without the need to repeat the computer-intensive simulations every time.
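The data-generating part of the simulation (steps i to v) can be sketched as below. Two things are assumptions made for this illustration and are not confirmed by the text: the form used for equation (2), namely annual log-growth of R at control sites and R + α at treated sites, and the overdispersion mechanism, a Poisson count rescaled by q. The model-fitting and testing step is omitted, and all function names are hypothetical.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw a Poisson variate (Knuth's method; normal approximation if lam > 50)."""
    if lam > 50.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_survey(rng, n_sites, treated, missed, duration,
                    slope, alpha, abundance_mean, abundance_var, q):
    """Return (is_treated, counts); counts[i][t] is None for a missed visit."""
    n_treated = round(n_sites * treated)
    # Sites are exchangeable, so marking the first n_treated as treated
    # is equivalent to choosing treated sites at random.
    is_treated = [i < n_treated for i in range(n_sites)]
    counts = []
    for i in range(n_sites):
        # Step (iii): log initial expected abundance ~ N(X4, X5), X5 a variance.
        log_mu = rng.gauss(abundance_mean, math.sqrt(abundance_var))
        growth = slope + (alpha if is_treated[i] else 0.0)
        row = []
        for t in range(duration):
            # Step (iv), ASSUMED form of equation (2): log-linear growth.
            mu = math.exp(log_mu + t * growth)
            if rng.random() < missed:
                row.append(None)
            else:
                # Step (v), ASSUMED mechanism: q * Poisson(mu / q) has
                # mean mu and variance q * mu.
                row.append(q * poisson_draw(rng, mu / q))
        counts.append(row)
    return is_treated, counts

rng = random.Random(1)
is_treated, counts = simulate_survey(
    rng, n_sites=100, treated=0.3, missed=0.4, duration=5,
    slope=0.0, alpha=-0.1, abundance_mean=2.0, abundance_var=1.0, q=5.0)
```

Repeating this 2000 times for fixed settings, fitting model (2) to each data set and counting the significant tests of α = 0 would give one value of Si.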


Results

A simple, linear combination of the nine candidate predictor variables (Equation 3) proves to be a rather poor predictor of the observed (simulated) power values (Figure 1). Although cases in which the simulations produced high power were similarly identified by the model, power was overestimated by the model in a substantial proportion of the cases in which the observed value was low. The same linear model based upon most of the same predictors performed well when applied to simulations derived from a single, constant value of α. We concluded that a more complex approach is needed to span a range of values of α.

The same model performs considerably better when fitted to data over a smaller range of values for α; two are shown for illustration in Figures 2 & 3, where the model is fitted only to data where -0.01 < α < 0 in Figure 2 and -0.1 < α < -0.09 in Figure 3.


Figure 1: Estimated values of power from a 9 parameter univariate model (y) versus power estimated by repeated simulations (x).

Figure 2: As Figure 1, but model fitted to cases with a reduced range for α: -0.01 < α < 0


Note that in Figure 3, where the effect of a treatment is largest, results are concentrated into the top right hand corner, where power is almost one.

Coefficients of this model fitted to the data in ten subsets, obtained by splitting the range of α into ten equal segments, are shown in Table 4. Variation in these as α is changed is considerable, and often erratic, implying potential for interactions between these variables and α.

Table 4: Estimated coefficients a0 – a9 from the fitting of equation (3) to the simulated power estimates for α in ranges of bin-width 0.01, centred on mid-points indicated in left-hand column.

| α (mid-point) | a0 | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 |
|---|---|---|---|---|---|---|---|---|---|---|
| -0.005 | -7.79 | 5.27 | 0.01 | 0.01 | 0.57 | 1.00 | 0.15 | -0.66 | -287.17 | -0.11 |
| -0.015 | -8.61 | 9.26 | 0.01 | 0.00 | 0.95 | 1.28 | 0.23 | -0.95 | -149.40 | -0.25 |
| -0.025 | -7.07 | 3.42 | 0.02 | 0.01 | 1.00 | 1.42 | 0.32 | -3.91 | -24.61 | -0.19 |
| -0.035 | -3.29 | 6.68 | 0.03 | 0.00 | 1.02 | 1.31 | 0.25 | -1.79 | 42.58 | -0.30 |
| -0.045 | -10.06 | 2.43 | 0.04 | 0.01 | 1.06 | 1.20 | 0.28 | -3.35 | -81.18 | -0.14 |
| -0.055 | -13.52 | 3.33 | 0.03 | 0.00 | 1.26 | 1.29 | 0.33 | -4.74 | -164.66 | -0.25 |
| -0.065 | -19.62 | 6.79 | 0.06 | 0.01 | 1.73 | 1.65 | 0.34 | -3.03 | -187.10 | -0.37 |
| -0.075 | -5.82 | -0.38 | 0.04 | 0.00 | 0.92 | 0.70 | 0.30 | 0.34 | -11.26 | -0.18 |
| -0.085 | -6.09 | 2.64 | 0.05 | 0.02 | 1.50 | 1.75 | 0.31 | -0.35 | 18.38 | -0.24 |
| -0.095 | -10.05 | 9.33 | 0.05 | 0.01 | 1.27 | 1.30 | 0.33 | -1.26 | -45.95 | -0.19 |

Figure 3: As Figure 1, but model fitted to cases with a reduced range for α: -0.1 < α < -0.09

We therefore next extended the model by adding first-order interactions between α and each of the other predictors, effectively forcing the combined coefficients of the latter to vary (smoothly) with α. Though a formal test shows the fit to be greatly improved (AIC arising from Figure 4 being 225,252 as opposed to 311,526 for Figure 1), the resulting improvement in the model's capacity to predict the power estimated from the simulations is modest (Figure 4).
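Written out, and using b_i as our own notation for the interaction coefficients (the report does not name them), the extended model takes the form

$$ g(\theta) = a_0 + a_9 \alpha + \sum_{i=1}^{8} (a_i + b_i \alpha) X_i , $$

where X_9 = α as in Table 3. The effective coefficient of each remaining predictor X_i is then a_i + b_i α, a linear function of α.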

We then examined models with the addition of an extra interaction between each of the (7 × 8) /2 =28 possible pairs of predictors, other than α, computed the AIC of each and identified the pair responsible for the greatest improvement in formal fit, over that of Figure 4. By some distance, this was the interaction between the number of treated sites and the number of control sites, which reduced AIC to 195,502. Extreme outliers are now somewhat reduced, the region bounded by the upper and lower lines in Figure 5 illustrating the range in which the observed and fitted values differ by less than 0.2.
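The count of 28 candidate pairs can be checked by enumeration. The predictor names below are illustrative labels following Tables 3 and 5, with the treated and control site counts standing in for X2 and X3 as described in footnote 2.

```python
from itertools import combinations

# The eight predictors other than alpha (illustrative labels).
predictors = ["slope", "n_treated", "n_control", "abundance_mean",
              "abundance_var", "missed", "duration", "scale"]
pairs = list(combinations(predictors, 2))
n_pairs = len(pairs)  # 8 * 7 / 2 = 28
```

Each pair would be added in turn to the model with α interactions, and the pair giving the lowest AIC retained.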

We thus added this interaction to the model, and then considered adding quadratic terms in each of the variables; much the greatest improvement (AIC=176,041) came about as a consequence of adding the quadratic form of α (Figure 6). Finally, with this quadratic term added, we also considered adopting a non-symmetric complementary log-log link function, rather than the logit used to date, but the logit remained a better fit. The number of outliers, however, though much reduced, is not negligible (Figure 6).

Figure 4: Estimated values of power under a model with interactions between α and all other predictor variables (y) versus those estimated by repeated simulations (x).

The same set of parameters performs much better estimated from small values of α (α > -0.02; Figure 7), but remains prone to occasional erratic predictions of cases in which power is low when fitted only to data in which α is greater than this (Figure 8). We restrict further consideration therefore to predictions based upon this model with coefficients estimated from (and hence applicable to) the subset of data with α > -0.02, that is, considering reductions of more than 2% per annum in ‘treated’ sites, accumulating to, say, reductions of at least 17% over a ten-year survey. Parameter values are given in Table 5.


Figure 6: Estimated values of power under a model with interactions between (i) α and all other predictor variables, (ii) the numbers of sites treated and not treated and (iii) a quadratic term for α (y), versus those estimated by repeated simulations (x).

Figure 7: Expected and observed (simulated) values of power using the model of Figure 6, but fitted only to data where α < -0.02.


Table 5: Estimated coefficients under the final model, for α > -0.02, with various interactions.

| Variable(s) | Coefficient |
|---|---|
| Intercept | -7.65 |
| Slope | 7.28 |
| Nsites * treated | -0.0137 |
| Nsites * (1-treated) | -0.0123 |
| Abundance_mean | 0.527 |
| Abundance_var | 0.701 |
| Duration | 0.226 |
| missed | -0.464 |
| α | -262.2 |
| Scale | -0.0785 |
| α² | -12650 |
| α * Slope | 24.65 |
| α * Nsites * treated | -0.761 |
| α * Nsites * (1-treated) | -0.125 |
| α * abundance_mean | -30.85 |
| α * abundance_var | -43.73 |
| α * Duration | -3.072 |
| α * missed | 121.2 |
| α * Scale | 4.683 |
| Nsites² * treated * (1-notreated) | 0.00042 |

An additional cross-validation exercise (Figure 9) was also carried out as a more robust test of the model’s predictive power. In this case, the model was fitted to one set of artificial data from 500 sets of simulations and then used to predict the power matched to 200 sets of entirely separate simulations. That is, the predictions are based upon a model fitted beforehand to separate data, so the coefficients are entirely independent of the values used subsequently in the second set. While the match is predictably less close than those in previous figures, due to this removal of the dependency, the model is largely adequate for identifying scenarios with low or high associated power.


Figure 8: Expected and observed (simulated) values of power using the model of Figure 6, but fitted only to data where α > -0.02.

Figure 9: A cross-validation exercise. The model of Figure 6 was fitted to the values of power estimated from 500 sets of simulated data, and used to predict the power from 200 further additional, independent, observations. The independent estimates of power from the latter are plotted against their values as predicted from the model fitted to the former. Model coefficients are given in Table 5.


Using the generic equation to illustrate dependence of power on key variables

The power curves arising can be represented in an almost unlimited variety of forms, given the large number of factors determining their shape. In Figure 10a, we assume a scenario in which the slope at a control site is zero (i.e. the population is stable), and both the mean and variance of the initial log-abundances are 2.0, with the ensuing ratio of variance to expected counts (‘overdispersion’) equal to 5.0. A random 40% of scheduled visits are assumed missed over a survey of ten years’ duration; the true effect α is taken as -0.01, that is, the growth at a treated site is exp(α) = 99% of that at an untreated site. To produce the curves, various levels are assumed for the proportion of sites treated (ranging here from 0.2 to 0.5, in increments of 0.1, bottom to top) and the total number of sites surveyed, presented on the x-axis. As would be predicted, power increases with the number of sites surveyed, and as the proportion of these treated rises towards equality with the control sites. It also increases with the duration of the survey (Figure 10b, where the survey is extended to fifteen years and all else is as in Figure 10a), and decreases with the proportion of visits missed (Figure 10c, which differs from 10a only in that this proportion is reduced from 40% to 20%). Power appears largely robust to realistic variation in the extent of the decline at the control sites; Figure 10d differs from Figure 10a in that this decline is taken as Rt = -0.05, that is, a decline of almost 5% per year.
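A hedged sketch of how the generic equation might be evaluated for the Figure 10a scenario is given below. The coefficients are transcribed from Table 5; the dictionary keys, the reading of the final Table 5 term as (number of treated sites) x (number of control sites), the units of each input (proportions for `treated` and `missed`, years for `duration`) and the use of an inverse logit are our assumptions in interpreting that table.

```python
import math

# Main-effect coefficients transcribed from Table 5.
COEF = {
    "intercept": -7.65, "slope": 7.28, "n_treated": -0.0137,
    "n_control": -0.0123, "abundance_mean": 0.527, "abundance_var": 0.701,
    "duration": 0.226, "missed": -0.464, "alpha": -262.2, "scale": -0.0785,
    "alpha_sq": -12650.0, "n_treated_x_n_control": 0.00042,
}
# Coefficients of the alpha-by-predictor interactions from Table 5.
ALPHA_INTERACTIONS = {
    "slope": 24.65, "n_treated": -0.761, "n_control": -0.125,
    "abundance_mean": -30.85, "abundance_var": -43.73, "duration": -3.072,
    "missed": 121.2, "scale": 4.683,
}

def predicted_power(n_sites, treated, slope, abundance_mean, abundance_var,
                    duration, missed, scale, alpha):
    """Evaluate the fitted equation and return power on the (0, 1) scale."""
    x = {
        "slope": slope, "n_treated": n_sites * treated,
        "n_control": n_sites * (1.0 - treated),
        "abundance_mean": abundance_mean, "abundance_var": abundance_var,
        "duration": duration, "missed": missed, "scale": scale,
    }
    eta = (COEF["intercept"] + COEF["alpha"] * alpha
           + COEF["alpha_sq"] * alpha ** 2)
    for name, value in x.items():
        eta += (COEF[name] + ALPHA_INTERACTIONS[name] * alpha) * value
    # ASSUMED reading of the last Table 5 row: treated x control site counts.
    eta += COEF["n_treated_x_n_control"] * x["n_treated"] * x["n_control"]
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit

# Scenario described for Figure 10a, with half of all sites treated.
fig10a = dict(slope=0.0, abundance_mean=2.0, abundance_var=2.0, duration=10,
              missed=0.4, scale=5.0, alpha=-0.01)
curve = [predicted_power(n, 0.5, **fig10a) for n in (50, 100, 150, 200)]
```

Because the coefficients were estimated for α > -0.02 and for 50 to 200 sites, inputs outside those ranges would be extrapolations.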

a: 40% of visits over 10 years missed at random. Value of slope at ‘control’ sites = 0

b: 40% of visits over 15 years missed at random. Value of slope at ‘control’ sites = 0

c: 20% of visits over 10 years missed at random. Value of slope at ‘control’ sites = 0

d: 40% of visits over 10 years missed at random. Value of slope at ‘control’ sites = -0.05

An alternative visualization is shown in Figure 11, where the number of sites is fixed at 100 throughout and the x-axis represents variation in the initial abundance. All other parameters are fixed to the same values, respectively, as the analogous plots in Figure 10; that is, the proportion of sites treated ranges from 0.2 to 0.5, bottom to top; 11b runs to 15 years’ duration, 11c has only 20% of visits missed and 11d has a decreasing trend at ‘control’ sites. The predictable increase in power with abundance is clear, and the conclusions drawn previously from Figure 10 are reinforced: power increases with survey duration, decreases with visits missed and is relatively little affected by modest changes in the underlying temporal trend.


Figure 10: Estimated power as a function of sample size. Alpha = -0.01. Mean and variance of initial log-abundances = 2 and the overdispersion parameter = 5. Power curves, top to bottom, represent 50%, 40%, 30% and 20% of all sites treated. Other parameter values vary as stated in individual legends.


a: 40% of visits to 100 sites over 10 years missed at random. Value of slope at ‘control’ sites = 0

b: 40% of visits to 100 sites over 15 years missed at random. Value of slope at ‘control’ sites = 0

c: 20% of visits to 100 sites over 10 years missed at random. Value of slope at ‘control’ sites = 0

d: 40% of visits to 100 sites over 10 years missed at random. Value of slope at the control sites = -0.05.

Needless to say, power also increases with the scale of the effect sought, α. Figure 12 provides curves derived from parameters identical to those of 11(a,b), but with α = -0.02 rather than -0.01.


Figure 11: Estimated power as a function of initial abundance. Variance of initial log-abundances = 2 and the overdispersion parameter = 5. Power curves, top to bottom, represent 50%, 40%, 30% and 20% of all sites treated. Other parameter values vary as stated in the legend.


Finally, the duration of the survey will also influence the power to detect change. Simulations covered a range of 5 to 20 years; as Figure 13 illustrates, with time series at the lower end of this range, given the other parameter values as stated in the legend, power is greatly reduced, particularly for less abundant species when the number of sites is limited (Fig. 13 a & b).


Figure 12: Power curves analogous to those of Figure 10 a & b, but with α reduced to -0.02.


Figure 13: Estimated power as a function of the duration of the study for species of Log (mean abundance) of a) 1; b) 2; c) 3; and d) 4. Other parameter values are R = 0; variance = 1; over dispersion = 5; sites = 50; alpha = -0.01; 40% visits missed at random.


Exploring the utility of existing ESNs using the ‘Generic Equation’. (Obj 4)

We used data from two ESNs which collect counts of species abundance over time to explore where individual species from each scheme lie on the power graphs generated by the generic equation.

UK Butterfly Monitoring Scheme

We used data from the UK Butterfly Monitoring Scheme to estimate the abundances at each site over two visits (the scheme protocol) and the level of overdispersion under the standard model of Freeman and Newson (2008) for two species, the Large White and the Small Tortoiseshell. These we used to replace the arbitrary values in Figure 10a, thus producing approximate power curves matched specifically to data that might be expected to accrue for these species during a similar scheme. Results are shown in Figure 14, and show slightly higher power for the Large White.

In reality, the number of UK BMS sites that cross arable land in which these two species are recorded is ~30, so we would anticipate that the best estimate of the probability of detecting change under these parameters would lie at the lower end.


Figure 14: A set of power curves matched to data observed in the UK Butterfly Monitoring Scheme for a) the Large White with Log (mean abundance) = 2.61; Variance = 5.73 and b) Small Tortoiseshell with log (mean abundance) = 3.5; variance = 3.5. Other parameter values are: 40% of visits over 10 years missed at random; alpha = -0.01 (i.e. 1% reduction p.a.); over dispersion = 10; power curves, top to bottom, represent 50%, 40%, 30% and 20% of all sites treated.


Breeding Bird Survey (BBS)

Data from the Breeding Bird Survey was used to parameterize the generic equation for three species chosen from Table 2 (Linnet, Reed Bunting and Yellowhammer). As with the previous example, species and ESN specific parameters determine the predicted power curve in each case. However, for practical reasons, simulations covered a range of parameter values up to 200 sites, yet the BBS has up to 800 sites for some species. As extrapolation cannot be justified beyond the simulated range, a conservative value of 200 sites was chosen for each of these three sets of power curves, with the other parameters as specified in Table 6. These parameters are derived from real survey data from BBS squares in south-east England (100km grid squares TA, TF, TG, TL, TM, TQ, TR and TV) from 2002-10.

Table 6: Parameter values representing three bird species from the BBS for the generic power equation.

| Species | Growth at untreated site (R) | Mean initial abundance | Variance of initial site abundance | Proportion of visits missed | Scale parameter | Survey duration (years) | Real total sites | Sites assumed for simulation |
|---|---|---|---|---|---|---|---|---|
| Yellowhammer | -0.0204 | 0.6285 | 0.0022 | 0.3585 | 1.357 | 9 | 690 | 200 |
| Linnet | -0.0407 | 0.6482 | 0.0020 | 0.3319 | 4.110 | 9 | 704 | 200 |
| Reed Bunting | 0.0083 | -0.3806 | 0.0046 | 0.3292 | 1.322 | 9 | 377 | 200 |


[Figure 15 panels: a) Linnet; b) Reed Bunting; c) Yellowhammer. x-axis: proportion of sites treated (0 to 0.8); y-axis: power; one curve for each of alpha = 0, -0.004, -0.008, -0.012, -0.016 and -0.02.]

Figure 15: A set of power curves matched to data in the breeding bird survey for a) Linnet; b) Reed Bunting; c) Yellowhammer. Other parameter values are as in Table 6.


The predictions shown are derived from real survey data from Breeding Bird Survey squares, but assuming a total sample of 200 squares. In this area, the real totals of survey squares in which these species are found range from 377 to 704, so a total of 200 can be considered to reflect “treatment” over a smaller spatial scale. Power would be expected to increase with larger areas being treated. For all three species, the power to detect differences >1.6% p.a. is greater than 0.8 after a nine year period.


Analyses of spatial data within years using Countryside Survey

It is necessary to adopt a different approach for the Countryside Survey (CS), as this scheme undertakes an intensive sample once every 7 years (approximately). The approach uses CS plot level data located within, or adjacent to, fields of the pre-specified crop of interest. Plots located within fields of maize and potatoes were assessed under the ‘generic uptake and change scenarios’. There were insufficient plots recorded within fields of sugar beet for this ESN, so sugar beet scenarios were not included in the analysis. Because of the long period between successive surveys, the analysis focused on the spatial differences observed between plots located within GM uptake and plots located outside GM uptake, rather than on changes over time. Analysis of the Countryside Survey therefore plays to the strengths of the survey in terms of spatial representation, the number of variables collected at the plot level and the direct association between the variable and the particular crop of interest. The power analysis investigated was therefore purely a spatial one, where the difference between GM and non-GM plots was examined within the same survey, CS2007.

The analysis was performed at two different scales for which data on GM uptake may be available: field level and 1km square level. This allowed us to model the differences we see in power between the two levels of information and the direct influence scale of information has on any analysis or attribution.

Three categories of data were considered: weed abundance (% cover in quadrats of a number of common weed species), soil properties (soil Carbon, soil Nitrogen, and soil pH) and water quality (associated with arable areas).

Methods

Under the different uptake and change scenarios, a proportion of the CS plots containing the crop of interest, equal to the uptake scenario, was randomly selected and the observed response associated with these plots was changed by a factor equal to the change scenario. The resulting data were then modelled and the significance of the indicator term corresponding to GM areas was stored. This was repeated 1000 times to obtain the percentage of simulations in which a significant effect of the GM indicator term was observed.
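The resampling procedure can be sketched as below. Note the swapped-in test: for a self-contained illustration this uses a Welch-type z-test on log responses, not the gamma GLMM or GLM used in the report. The plot data are made up, the change is applied as multiplication by (1 + change), which is our assumption about how the change scenarios act, and the function name is hypothetical.

```python
import math
import random
import statistics

def simulated_power(responses, uptake, change, n_sims=1000, seed=0):
    """Proportion of simulations in which the GM term is significant at 5%."""
    rng = random.Random(seed)
    n_gm = round(uptake * len(responses))
    hits = 0
    for _ in range(n_sims):
        # Randomly select the plots assumed to be under GM uptake.
        gm = set(rng.sample(range(len(responses)), n_gm))
        # Apply the change scenario to GM plots, then work on the log scale.
        y_gm = [math.log(r * (1.0 + change))
                for i, r in enumerate(responses) if i in gm]
        y_non = [math.log(r) for i, r in enumerate(responses) if i not in gm]
        # Swapped-in test (NOT the report's model): z-test on log means.
        se = math.sqrt(statistics.variance(y_gm) / len(y_gm)
                       + statistics.variance(y_non) / len(y_non))
        z = (statistics.mean(y_gm) - statistics.mean(y_non)) / se
        hits += abs(z) > 1.96
    return hits / n_sims

# Made-up positive, skewed responses standing in for CS plot data.
data_rng = random.Random(42)
plots = [data_rng.gammavariate(2.0, 5.0) for _ in range(200)]
power_none = simulated_power(plots, uptake=0.4, change=0.0, n_sims=200)
power_large = simulated_power(plots, uptake=0.4, change=-0.5, n_sims=200)
```

With no change applied, the rejection rate should sit near the nominal 5% level, which provides a basic check on the procedure before any effect scenarios are interpreted.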

Weed Abundance

The percentage ground cover within a quadrat covered by each of these species was collected in the CS vegetation plots randomly located within the 1km squares and vegetation plots at arable field margins. The cover data was therefore directly associated with the specific crops of interest – maize and potato (sample size for sugar beet fields was insufficient).


Power to detect spatial differences in cover within a survey year between GM uptake plots and non-GM plots was examined using a generalised linear mixed model (GLMM) with a gamma error distribution. Square level random effects were incorporated and the significance of the term representing treatment (GM or conventional) was stored. The proportion of significant results over 1000 simulations provided the statistical power. The statistical model used in the analysis was a log-linear model with a gamma error distribution (as cover data was a positive, continuous, skewed metric) and a random effect accounting for the different levels of variation between and within CS squares. The model is given by:

$$ \ln(P_{C,i,j}) = \mu_C + \alpha G_{C,i,j} + V_{C,i} + \varepsilon_{C,i,j} $$

where $P_{C,i,j}$ is the percentage cover of the species in question in plot j within square i containing crop C, $\mu_C$ is the mean percentage cover for plots containing crop C, $G_{C,i,j}$ is an indicator variable taking the value 1 if plot j containing crop C in square i is in GM and 0 otherwise, α is the effect that GM has on species cover, $V_{C,i}$ is a random effect for the ith square for crop C and $\varepsilon_{C,i,j}$ is a random effect for the jth plot in square i containing crop C. Note that no temporal component is included here as we are not modelling change, merely a difference between two treatments: GM and non-GM.

The results of the power analysis to detect the effect of a difference in common weed species abundance between GM and non-GM plots are shown in Figure 16.


Soil Properties

CS takes a 15cm core from the topsoil located within the randomly distributed vegetation plots. Multiple soil measures including chemistry and biology are obtained from this core. Although CS soil measurements are made at the within-field plot level, only single cores are taken and hence there is a much smaller sample size and typically much greater spatial variability than for the botanical data. This data was used to assess the power to detect changes in soil Carbon, Nitrogen and pH between the GM and non-GM areas under the uptake and change scenarios proposed.

Similar to the species cover analysis, a generalised linear mixed model (GLMM) was used with a gamma error distribution and a log link function. Square level random effects were incorporated and the significance of the term representing within GM or not was stored. The proportion of significant results over 1000 simulations provided the statistical power. The model is given by:

$$ \ln(S_{C,i,j}) = \mu_C + \alpha G_{C,i,j} + V_{C,i} + \varepsilon_{C,i,j} $$

where $S_{C,i,j}$ is the particular soil property in question (Carbon, Nitrogen or pH) in plot j within square i containing crop C, and additional terms are as defined for the species cover analysis. Results of the proportion of tests where a significant effect of GM uptake was detected for each of the soil properties are shown in Figure 17.


Figure 16: Power to detect changes in common weed species cover in (a) Maize plots and (b) Potato plots. Power is shown as a function of GM uptake and Effect size with 5%, 10%, 25% and 50% effects represented by open circles, filled circles, open squares and filled squares respectively.


Square Level Spatial Analysis

The previous two analyses have used CS data at the plot level and the corresponding indicator of GM uptake for each particular plot. Hence we have assumed that GM uptake information would be available at the field level. However, as field level information on GM cultivation may not be available, we decided to repeat the spatial analysis of common weed species abundance and soil properties at a square rather than plot level, under the assumption that information on GM uptake was known only at 1 km square resolution as opposed to field level resolution. The statistical model is largely unchanged, except that the modelled data now correspond to mean species cover and mean soil carbon/nitrogen/pH in square k containing crop C, represented by $M_{C,k}$. Specifically, the model is given by

$$ \ln(M_{C,k}) = \mu_C + \alpha G_{C,k} + \varepsilon_{C,k} $$

Figure 17: Power to detect changes in soil chemistry in (A) Maize plots and (B) Potato plots. Power is shown as a function of GM uptake and Effect size with 5%, 10%, 25% and 50% effects represented by open circles, filled circles, open squares and filled squares respectively.

and as we are modelling square level means, no square level random effect is needed and we use a gamma error distribution as before. Note that $G_{C,k}$ represents the information we have on GM uptake, which is either an indicator variable taking the value 1 if square k contains any GM occurrence or it represents the number of plots within square k with GM uptake. In the simulations all change scenarios are implemented at the plot level before then being aggregated to square level for analysis.
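The aggregation from plot to square level might look like the following sketch. The record layout, function name and data values are hypothetical; it returns both forms of GM information mentioned above (an any-GM indicator and a count of GM plots) alongside the square mean.

```python
from collections import defaultdict
from statistics import mean

def aggregate_to_squares(plots):
    """plots: iterable of (square_id, response, is_gm) tuples at plot level.

    Returns {square_id: (mean_response, any_gm_indicator, n_gm_plots)}.
    """
    by_square = defaultdict(list)
    for square_id, response, is_gm in plots:
        by_square[square_id].append((response, is_gm))
    out = {}
    for square_id, rows in by_square.items():
        n_gm = sum(1 for _, is_gm in rows if is_gm)
        out[square_id] = (mean(r for r, _ in rows), int(n_gm > 0), n_gm)
    return out

# Made-up plot records; any change scenario is assumed already applied.
plots = [("sq1", 4.0, True), ("sq1", 6.0, False), ("sq2", 3.0, False),
         ("sq2", 5.0, False), ("sq3", 2.0, True), ("sq3", 2.0, True)]
squares = aggregate_to_squares(plots)
```

In a square such as "sq1", where only one of two plots is GM, the effect applied at plot level is diluted in the square mean, which is one way to see why power can fall when GM information is available only at 1 km resolution.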

As this analysis uses the same plots as the previous analyses, the two can be compared directly to see how the resolution of the available GM information affects power. Plots in Figures 18-20 show the effect that this has had on our power to detect change.

[Figure panels: Soil Carbon; Soil Nitrogen; Soil pH. x-axis: % change (0 to 50); y-axis: % power (0 to 100); legend: 20%, 40%, 60% and 80% uptake.]

Figure 18: Power to detect change in soil carbon against effect size for different uptake scenarios when GM information is available at 1km square (dashed lines) and field (solid lines)


Water Quality

As the CS water samples, unlike the botanical plot-level data and the soil data, are not collected specifically from crop fields, we investigate what power the current CS sampling scheme has to detect general changes in water quality under different effect scenarios deemed suitable for that dataset. In CS, freshwater measures include the sampling of a single headwater stream site per 1km survey square, at which a biological sample of the stream macroinvertebrates is taken (measured as ASPT; see earlier). Although there is no direct link between the field data and the freshwater data, we have detailed habitat mapping of each 1km square and therefore limited our analyses to freshwater samples taken from arable-dominated squares. There are currently 34 such squares sampled in CS for which freshwater data are available.

A generalised linear model (GLM) with a gamma error distribution was used. No square-level random effects were needed because the indicator used was a square-level metric. For each simulation, the significance of the term representing whether or not a square was in GM was stored, and the proportion of significant results over 1000 simulations provided the estimate of statistical power. Specifically, the model is given by

$$\ln(W_k) = \mu + \alpha G_k + \varepsilon_k$$

where $W_k$ represents the ASPT for square $k$, $G_k$ is an indicator variable taking the value 1 if square $k$ is in GM and 0 otherwise, $\alpha$ is the effect that GM has on water quality (ASPT) and $\varepsilon_k$ is the random error for the $k$th square. The resulting power estimates for the different uptake and change scenarios are shown in Table 7.
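The simulation logic behind such a power estimate can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code of the study: the baseline mean (`base_mean`), the gamma shape (`shape`) and the direction of the effect (a proportional decrease) are hypothetical choices, and the test is a Wald test with a normal approximation. For a gamma GLM with log link and a single 0/1 covariate, the fitted means equal the two group means, so no iterative fitting is needed here.

```python
import math
import random
from statistics import NormalDist

def simulate_power(n_squares=34, uptake=0.4, effect=0.10, base_mean=5.0,
                   shape=25.0, n_sims=1000, alpha_level=0.05, seed=1):
    """Proportion of simulations in which the GM term is significant.

    base_mean and shape are illustrative assumptions, not CS estimates.
    """
    rng = random.Random(seed)
    n_gm = max(1, round(uptake * n_squares))
    n_non = n_squares - n_gm
    z_crit = NormalDist().inv_cdf(1 - alpha_level / 2)
    hits = 0
    for _ in range(n_sims):
        # gammavariate(shape, scale) has mean shape * scale
        gm = [rng.gammavariate(shape, base_mean * (1 - effect) / shape)
              for _ in range(n_gm)]
        non = [rng.gammavariate(shape, base_mean / shape)
               for _ in range(n_non)]
        m_gm, m_non = sum(gm) / n_gm, sum(non) / n_non
        alpha_hat = math.log(m_gm / m_non)  # estimate of alpha on the log scale
        # Pearson estimate of the gamma dispersion parameter
        pearson = (sum(((y - m_gm) / m_gm) ** 2 for y in gm)
                   + sum(((y - m_non) / m_non) ** 2 for y in non))
        phi = pearson / (n_squares - 2)
        se = math.sqrt(phi * (1 / n_gm + 1 / n_non))
        if abs(alpha_hat / se) > z_crit:
            hits += 1
    return hits / n_sims

power_small = simulate_power(effect=0.05)
power_large = simulate_power(effect=0.50)
```

Because the dispersion here is assumed rather than estimated from CS data, the resulting values will not reproduce Table 7; the sketch only shows how the proportion of significant results is turned into a power estimate.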

Table 7: Power to detect changes in water quality as measured by Countryside Survey. Power is shown as a percent under different change effects and uptake scenarios.

Uptake (%)    Change 5%    Change 10%    Change 25%    Change 50%
20            11           43            99            99
40            17           71            99            99
60            27           65            99            99
80            18           51            99            99

Increasing the sample

For all analyses, the effect of increased sample size on power was investigated by repeating the analysis with larger pseudo data sets. The new data sets were derived from the raw data by repeatedly resampling plots with replacement until the required number of plots was reached. Each pseudo sample set was analysed following the same procedure as set out above for the species abundance, soil properties and water quality data to obtain an estimate of power. For each sample size, the average power over 1000 pseudo sample sets was taken. This indicates how larger sample sizes can increase the power to detect effects, and may also identify a sample size that offers optimum efficiency. Figures 21-23 show the increase in power obtained by increasing the sample size.
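A minimal sketch of this resampling step is given below. The function names and the `estimate_power` callback are hypothetical; in practice the callback would run the full power analysis on each pseudo data set, whereas the `toy_power` stand-in here exists only so that the example runs.

```python
import random

def resample_plots(plots, target_size, rng):
    """Draw plots with replacement until the pseudo data set has target_size plots."""
    return [rng.choice(plots) for _ in range(target_size)]

def mean_power(plots, target_size, estimate_power, n_sets=1000, seed=1):
    """Average the power estimate over n_sets pseudo data sets of a given size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sets):
        pseudo = resample_plots(plots, target_size, rng)
        total += estimate_power(pseudo)
    return total / n_sets

def toy_power(pseudo_plots):
    # Placeholder only: a real analysis would simulate changes and refit the model
    return min(1.0, len(pseudo_plots) / 500)

raw_plots = list(range(50))  # stand-in for the raw plot records
pseudo_set = resample_plots(raw_plots, 200, random.Random(1))
avg = mean_power(raw_plots, 200, toy_power, n_sets=10)
```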


Figure 21: Power to detect changes in cover of Cirsium arvense in potato plots under different change and uptake scenarios against sample size. Current CS sample size is at the lower end of the scale.


Figure 22: Power to detect changes in cover of Galium aparine in maize plots under different change and uptake scenarios against sample size. Current CS sample size is at the lower end of the scale.


Figure 23: Power to detect changes in water quality under different change and uptake scenarios against sample size. Current CS sample size (34 squares with arable dominated catchments) is at the lower end of the scale

Unified Model

Within the CS analysis we have investigated how sample size, uptake, effect size and level of GM information all affect our power to detect and attribute change resulting from GM uptake. However, we have shown this individually for different metrics, and ideally we want to be able to conclude what overall effect these features have on power. For example, we may want to know the average effect of having GM information available only at square level rather than field level. We therefore pooled all the power analyses conducted above into a unified model in order to see the average changes induced by each factor and their overall effect on power. We brought all the data from the analyses into a single unified logistic model with gamma error distribution defined by

$$\ln(P_i) = \mu + \alpha_1 E_i + \alpha_2 N_i + \alpha_3 U_i + \alpha_4 S_i + \alpha_5 V_i + \varepsilon_i$$


where $P_i$ is the power estimate, $E_i$ is the effect size (5%, 10%, 25% or 50%), $N_i$ is the sample size, $U_i$ is the uptake (20%, 40%, 60% or 80%), $S_i$ is the level of GM information (square level or field level), $V_i$ is the variation of the metric in question and $\varepsilon_i$ is the associated error.

This is possible because all the analyses used the same gamma-based error distribution. Variation in the metric is slightly complicated by the inclusion of the square-level random effect; however, it is the residual top-level error that is used in the model above, rather than any square-level variation. This unified model-based approach allows us to estimate the expected power to detect spatial differences between GM and non-GM areas given the required set of input data. Table 8 shows the average effect each term has on the power to detect changes. On average, knowing GM uptake information only at square level as opposed to plot level decreases the power by just over 6%. On average, each additional percent of change increases the power by just under 2%; with 50% changes we are often at 100% power. The interaction between sample size (N) and change effect shows that, under a 10% change, we can reasonably expect an extra 100 plots to increase the power by just over 5%.

Table 8: Significant coefficients from fitting a unified model to all power analyses conducted, showing the average effect that each of the influential factors has on power.

Term              Coefficient
Uptake            0.01661
Change Effect     1.9266
Square Level      -6.011845
N:Change 5%       0.007037
N:Change 10%      0.055757
N:Change 25%      0.080359
N:Change 50%      0.015341
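The statements made in the text about Table 8 can be checked with simple arithmetic. The sketch below follows the reading given in the text, in which each coefficient is treated as an additive change in percentage power per unit of the term; the dictionary layout and function name are assumptions made for the example.

```python
# Coefficients copied from Table 8; interpreted, as in the text, as changes
# in percentage power per unit of each term.
COEF = {
    "uptake": 0.01661,
    "change_effect": 1.9266,
    "square_level": -6.011845,
    "n_by_change": {5: 0.007037, 10: 0.055757, 25: 0.080359, 50: 0.015341},
}

def gain_from_extra_plots(extra_plots, change_pct):
    """Expected change in power (%) from adding plots, for a given change scenario."""
    return COEF["n_by_change"][change_pct] * extra_plots

gain_10 = gain_from_extra_plots(100, 10)   # 100 extra plots under a 10% change
square_penalty = COEF["square_level"]      # square-level rather than plot-level GM information
```

Under this reading, 100 extra plots at a 10% change give roughly 5.6%, consistent with the "just over 5%" quoted in the text.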


Impacts of Environmental Stewardship Measures as a proxy for GM crop impacts. (Obj 4)

We explored the power of data from the BTO/RSPB/JNCC Breeding Bird Survey (BBS), a standardised national volunteer bird survey, to detect changes in populations of farmland birds with a percentage area change in land-use (e.g. 40% conversion of maize crop to GM maize), using the change from standard to Environmental Stewardship (ES) stubble management as a proxy for GM crop uptake. This can be justified on the basis that most of the ES stubble area reflects a change in crop management (from herbicide-sprayed to unsprayed) rather than a change in the cropping regime; it is therefore conceptually not dissimilar to a change from conventional to GM crops that retains the same crop type. Furthermore, the known effects of ES stubble considered here, although statistically significant, are not large, so this power analysis provides an assessment of the potential to detect relatively small changes in species’ abundance.

A major caveat, however, is that there is no evidence as to how the magnitude of the biological effect of a switch to a GM crop compares to that of a switch from standard to ES management of stubble for any given species or for a ‘generic bird’. In this analysis, we assume a similar magnitude of effect on bird population growth rates for a GM crop as for ES stubble. However, “GM crop” areas are simulated (by resampling BBS data) to match the regional distributions of maize, beet and potatoes in order to approximate realistic bird data sets for the geographical distribution of each crop. Note that this assumes that the uptake of GM varieties of each crop follows the current distribution of cropping.

Using bird species for which we have previously demonstrated statistically significant relationships between ES stubble and population growth rate (linnet, reed bunting and yellowhammer), we investigated the power to detect these relationships given the spatial distributions expected for GM crops and the six-year time period that has elapsed since the inception of ES in 2005. We did this analysis using data from the whole of England together and for BBS squares in arable dominated farmland.

METHODS

Breeding Bird Survey (BBS)

BBS (1994-present) covers c. 2000 randomly selected lowland farmland 1km squares throughout England annually. Volunteers walk two nominally parallel 1km transects (500m apart) through each square twice during the breeding season. Each transect is divided into five 200m sections; species-specific bird counts and habitat are recorded separately in each. Annual, square-specific counts are calculated as the maximum over the two visits of the total count summed across transect sections (Risely et al. 2011). For this study, BBS squares were selected if they were in lowland farmland (CEH Land Cover Map 2000 Environmental Zones) and had been surveyed in ≥2 years between 2002 and 2010. Squares comprising <50% farmed land were omitted as non-agricultural. The major landscape type for each square was categorised as arable (ratio of arable:pastoral areas ≥2), pastoral (pastoral:arable ≥2) or mixed (all other squares), based on the CEH Land Cover Map 2000. The analysis was conducted once with all squares and once with arable squares only.
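The two rules in this paragraph, the annual count and the landscape classification, can be written out as a short sketch. The function names and input layouts are hypothetical; the thresholds are those stated in the text.

```python
def annual_count(visit_section_counts):
    """Annual square-specific count: the maximum over visits of the total
    count summed across transect sections.

    visit_section_counts: one list of per-section counts for each visit.
    """
    return max(sum(sections) for sections in visit_section_counts)

def landscape_type(arable_area, pastoral_area):
    """Classify a square as arable, pastoral or mixed using the ratio rule (>= 2)."""
    if arable_area > 0 and arable_area >= 2 * pastoral_area:
        return "arable"
    if pastoral_area > 0 and pastoral_area >= 2 * arable_area:
        return "pastoral"
    return "mixed"

# Two visits, ten 200m sections each (illustrative counts only)
visits = [[1, 0, 2, 0, 0, 1, 0, 0, 0, 0],
          [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]
count = annual_count(visits)
```

Writing the ratio test as a multiplication avoids a division by zero for squares with no pastoral (or no arable) land.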

The analysis was restricted to species that rely on agricultural land for some part of their life-cycle (i.e. breed or winter on farmland) (e.g. Vickery et al. 2009) and to those that were previously found to respond significantly to ES stubble management (Baker et al. unpublished report to Natural England). Consequently, this analysis includes linnet, reed bunting and yellowhammer. For farmland specialists (linnet and yellowhammer), data from all transect sections were included in the analysis (not just ‘Farmland’) because birds recorded in the non-farmland parts of a survey square are, nevertheless, likely to have been influenced by the farmland nearby. For non-specialists (reed bunting) that regularly exploit non-agricultural habitats (e.g. gardens, wetlands), only counts from transect sections that were recorded as farmland were used for each square.

Environmental Stewardship data

Spatially referenced data containing the ES and Countryside Stewardship Scheme agreement details for each holding were supplied by Natural England (NE) and were used to assess the amount of stubble options per BBS square per year using the methods of Davey et al. (2010).

Data sampling

In order to generate a BBS data set in which the total area of ES stubble options was representative of potential GM cropping scenarios, samples were drawn randomly, with replacement, from the set of existing BBS squares for each region until a required area of stubble was reached that reflected a predicted regional area coverage of a given GM crop, while also maintaining the regional sample sizes found in the source data set. This was done by separately sampling squares that included ES stubble and those that did not. The regional random samples were then combined for analysis.

Thus, in detail, the area of potato, maize and sugar beet cropping for each region in 2010 was obtained (http://archive.defra.gov.uk/evidence/statistics/foodfarm/landuselivestock/junesurvey/results.htm) and the areas for scenarios between 20 and 80% were calculated (i.e. the area if, say, 20% of the existing crop was converted to GM). The amount of each crop under each scenario that would be expected to fall within the randomly distributed BBS squares was calculated, using the number of squares that were surveyed at least twice between 2002 and 2010. The data for each region were divided into Stubble > 0 (‘Stubble’) and Stubble = 0 (‘NoStubble’), and random samples of BBS squares were drawn from the ‘Stubble’ data with replacement until the total area of ES stubble option approximately equalled the total area expected to occur within all BBS squares (±5%) for that region. Random samples were then drawn, with replacement, from the ‘NoStubble’ data and added to the selected ‘Stubble’ data set until the combined sample size was equal to the actual number of BBS squares in the region. This needs to be done as a separate step to ensure that the sample size remains the same as in the original BBS data set. This was repeated for all regions and scenarios, with 100 samples drawn for each region/scenario combination. The data were combined into a national data set for analysis, in which the total number of BBS squares was equal to the total number in the original data set but the stubble area differed, reflecting the area expected under a particular GM cropping scenario.

For several of the scenarios, the samples reached the total number of squares in the region before the required area of cropping was achieved. Where this occurred for only a few replicate samples, these samples were deleted and new samples were randomly generated until 100 samples with the required area and number of squares had been obtained. However, for some scenarios/crops (e.g. sugar beet at > 55% in the East of England), the cropping areas could not be sampled given the number of squares available, and so these scenarios were omitted from the analysis (sugar beet > 55%).
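The two-step regional resampling and the redraw rule can be sketched as below. This is an illustration only: the function names, the record layout and the toy data are hypothetical, and rejecting a draw that overshoots the ±5% band is an assumption about how "approximately equalled" was enforced.

```python
import random

def sample_region(stubble, no_stubble, target_area, n_squares, rng, tol=0.05):
    """Return one pseudo-sample of squares for a region, or None if it fails.

    stubble: list of (square_id, stubble_area) with stubble_area > 0
    no_stubble: list of square_ids with no ES stubble
    """
    chosen, area = [], 0.0
    # Step 1: draw 'Stubble' squares with replacement until the target area is reached
    while area < target_area * (1 - tol):
        if len(chosen) >= n_squares:
            return None  # ran out of squares before reaching the required area
        square_id, stubble_area = rng.choice(stubble)
        chosen.append((square_id, stubble_area))
        area += stubble_area
    if area > target_area * (1 + tol):
        return None  # assumption: overshooting the tolerance band triggers a redraw
    # Step 2: top up with 'NoStubble' squares to the regional sample size
    while len(chosen) < n_squares:
        chosen.append((rng.choice(no_stubble), 0.0))
    return chosen

def draw_samples(stubble, no_stubble, target_area, n_squares,
                 n_samples=100, seed=1, max_tries=10000):
    """Collect n_samples valid pseudo-samples, discarding failed draws."""
    rng = random.Random(seed)
    samples, tries = [], 0
    while len(samples) < n_samples and tries < max_tries:
        tries += 1
        sample = sample_region(stubble, no_stubble, target_area, n_squares, rng)
        if sample is not None:
            samples.append(sample)
    return samples

# Toy region: two stubble squares, three non-stubble squares (illustrative values)
toy_stubble = [("s1", 1.0), ("s2", 2.0)]
toy_no_stubble = ["n1", "n2", "n3"]
samples = draw_samples(toy_stubble, toy_no_stubble, target_area=10.0, n_squares=20)
```

If `draw_samples` returns fewer than `n_samples` samples, the scenario cannot be sampled with the available squares, which corresponds to the omitted sugar beet scenarios described above.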

Statistical analysis

We used a log-linear approach that models the change in expected abundance between consecutive years and can incorporate effects of spatio-temporal covariates, e.g. ES option quantities, on local growth rate. This approach allows maximum use of the available data by including observations from squares not surveyed, or where counts were zero, in the previous year. Fundamentally, the analyses estimated the additional effect of ES on each species’ population growth rate but, importantly, growth is not thereby forced to be greatest in the years of highest management levels. The model is a multivariate extension of Freeman & Newson (2008):

$$\ln(\mu_{i,t+1}) = R_t + \alpha P_{i,t} + \beta Q_{i,t} + \ln(\mu_{i,t}) \qquad (1)$$

where $\mu_{i,t}$ is the expected species count at site $i$ at time $t$, $P_{i,t}$ is the amount of a given ES management variable in square $i$ at time $t$ and $Q_{i,t}$ is the percentage of arable habitat per square. $Q_{i,t}$ was mean-centred prior to fitting, and was included because most ES options are targeted at either arable or pastoral farmland (e.g. stubble or grassland management), so option uptake is likely to be correlated with the balance of arable and pastoral farming in the landscape, which could influence bird population trends (e.g. Robinson, Wilson & Crick 2001). From (1), $R_t$ is the ‘background’ population growth rate from $t$ to $t+1$ at a hypothetical reference site where $Q_{i,t}$ has the mean value and there is no management. The parameter $\alpha$ introduces the effect of ES management on population growth at a site, and $\beta$ controls for the effect of the surrounding landscape. For fitting, we rewrite (1) as:


$$\ln(\mu_{i,t+1}) = \sum_{j=1}^{t} R_j + \alpha \sum_{j=1}^{t} P_{i,j} + \beta \sum_{j=1}^{t} Q_{i,j} + \ln(\mu_{i,1}) + \ln(G_i) \qquad (2)$$

which is a standard generalized linear model with offset $\ln(G_i)$, where $G_i$ is the number of transects surveyed in square $i$, introduced to standardise the square-specific intercepts $\mu_{i,1}$, as some squares had fewer than ten 200m sections. Models were fitted assuming a Poisson distribution for the observed BBS counts using the GENMOD procedure in SAS 9.2 (SAS Institute Inc. 2008), accounting for overdispersion using Pearson’s $\chi^2$ goodness-of-fit statistic. The significance of ES effects on population growth rates was assessed using similarly adjusted likelihood-ratio test statistics of the hypothesis that $\alpha = 0$.
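The step from the year-to-year form (1) to the cumulative form (2) amounts to applying (1) repeatedly from the first year. The sketch below checks numerically that the two forms give the same expected counts for one site. The function names and parameter values are illustrative assumptions, and the effort offset $\ln(G_i)$ is omitted for brevity.

```python
import math
from itertools import accumulate

def by_recursion(mu_1, R, P, Q, alpha, beta):
    """Expected counts from applying equation (1) one year at a time."""
    mus = [mu_1]
    for r, p, q in zip(R, P, Q):
        mus.append(mus[-1] * math.exp(r + alpha * p + beta * q))
    return mus

def by_cumulative_sums(mu_1, R, P, Q, alpha, beta):
    """Expected counts from the cumulative-sum form of equation (2), without the offset."""
    mus = [mu_1]
    for cum_r, cum_p, cum_q in zip(accumulate(R), accumulate(P), accumulate(Q)):
        log_mu = cum_r + alpha * cum_p + beta * cum_q + math.log(mu_1)
        mus.append(math.exp(log_mu))
    return mus

# Illustrative values only: three year-to-year transitions at one site
R = [0.02, -0.01, 0.03]   # background growth rates
P = [0.0, 1.5, 2.0]       # ES management amounts
Q = [-5.0, -5.0, -5.0]    # mean-centred % arable habitat
rec = by_recursion(10.0, R, P, Q, alpha=0.05, beta=0.001)
cum = by_cumulative_sums(10.0, R, P, Q, alpha=0.05, beta=0.001)
```

In the cumulative form the covariates attached to $\alpha$ and $\beta$ are running sums of $P$ and $Q$, which is what allows the model to be fitted as an ordinary log-linear GLM.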

The data sets were analysed for all squares and also for arable-only squares (ratio of arable:pastoral > 2:1), because there is likely to be a stronger relationship between the distribution of crops and bird populations in this landscape type, thus potentially increasing the power to detect effects of changes in crop management.

Figures were plotted using R 2.15.0, and the smoothed curves were fitted using Friedman’s super smoother function (supsmu).

RESULTS

The power to detect statistically significant changes in population growth rates for linnet, reed bunting and yellowhammer was affected by both the crop type (representing changes in regional distribution) and scenario. For maize conversion scenarios (Figure 24) both linnet and yellowhammer show >50% chance of detecting significant effects from changes in crop management across all squares nationally. Yellowhammer shows a particularly high probability of detecting these effects with conversion scenarios >50%. The power to detect significant effects from changes in crop management is reduced when just arable squares are analysed, but still indicates a ca. 80% chance of detecting effects with >50% conversion scenarios. Reed bunting shows very low power to detect effects from changes in crop management using all squares and arable only.


With potato cropping patterns (Figure 25), linnet showed the highest power to detect effects from changes in crop management, for all squares and arable squares only. When all squares were included in the analysis, this power was >80% for all conversion scenarios. For yellowhammer this power was lower, reaching a maximum of ca. 80% chance with conversion scenarios approximately >60%. Here the arable only analysis followed a similar trend, showing only slightly lower power. Again reed bunting had low power to detect effects of change in crop management, although when analysing all squares this power increased rapidly with cropping scenario, reaching ca. 80% chance of detecting an effect at the maximum conversion scenario tested here (80%).


Figure 24: Power to detect changes in population growth rate for three bird species through BBS monitoring in maize.


Figure 25: Power to detect change in population growth rates for three bird species through BBS monitoring in potato.


Because of the large area of sugar beet crop in the east of England we were unable to simulate cropping patterns for sugar beet above 55% conversion scenarios within the number of existing BBS squares. Thus, for sugar beet the simulations cover 20 to 55% scenarios (Figure 26). For the analysis using all BBS squares the power to detect effects of crop management on bird population trends is greatest for linnet, approaching 100% with >30% conversion of existing sugar beet to GM. The yellowhammer results are similar, although the slope is steeper, approaching 100% with >40% conversion scenarios. The results for reed bunting again show low power to detect changes in crop management, and for all three species the analysis with arable-only squares gives lower power.

Overall, the results suggest that linnet and yellowhammer would be good species for monitoring the effects of GM conversion (widespread, relatively abundant and strongly associated with arable landscapes), together giving high statistical power across the three crops considered here. The low power shown by reed bunting is likely to be a consequence of its use of a wider range of habitats, and shows the importance of the choice of species used for monitoring.

Implications of increasing power through increasing sample size.

Our models and simulations have illustrated how power will be influenced by changes in sample size, duration of study, and a range of other species-specific and ESN-specific parameters. At some point in the design of a General Surveillance strategy, a decision needs to be made as to how powerful GS strategies should be designed to be, and whether existing ESNs will provide sufficient data. The two simplest ways to increase the power of a surveillance strategy are to use the most appropriate statistical analysis or to increase the sample size. This report has not compared different methods of analysis, but chose a design (paired comparison within years) that is one of the more powerful methods. Here we consider potential costs for the second option, increasing the sample size of three ESNs: a) the Wider Countryside Butterfly Survey (WCBS); b) Countryside Survey; c) the Breeding Bird Survey. We do not consider the UK BMS, as the choice of site is led by the volunteer, and few volunteer to monitor arable sites due to their relatively low butterfly biodiversity. However, the newly established WCBS will focus on arable land, and offers greater potential for GS in future years.

Figure 26: Power to detect change in population growth rates for three bird species through BBS monitoring in sugar beet.

Costs for increased butterfly sampling under WCBS

There are two possible approaches for increasing sample sizes within the Wider Countryside Butterfly Survey (WCBS): professional and volunteer. The benefit of the former is that funders have complete control over sample location, effort, consistency and observer quality. This approach is likely to be adopted by the Welsh Government to monitor agri-environment schemes and has been used to increase sampling in under-recorded areas in Scotland. In the long term, however, using volunteers is more sustainable and cost-effective. There are some important caveats: (i) the WCBS is run by Butterfly Conservation, the Centre for Ecology & Hydrology, BTO, JNCC and other funders, so all stakeholders would have to agree to any extension work; (ii) volunteers are a finite resource, are in demand for other surveys and are difficult to recruit in remote areas, so uptake is unpredictable; (iii) all that can be costed is the recruitment effort, not the cost per unit survey effort, because surveys are voluntary and it is unknown how much of the maximum available volunteer pool current survey effort uses (so there are also likely to be diminishing returns for larger samples as the upper limit to the number of volunteers available is approached).

Given the professional and volunteer possibilities, guideline costs are given below (at current rates, which will be subject to inflation). Please note that these are subject to formal approval and should not be taken as definitive or fixed.

Professional - £25K per 100 extra squares annually (this is based on resource requirements for supplementary WCBS monitoring and support in the context of monitoring agri-environment schemes in Wales).

Volunteers – initially c. £30K per 100 extra squares (very approximate one-off recruitment cost; there would be some ongoing effort required to keep sample size up, probably c. £10K per annum per 100 extra initial squares).

Note that these costs are lower than those estimated for increased sampling under BBS (below). The WCBS is a newly established scheme and is judged to have good scope for increased sample size by 100 extra squares.
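As a rough illustration of how the two WCBS options compare over time, the guideline figures above can be accumulated year by year for 100 extra squares. The function names are hypothetical, and treating the c. £10K upkeep as payable in every year after the first is an assumption; the source does not state when the ongoing cost starts.

```python
def cumulative_cost_professional(years, annual=25_000):
    """Cumulative cost (GBP) of professional surveying for 100 extra squares."""
    return annual * years

def cumulative_cost_volunteer(years, recruitment=30_000, upkeep=10_000):
    """Cumulative cost (GBP) of the volunteer option for 100 extra squares.

    Assumption: upkeep is paid in each year after the first.
    """
    return recruitment + upkeep * max(0, years - 1)

# Side-by-side cumulative costs for the first five years
comparison = [(y, cumulative_cost_professional(y), cumulative_cost_volunteer(y))
              for y in range(1, 6)]
```

Under these assumptions the volunteer option costs more in the first year and less from the second year onwards, which is in line with the statement that volunteers are more cost-effective in the long term.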

Increasing the Countryside Survey sample

Countryside Survey is a stratified random sample of approximately 600 1km squares over Great Britain. Within each 1km square, data on various biophysical measurements are taken. For some measurements, however, such as species richness and soil properties, the 1km scale is not appropriate, and smaller plots nested within these 1km squares are therefore used for specific vegetation and soil core sampling. In addition to a randomly located set of 5 plots per square, the nested plots also cover different habitat features of the square. Examples include arable field margins, river banks and roadside plots.

To increase the sample size of CS measurements there are therefore two possibilities to consider. The first is to increase the number of squares surveyed across the landscape and the second option is to increase the number of plots nested within the squares. The advantage of increasing the number of squares is that the spatial coverage over GB is increased and also any measures that we currently collect at the 1km scale, such as water quality, will also have an increased sample size. The advantage of increasing the number of plots within squares is that it will be considerably cheaper than any square level additions and can be targeted at particular habitat types relevant to the needs of funders. For example, we could specifically increase the number of arable field margin plots or within arable field plots without affecting the overall statistical robustness, whereas we could not target whole 1km squares to particular areas or habitats.

CS is a survey conducted entirely by professionals, with extensive QA and QC procedures in place to ensure optimum quality and efficiency of the data. Each square is surveyed by a team of up to four people in as short a time as possible, at the same time of year as for all previous surveys. The greatest cost to CS is paying the surveyors, training them and getting them to the squares. Based on previous experience of CS, the average cost per square across the UK is approximately £7k. This includes surveyors’ time, overheads, training, equipment and T&S. Prep phase activities, lab costs, project management and the cost of providing vehicles are not included in this cost.

If we add all elements of the survey together including surveyors’ time, overheads, training, equipment, T&S, data preparation, management, analysis and reporting then the cost per square is doubled to approximately £14k.

Adding additional plots is a far cheaper option because there is no additional cost of placing the surveyors at the square. For measures such as soil properties, though, there are still the associated lab costs. The cost per plot therefore depends on whether soil cores are to be taken. A rough approximation of £500 per plot is considered suitable to incorporate these factors. By translating these costs per plot into a sample size increase that affects the power to detect change, we can derive the relationship between additional financial resources and increases in power. This relationship is shown in Figure 27, which gives the power to detect a 10% change in species cover assuming a 40% uptake in GM crop cultivation. It is important to note that any changes to the CS sampling design need to be agreed with all of the co-funders and, most importantly, must remain consistent with the previous 34-year record of the survey. Increases in sample size obviously have time management implications for the surveyors, and this must be taken into account. Over the 34 years of CS the sample size has increased from each survey year to the next, and we therefore have extensive experience of analysing unequal sample sizes at each time point whilst maximising the use of the data.
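The conversion from extra funding to extra sample size uses only the approximate unit costs quoted above. A minimal sketch, with hypothetical names and an arbitrary example budget:

```python
# Approximate unit costs quoted in the text (GBP)
COST_PER_SQUARE_FIELD = 7_000    # surveyors' time, overheads, training, equipment, T&S
COST_PER_SQUARE_FULL = 14_000    # all elements of the survey
COST_PER_PLOT = 500              # rough approximation for an additional plot

def extra_units(budget):
    """Whole numbers of extra plots or extra squares a given budget would fund."""
    return {
        "plots": budget // COST_PER_PLOT,
        "squares_field_cost": budget // COST_PER_SQUARE_FIELD,
        "squares_full_cost": budget // COST_PER_SQUARE_FULL,
    }

units = extra_units(70_000)  # illustrative budget only
```

The resulting number of plots is the quantity that would be fed into the sample size analysis to read off the corresponding power.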


Costs for increased bird sampling under BBS

There are two possible approaches for generating additional bird monitoring sample sizes: professional and volunteer. The benefit of the former is that BTO/the funders have complete control over sample location, effort, consistency and observer quality. The BTO has used this approach before to supplement the Breeding Bird Survey, under funding from Defra/Natural England. In the long term, using volunteers is, obviously, cheaper. However, there are some important caveats: (i) BBS is run by BTO, JNCC and RSPB, so all stakeholders would have to agree to any extension work; (ii) volunteers are a finite resource and are in demand for other surveys as well as BBS, so in any given year BTO and stakeholders for other, potentially competing surveys would need to prioritize recruitment and retention for a BBS extension relative to the other surveys, whose future relative priority is unpredictable; (iii) all that can be costed is the recruitment effort, not the cost per unit survey effort, because surveys are voluntary and it is unknown how much of the maximum available volunteer pool current survey effort uses (so there are also likely to be diminishing returns for larger samples as the upper limit to the number of volunteers available is approached).

Given the professional and volunteer possibilities, guideline costs are below (at current rates, which will be subject to inflation). Please note that these are subject to formal approval and should not be taken as definitive or fixed; in addition, consideration of the volunteer option would need to take place within the broader context of volunteer surveys across the BTO and its partners, so formal approval cannot be guaranteed at this stage.

Professional - £37K per 100 extra squares annually (this is based on resource requirements for supplementary BBS monitoring and support in the context of monitoring Entry Level Stewardship for Natural England).

Volunteers – initially c. £33K per 100 extra squares (very approximate one-off recruitment cost based on a previous application in a particular context; NB retention rate is c. 85%, so there would be some ongoing effort required to keep sample size up, probably c. £10K per annum per 100 extra initial squares).

Figure 27: Power to detect a 10% change in common weed species cover assuming 40% uptake in GM crop cultivation against additional costs to the survey. This additional cost is assumed to translate directly into an increased number of plots.
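The guideline figures for the two options lend themselves to a rough cumulative comparison. The sketch below assumes, purely for illustration, that the professional cost recurs every year, that the volunteer recruitment cost is paid once in the first year, and that the c. £10K upkeep applies from the second year onwards. The function names are hypothetical, and the inputs remain the indicative, unapproved figures quoted in the text.

```python
# Guideline figures quoted in the text, in £K per 100 extra squares.
PROFESSIONAL_PER_YEAR = 37.0   # recurring annual cost
VOLUNTEER_RECRUITMENT = 33.0   # very approximate one-off recruitment cost
VOLUNTEER_UPKEEP = 10.0        # approximate annual cost of keeping sample size up


def professional_cost(years, hundreds_of_squares=1.0):
    """Cumulative professional cost (£K) after `years` years."""
    return PROFESSIONAL_PER_YEAR * years * hundreds_of_squares


def volunteer_cost(years, hundreds_of_squares=1.0):
    """Cumulative volunteer cost (£K) after `years` years.

    Assumption (not stated in the report): upkeep starts in year two,
    after the initial recruitment has been paid for in year one.
    """
    if years <= 0:
        return 0.0
    upkeep_years = years - 1
    total_per_hundred = VOLUNTEER_RECRUITMENT + VOLUNTEER_UPKEEP * upkeep_years
    return total_per_hundred * hundreds_of_squares


# Compare the two options over a few illustrative time horizons.
for years in (1, 5, 10):
    print(years, professional_cost(years), volunteer_cost(years))
```

Under these assumptions the two options cost a similar amount in the first year and diverge thereafter. A different assumption about when upkeep begins, or about inflation, would shift the totals.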

Conclusions

This study has illustrated the potential and some of the complexities behind predicting the power of a General Surveillance strategy using our existing Environmental Surveillance Networks.

The Generic Equation for Count Data

Considering ‘count’ data collected in annual census surveys, it is possible to derive a ‘generic equation’ which can predict power (probability of detecting change) with reasonable accuracy given nine explanatory variables (number of sites, duration of study, number of survey visits missed, mean and variance in abundance of the indicator species, background rate of growth at a ‘control’ site, the proportion of ‘treated’ sites, the degree of overdispersion in the data and the magnitude of the effect between control and treated sites). However, it was necessary to include as many as 10 interaction terms (9 of which involved alpha, the magnitude of the effect), and only to consider values of alpha > -0.02, for the predictive model to perform adequately. It should also be emphasized that this predictive model should only be used within the range of parameter values used for the simulations upon which it is based.

Having derived the generic equation, it can be used to estimate how power will be influenced by the nine explanatory variables. The trends illustrated here are as expected:

Factors that increase power include a) longer time series (Fig 10a v Fig 10b); b) fewer visits missed (Fig 10a v Fig 10c); c) higher mean abundance of indicator species (Fig 11); d) greater magnitude of alpha (i.e. searching for larger differences between GM and non-GM sites) (Fig 12a v Fig 10a); e) greater number of sites (Fig 10).

Increasing the background decline in the indicator species at control sites (Fig 10a v Fig 10d) only has a very modest impact on power.
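These qualitative trends can be explored with a small simulation. The sketch below is not the generic equation, nor the Poisson-based model used in this study: it swaps in a much simpler test (a two-sample z-test on per-site least-squares slopes of log(count + 1)) and ignores overdispersion and missed visits. All function and parameter names are hypothetical, and the default values are illustrative only.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's method; adequate for modest means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def site_slope(rng, years, mean, rate):
    """Least-squares slope of log(count + 1) on year for one simulated site."""
    xs = range(years)
    ys = [math.log(poisson(rng, mean * math.exp(rate * t)) + 1) for t in xs]
    x_bar = (years - 1) / 2
    y_bar = sum(ys) / years
    sxx = sum((t - x_bar) ** 2 for t in xs)
    sxy = sum((t - x_bar) * (y - y_bar) for t, y in zip(xs, ys))
    return sxy / sxx


def estimate_power(n_sites=60, years=10, mean=20.0, alpha=-0.05,
                   background=0.0, prop_treated=0.5, n_sims=100, seed=1):
    """Fraction of simulated surveys in which treated and control trends differ."""
    rng = random.Random(seed)
    n_treated = round(n_sites * prop_treated)
    n_control = n_sites - n_treated
    if min(n_treated, n_control) < 2:
        raise ValueError("need at least two sites in each group")
    detections = 0
    for _ in range(n_sims):
        control = [site_slope(rng, years, mean, background)
                   for _ in range(n_control)]
        treated = [site_slope(rng, years, mean, background + alpha)
                   for _ in range(n_treated)]
        se = math.sqrt(statistics.variance(control) / n_control
                       + statistics.variance(treated) / n_treated)
        if se == 0:
            continue  # no variation at all: count as a non-detection
        z = (statistics.mean(treated) - statistics.mean(control)) / se
        if abs(z) > 1.96:  # two-sided test at roughly the 5% level
            detections += 1
    return detections / n_sims


# Illustrative comparison: a shorter versus a longer time series.
print(estimate_power(years=5, alpha=-0.02), estimate_power(years=10, alpha=-0.02))
```

Varying `n_sites`, `years`, `mean` or `alpha` one at a time gives a feel for the direction of each effect, although the absolute power values from this simplified test should not be read as estimates for any real network.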

In practical terms, and for the purposes of general surveillance, power is best influenced by choice of indicator species: this will influence both the mean and variance in abundance across sites, and also the number of sites at which that species is detected. Provided that the parameters of the network and the species fall within the range simulated in this study, the generic equation may be used to ensure that power is within an acceptable range. Power curves are drawn for two butterfly and three bird species, to illustrate how power varies with effect size, proportion of sites treated and with number of sites monitored.

Analysis of spatial data within years

The Countryside Survey provides a contrasting case study, in which intensive monitoring occurs within one year (but this is not repeated for 7-8 years). A range of metrics for the delivery of ecosystem services are collected in a number of 1km squares which include arable land. Key messages from this case study include:

The power to detect large (>10%) change in many of these metrics is limited (Fig 16 & 17)

If the information on the location of GM crops is only available at 1km square (as opposed to the field) level, this reduces the power to detect change (Fig 18-20)

A second generic equation was derived based on a model with gamma errors which allowed an estimate of the average effect that each influential factor has on power.

Using Environmental Stewardship Measures as a proxy for GM crops

A third, rather different, approach used an existing BBS dataset in which monitoring was used to evaluate the effectiveness of environmental stewardship targeted at birds which utilize stubbles over winter for feeding. If the premise is accepted that the impact of overwintering stubbles is a suitable proxy for a potential impact of change in management associated with a GM crop, then this study illustrates how the probability of detecting those impacts is likely to vary between crops and species. Using this ‘real life’ example also provides further evidence that a change in management can be detected by the monitoring of appropriate indicator species, in spite of the noise and bad behavior inevitably associated with field data collected by an ESN.

One crucial point about this approach is that the management change which is monitored in this example (provision of stubbles over winter) is designed to impact on the three target bird species, whereas in the case of GS of GM crops, the surveillance is for unintended effects – so there is no clear pathway that links change to the indicator species. Consequently, the choice of indicator species should cover a range of ways in which the agro-ecosystem is utilized. The sensitivity of any indicator species will depend upon its ecology, the timing of monitoring with respect to the time of year when the species utilizes the agricultural landscape, and how this is influenced by the change that occurs in the agro-ecosystem.

Limitations of this study

In many cases the transects from which ESNs conduct their monitoring are constrained to follow field edges within arable land. This introduces an obvious bias towards edge habitat, and counts will be more heavily influenced by how arable management impinges upon field edges rather than crop centres. These biases will be greater for less mobile taxa (for example, counts of butterflies will be more biased than counts of birds). Population counts will also undoubtedly be influenced by landscape context: for example, in a resource rich landscape, management changes within an arable field would be expected to have much less influence than in a resource poor landscape. A challenge for the analysis of data from monitoring networks, seeking to detect the cause of changes in biodiversity, is to include those metrics which capture the influence of landscape context on farmland biodiversity, along with other covariates. This would be an important means of increasing the power to detect causes of change in any analysis of ‘real data’. In our simulation study, these influences are all captured in the variance of the indicator metric.

Other biases may also arise upon the introduction of GM crops. One example could be that early adopters are a biased sample of all potential farmers: for example those with pernicious weed problems may be more likely to be amongst the first to adopt herbicide tolerant crops. Our simplifying assumption has been that early adopters are a sample of all farmers currently growing the crop concerned, randomly chosen with respect to farming practice, baseline biodiversity and all other metrics.

This study has focused on a limited range of metrics. In particular, the examples drawn from the BMS and the BBS have involved counts of single species. Looking at covariation in a range of species simultaneously could represent a more sensitive means of detecting change, and would be a fruitful line of future investigation.

Limitations of General Surveillance

This study also highlights the limitations of general surveillance. This is most clearly illustrated by two points.

The power to detect small changes is low

Figure 15 illustrates how power changes with the magnitude of alpha for three farmland bird species: from -0.004 (which equates to a 0.4% change p.a., or 3.15% change over a nine year period) to -0.02 (a 2% change p.a. amounting to a 14.8% change over a nine year period). The power to detect small changes, even over fairly long time series of nine years, is low. This message is also illustrated by the CS case study examining the power to detect differences between two treatments within years, with power being relatively low for effect sizes 10% (Figures 21, 22 & 23).
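The cumulative figures quoted for Figure 15 appear consistent with a log-linear trend compounded over the eight annual steps that separate nine yearly counts. This is an inference from the numbers, not a formula stated in the report, and the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
import math


def cumulative_change(alpha, n_years):
    """Proportional change between the first and last of `n_years` annual counts.

    Assumes a log-linear trend, so the expected count is multiplied by
    exp(alpha) each year and there are n_years - 1 annual steps.
    """
    return math.exp(alpha * (n_years - 1)) - 1.0


for alpha in (-0.004, -0.02):
    change = 100 * cumulative_change(alpha, 9)
    print(f"alpha = {alpha}: {change:.2f}% over nine years")
```

Under this assumption the two values of alpha give declines of about 3.15% and 14.8% respectively, matching the figures in the text.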

The power to detect change within a few years is likely to be low

Figure 13 illustrates how the duration of the study influences power, with a relatively short study of five years providing low power to detect change even in abundant species if the number of sites is limited (50 in this example).

In conclusion, to detect small effects, on rare species, at an early stage of GM uptake (or any one of those), existing ESNs would need to be supplemented with extra sites, and the costs of this have been estimated for our case studies.

References

Baker, D.J., Freeman, S.N., Grice, P.V. and Siriwardena, G.M. (2012) Landscape scale responses of birds to agri-environment management: a test of the English Environmental Stewardship Scheme. Journal of Applied Ecology, 49, 4, 871-882.

Davey, C., Vickery, J., Boatman, N., Chamberlain, D., Parry, H. and Siriwardena, G. (2010) Regional variation in the efficacy of entry level stewardship in England. Agriculture, Ecosystems and Environment, 139, 1-2, 121-128.

Freeman, S.N. and Newson, S.E. (2008) On a log-linear approach to detecting ecological interactions in monitored populations. Ibis, 150, 2, 250-258.

Risely, K., Massimino, D., Johnston, A., Newson, S.E., Eaton, M.A., Musgrove, A.J., Noble, D.G., Procter, D. and Baillie, S.R. (2012) The Breeding Bird Survey 2011. BTO Research Report 624. British Trust for Ornithology, Thetford.

Robinson, R.A., Wilson, J.D. and Crick, H.Q.P. (2001) The importance of arable habitat for farmland birds in grassland landscapes. Journal of Applied Ecology, 38, 5, 1059-1069.

SAS Institute Inc. (2008) SAS OnlineDoc, Version 9.2. SAS Institute Inc., Cary, NC.

Vickery, J.A., Feber, R.E. and Fuller, R.J. (2009) Arable field margins managed for biodiversity conservation: a review of food resource provision for farmland birds. Agriculture, Ecosystems and Environment, 133, 1-3.
