Sampling and Monitoring of Environmental Data

Md. Abdus Salam, Professor

Department of Statistics, Jahangirnagar University

Savar, Dhaka


Components of the environment

• Water, air, soil, biota

1. Water

• Industrial waste

• Marine pollution

• Urban runoff

• Water crisis

• Wastewater

2. Air

• Climate change

• Global warming

• Sea level rise

• Greenhouse gas

• Indoor air quality

• Volatile organic compound

• Particulate matter


3. Soil

– Soil conservation

– Soil erosion

– Soil contamination

– Urban sprawl

– Habitat destruction

4. Biota

– Conservation

– Species extinction

– Endangered species

– Poaching

Sampling

– Sampling consists of selection, acquisition, and quantification of a part of the population

– selection and acquisition apply to physical sampling units of the population,

– quantification pertains only to the variable of interest, which is a particular characteristic of the sampling units.

– A sampling procedure is expected to provide a sample that is representative with respect to some specified criteria.

Characteristics of Environmental Sampling

– Selection and acquisition of sampling units are cheap.

– Environmental variables are typically precise chemical or biological characteristics of materials.

– Quantification of these chemical and biological characteristics is highly expensive and time-consuming.

Inaccessible and Sensitive Data

Composite Sampling

Full retesting

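• We test a composite of all n items and then need either one test (if the composite is negative) or n + 1 tests (if it is positive, when every item must be tested separately).

• When p, the probability that an individual item is positive, is small, this can be a highly economical approach.

• We need, on average, $(n + 1) - n(1 - p)^n$ tests, since a single test suffices with probability $(1 - p)^n$ and n + 1 tests are needed otherwise.

• If p = 0.0005 and n = 20, just 1.2 tests are required on average.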

Flowchart: Test all n (as one composite); IF Negative: End; IF Positive: Test all items separately.
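The average number of tests for full retesting can be checked numerically. The following is a minimal Python sketch, assuming that items are positive independently of one another with a common probability p; the function names are illustrative only. It evaluates the closed-form expectation and compares it with a Monte Carlo simulation of the scheme.

```python
import random


def expected_tests_full_retest(n: int, p: float) -> float:
    """Expected number of tests under full retesting.

    One test on the composite of all n items; if it is negative
    (probability (1 - p) ** n) we stop, otherwise all n items are
    tested separately, giving n + 1 tests in total.
    """
    return (n + 1) - n * (1 - p) ** n


def simulate_full_retest(n: int, p: float, reps: int, seed: int = 1) -> float:
    """Monte Carlo average number of tests under full retesting."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        # True means the item has the characteristic of interest
        items = [rng.random() < p for _ in range(n)]
        total += n + 1 if any(items) else 1
    return total / reps


print(round(expected_tests_full_retest(20, 0.0005), 3))  # → 1.199
print(simulate_full_retest(20, 0.0005, reps=50_000))
```

With p = 0.0005 and n = 20 the composite is positive only about 1% of the time, which is why the average stays close to a single test.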

Composite sampling

• Group retesting

Flowchart: Test all n (as one composite); IF negative: End; IF positive: test the subgroup composites n1, n2 and n3 separately. For each subgroup: IF negative: End; IF positive: Test all members of that subgroup (all n1, all n2 or all n3).
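Under the same assumption of independent items with common probability p, the expected number of tests for group retesting with subgroups of sizes $n_1, \ldots, n_k$ (so $n = \sum n_i$) can be derived as $1 + k\,[1 - (1-p)^n] + \sum_{i=1}^{k} n_i\,[1 - (1-p)^{n_i}]$: the subgroup composites are tested whenever the overall composite is positive, and the members of subgroup i are tested whenever that subgroup is positive. The Python sketch below (hypothetical function names) evaluates this formula and checks it by simulation.

```python
import random


def expected_tests_group_retest(sizes, p):
    """Expected number of tests under group retesting.

    sizes -- subgroup sizes n1, ..., nk (n = sum(sizes))
    p     -- probability that an individual item is positive
    One test on the composite of all n items; if positive, one test on each
    subgroup composite; every member of a positive subgroup is then tested.
    """
    q = 1.0 - p
    n = sum(sizes)
    k = len(sizes)
    return 1 + k * (1 - q ** n) + sum(ni * (1 - q ** ni) for ni in sizes)


def simulate_group_retest(sizes, p, reps, seed=1):
    """Monte Carlo average number of tests under group retesting."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        groups = [[rng.random() < p for _ in range(ni)] for ni in sizes]
        tests = 1  # composite of all n items
        if any(any(g) for g in groups):
            for g in groups:
                tests += 1  # composite of this subgroup
                if any(g):
                    tests += len(g)  # test each member separately
        total += tests
    return total / reps


print(round(expected_tests_group_retest([10, 10, 10], 0.01), 2))  # → 4.65
print(round(31 - 30 * 0.99 ** 30, 2))  # full retesting of the same 30 items → 8.81
```

For 30 items split into three groups of 10 with p = 0.01, group retesting needs about 4.65 tests on average against about 8.81 for full retesting.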

Composite Sampling

Cascading

Flowchart: Test all n (as one composite); IF Negative: End; IF Positive: test each half (n/2) as a composite. For each half: IF Negative: End; IF Positive: test each of its quarters (n/4). For each quarter: IF Negative: End; IF Positive: test in groups of n/8, and so on.
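The cascading scheme described on this slide is naturally recursive: test a composite, and if it is positive and contains more than one item, split it in half and treat each half the same way. The following Python sketch is one possible implementation, assuming item status is given as a list of booleans; like the scheme above, it tests both halves even when the first half is negative.

```python
def cascade(items, offset=0):
    """Cascading (repeated halving) retesting.

    items -- list of booleans, True meaning the item is positive
    Returns (number_of_tests, indices_of_positive_items).
    Each call tests the current group as one composite; a positive group
    with more than one member is split into two halves, and each half is
    tested in the same way.
    """
    tests = 1  # one test on this composite (or single item)
    if not any(items):
        return tests, []
    if len(items) == 1:
        return tests, [offset]
    mid = len(items) // 2
    t_left, pos_left = cascade(items[:mid], offset)
    t_right, pos_right = cascade(items[mid:], offset + mid)
    return tests + t_left + t_right, pos_left + pos_right


# 16 items with a single positive at position 5
print(cascade([i == 5 for i in range(16)]))  # → (9, [5])
```

With a single positive among 16 items, cascading identifies it in 9 tests, whereas full retesting of a positive composite would need 17.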

Composite sampling for continuous Variables

• X is a continuous variable; for example, X may measure pollution levels in a river.

• We want to know whether any observed $x_i$ in a sample of size n takes an illegally high value above a standard, $x_H$.

• Thus characteristic A is now defined by, say, $x_i > x_H$.

• If we measure X for a composite sample, its value will be the sum of the constituents, $x = \sum_{i=1}^{n} x_i$.

• Suppose the value of X for the composite sample is x and put $\bar{x} = x/n$, which is equivalent to the sample mean of the distinct values making up the composite sample.

• If any $x_i > x_H$ (with all $x_i \ge 0$), then for the whole sample we would be bound to have $\bar{x} > x_H/n$.

• This reflects the fact that even in the minimal case of violation, where all but one of the $x_i$ are zero and the remaining one only just exceeds $x_H$, we would still have $\bar{x} > x_H/n$.

• The condition $\bar{x} > x_H/n$ has been proposed as a basis for declaring that the composite sample indicates a violation.

• It is known as the “rule of n”. Under this rule, composite sampling proceeds as follows:

• If $\bar{x} \le x_H/n$, we declare all observations to be satisfactory.

• If $\bar{x} > x_H/n$, we need to retest all sample members.
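The rule of n can be expressed as a short screening function. This Python sketch assumes, as the slide does, that the composite measurement equals the sum of its non-negative constituents; the function name and the example values are hypothetical. Note that $\bar{x} \le x_H/n$ is the same as $x \le x_H$, so the rule is conservative: a composite can be flagged even when no single value exceeds the standard.

```python
def rule_of_n(values, x_h):
    """Composite screening of non-negative values against a standard x_h.

    The composite measurement is taken to be x = sum(values), so that
    x_bar = x / n. If x_bar <= x_h / n, no single value can exceed x_h and
    all observations are declared satisfactory (1 measurement). Otherwise
    every member is retested (1 + n measurements in total).
    Returns (indices_exceeding_x_h, number_of_measurements).
    """
    n = len(values)
    x_bar = sum(values) / n
    if x_bar <= x_h / n:
        return [], 1
    exceeding = [i for i, v in enumerate(values) if v > x_h]
    return exceeding, 1 + n


print(rule_of_n([0.2, 0.1, 0.3], x_h=1.0))  # → ([], 1)
print(rule_of_n([0.2, 0.9, 0.3], x_h=1.0))  # → ([], 4): composite flags, no violation
print(rule_of_n([0.0, 1.2, 0.0], x_h=1.0))  # → ([1], 4)
```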


RANKED-SET SAMPLING

• Obtaining relevant data for statistical investigation is one of the major activities in environmental statistics.

• However, we have to face the problem that circumstances are often unfavorable for obtaining data by means of a census, a classical sample survey or a designed experiment.

• Quite often we have to take whatever limited data are at hand (‘encountered data’), which may be difficult to analyze using formal methods.

• If we are to collect even limited data for our purposes, we may need to abandon such hallowed principles as strict randomization, not only in view of access constraints but also to contain costs and to improve efficiency.

• In many areas of environmental risk, such as radiation or pollution, we commonly find that taking measurements can involve substantial scientific processing of materials and correspondingly high costs.

• We therefore need to look for highly efficient procedures.

• One way of doing this is to use what is known as ranked-set sampling.

– Example: we wish to estimate a quantity as basic as a population mean.

– Suppose we are interested in the mean pollution level of the bathing water around an inland lake used for recreational purposes.

– We might decide to take a random sample of modest size at regular intervals of time and use the sample mean to estimate the mean pollution level of the lake on each occasion.

– But with the attendant costs even of such a simple monitoring process it is desirable to keep the sample as small (and cheap) as possible to achieve the desired level of assurance.

Ranked-set Sampling

• Ranked-set sampling could operate in the following way:

• Suppose we want a sample of size 5. We would choose five sites at random, but rather than measuring pollution at each of them, we would ask a local expert which site would be likely to give the largest value, and measure pollution only at that site.

• Alternatively, we could choose the candidate for the highest value by means of a cheaply observed concomitant variable, such as the opacity of the water.

• We then repeat the process by selecting a second random set of five sites, again with an expert to guide us, and seek to measure the second largest pollution level amongst these.

• And so on, until we seek the lowest pollution level in the final random set of five sites.

• The resulting ranked-set sample of size 5 is then used for the estimation of the mean.

• Such an approach can also be used to estimate a measure of dispersion or a quantile, or even to carry out a test of significance or to fit a regression model.

• The gain can be dramatic: the sample mean is unbiased and efficiencies relative to simple random sampling may reach 300%.
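The procedure just described can be sketched in Python. This is an illustration under stated assumptions, not the author's method: each hypothetical site has a true pollution level (expensive to measure) and an opacity reading equal to that level plus random noise (cheap to observe), and ranking by opacity stands in for the expert's judgement. The names `ranked_set_sample` and `draw_site` are invented for the example.

```python
import random


def ranked_set_sample(m, draw_site, rank_key, measure, rng):
    """One cycle of ranked-set sampling giving a sample of size m.

    For the i-th set (i = 0, ..., m - 1): draw m sites at random, rank them
    by a cheap criterion (rank_key, e.g. expert judgement or a concomitant
    variable), and make the expensive measurement only on the site ranked
    (i + 1)-th largest. Only m of the m * m sites drawn are measured.
    """
    sample = []
    for i in range(m):
        sites = [draw_site(rng) for _ in range(m)]
        ranked = sorted(sites, key=rank_key, reverse=True)  # largest first
        sample.append(measure(ranked[i]))
    return sample


def draw_site(rng):
    """Hypothetical site: true pollution level plus a noisy, cheap opacity reading."""
    pollution = rng.uniform(0.0, 10.0)
    opacity = pollution + rng.gauss(0.0, 1.0)
    return pollution, opacity


rng = random.Random(42)
sample = ranked_set_sample(
    5,
    draw_site,
    rank_key=lambda site: site[1],  # rank by opacity (cheap)
    measure=lambda site: site[0],   # measure pollution (expensive)
    rng=rng,
)
print(sample, sum(sample) / len(sample))
```

Because the ranking uses a noisy proxy, it is imperfect; the mean of the ranked-set sample remains a valid estimate, but the efficiency gain is smaller than under perfect ranking.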

Ranked-set Sample Mean

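Suppose we draw n independent random sets, each of n units, with values

$$
\begin{array}{cccc}
x_{11}, & x_{12}, & \ldots, & x_{1n} \\
x_{21}, & x_{22}, & \ldots, & x_{2n} \\
\vdots & \vdots & & \vdots \\
x_{n1}, & x_{n2}, & \ldots, & x_{nn}
\end{array}
$$

The ranked-set sample is then defined as $x_{(1)1}, x_{(2)2}, \ldots, x_{(n)n}$, where $x_{(i)i}$ is the $i$th ranked value in the $i$th set.

The mean should be estimated as

$$\bar{x}_{(n)} = \frac{1}{n}\sum_{i=1}^{n} x_{(i)i}.$$

It can be shown that $\bar{X}_{(n)}$ is unbiased and that $\operatorname{Var}(\bar{X}_{(n)}) \le \operatorname{Var}(\bar{X})$, where $\bar{X}$ is the mean of a simple random sample of size n.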
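The unbiasedness and variance reduction of the ranked-set sample mean can be checked by simulation. This Python sketch assumes perfect ranking and a Uniform(0, 1) population; for that case theory gives a relative efficiency of (m + 1)/2, that is 3 (300%) for sets of size m = 5, which matches the upper figure quoted for ranked-set sampling. The function names are illustrative.

```python
import random
import statistics


def rss_mean(m, rng):
    """Ranked-set sample mean with perfect ranking from a Uniform(0, 1) population:
    the i-th smallest value is taken from the i-th of m independent sets of size m."""
    values = []
    for i in range(m):
        ranked_set = sorted(rng.random() for _ in range(m))
        values.append(ranked_set[i])
    return sum(values) / m


def srs_mean(m, rng):
    """Mean of a simple random sample of size m from Uniform(0, 1)."""
    return sum(rng.random() for _ in range(m)) / m


def compare(m=5, reps=20_000, seed=7):
    """Return (mean of RSS means, mean of SRS means, Var(SRS mean) / Var(RSS mean))."""
    rng = random.Random(seed)
    rss = [rss_mean(m, rng) for _ in range(reps)]
    srs = [srs_mean(m, rng) for _ in range(reps)]
    efficiency = statistics.variance(srs) / statistics.variance(rss)
    return statistics.mean(rss), statistics.mean(srs), efficiency


mean_rss, mean_srs, efficiency = compare()
print(round(mean_rss, 3), round(mean_srs, 3), round(efficiency, 2))
```

Both averages should be close to the true mean 0.5, while the variance ratio should be close to 3. With imperfect ranking or other populations the gain is typically smaller.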

Sampling in the Wild

• These are sampling methods that are particularly suitable for examining living things.

• Sampling techniques are: 1. quadrat sampling, 2. capture-recapture or mark-recapture, 3. transect sampling and 4. adaptive sampling

• Quadrat Sampling

• Mainly used for ecological studies

• If we wish to count the numbers of one, or of several, species of plant in a meadow (to estimate population size or assess biodiversity), we might throw a quadrat at random and do our count, or counts, within its boundary.

• For aquatic wildlife, we might cast a net of given size into a pond, river or sea and count what it trawls.

• A quadrat is usually a square (or round) metal frame with a side (or diameter) of a meter or several meters.

• Where it lands defines the search area, in which we take appropriate measures of the number of individual plants, biomass or extent of ground cover.
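The slide does not give an estimator, so the following Python sketch makes an assumption for illustration: quadrats are placed uniformly at random wholly inside the meadow, and the total population is estimated by scaling the mean count per quadrat up to the meadow area (mean count × meadow area / quadrat area). The meadow, plant positions and function names are all hypothetical.

```python
import random


def quadrat_counts(plants, side, q, k, rng):
    """Throw k square quadrats of side q at random positions wholly inside a
    side x side meadow and count the plants inside each one."""
    counts = []
    for _ in range(k):
        x0 = rng.uniform(0.0, side - q)
        y0 = rng.uniform(0.0, side - q)
        counts.append(sum(1 for (x, y) in plants
                          if x0 <= x < x0 + q and y0 <= y < y0 + q))
    return counts


def estimate_total(counts, side, q):
    """Scale the mean count per quadrat up to the whole meadow area."""
    mean_count = sum(counts) / len(counts)
    return mean_count * (side * side) / (q * q)


# Hypothetical meadow: 100 m x 100 m with 2000 plants placed at random
rng = random.Random(3)
side = 100.0
plants = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(2000)]
counts = quadrat_counts(plants, side, q=5.0, k=50, rng=rng)
print(estimate_total(counts, side, q=5.0))
```

With 50 quadrats of 5 m × 5 m the estimate should usually fall within roughly 10% to 15% of the true 2000; clustered real populations would give more variable counts.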

Recapture Sampling

• A wide range of sampling methods are based on the principle of initially ‘capturing’ and ‘marking’ a sample of the members of a closed (finite) population and subsequently observing, in a later (or separate) independent random sample drawn from the population, how many marked individuals are obtained.

• The term capture-recapture is usually used for animals or insects, while mark-recapture is often reserved for studies of plants.

• The sample information is then used to infer characteristics of the overall population, principally its total size.

• In the simplest approaches to capture-recapture, we assume randomness of the samples with constant capture probabilities in a fixed population (no births, deaths, etc.) and no capture-related effects (such as individuals being ‘trap shy’ or ‘trap happy’, or marks being lost).

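• The Petersen estimator of population size N is

$$\hat{N} = \frac{nm}{m'}$$

where n is the initial sample size (the number captured and marked), m is the size of the second random sample, and m′ is the number of individuals in the second sample that were originally marked.

• The variance of $\hat{N}$ is approximately

$$\operatorname{Var}(\hat{N}) \approx \frac{nm(n - m')(m - m')}{m'^3}.$$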
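The Petersen estimator and its approximate variance translate directly into code. This is a minimal Python sketch with hypothetical function names; the simulation part assumes a closed population in which both samples are simple random samples drawn independently, matching the simplest assumptions stated above.

```python
import math
import random


def petersen(n, m, m_marked):
    """Petersen estimate of population size N and its approximate variance.

    n        -- number captured and marked in the first sample
    m        -- size of the second, independent random sample
    m_marked -- number of marked individuals in the second sample (m')
    """
    if m_marked == 0:
        raise ValueError("no marked individuals recaptured: estimate undefined")
    n_hat = n * m / m_marked
    var_hat = n * m * (n - m_marked) * (m - m_marked) / m_marked ** 3
    return n_hat, var_hat


def simulate_recaptures(N, n, m, rng):
    """Closed population of size N: mark n individuals at random, then draw an
    independent random sample of m and count how many carry marks."""
    marked = set(rng.sample(range(N), n))
    second = rng.sample(range(N), m)
    return sum(1 for animal in second if animal in marked)


n_hat, var_hat = petersen(200, 150, 30)
print(n_hat, round(math.sqrt(var_hat), 1))  # → 1000.0 150.6

rng = random.Random(5)
m_marked = simulate_recaptures(N=1000, n=200, m=150, rng=rng)
print(m_marked, petersen(200, 150, m_marked)[0])
```

For example, marking 200 animals and later finding 30 marked among 150 gives $\hat{N} = 1000$ with an approximate standard error of about 151.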

TRANSECT SAMPLING

• Transect sampling methods were also developed principally for biological applications, with the aim of estimating the density, or the number, of a species of animal, fish or plant distributed over a geographic region.

• In its simplest form, known as line-transect sampling, a line is drawn at random across a search region and the objects (tigers, say) are sought by moving along the line and noting how many of the target objects are observed as one goes from one end of the line to the other.

• Several lines may be drawn and the sample data accumulated from traversing all the lines.

• Estimation of the abundance of animals or plants is known to be difficult and time-demanding, and line-transect sampling is as efficient and effective an approach as is likely to be found, for most types of problem.

• Assumptions: the objects are assumed

– to be stationary (i.e. not moving);

– to be similarly and independently able to be observed;

– to be seen at right angles to the transect line;

– to be seen on one occasion only;

– to be unaffected (e.g. neither repelled nor attracted) by the observer.

• Other possible modifications include the prospects of:

– observing objects other than at right angles to the transect line (we may see them ahead of us);

– taking into account the fact that larger objects may be more visible than smaller ones at any specific distance;

– deciding to take observations in all directions from a single point rather than by traversing a transect line.

• The latter approach is known as point-transect sampling.

Point Transect Sampling

• Several points may be chosen and a period of time is then spent at each point recording all observed objects of the type sought.

• This can be a particularly useful approach for birds and elusive animals.

• The simplest case: strip transects

• Suppose a line transect of length l is chosen at random, extending over an observation region which contains individual specimens of some object of interest distributed at random with density D, and suppose that every object within a distance w on either side of the line (a strip of width 2w) is observed.

• Thus, if we observe n objects, our estimate of the density of the objects over the region is

$$\hat{D} = \frac{n}{2wl}.$$
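The strip-transect estimate of density is simple to compute. This Python sketch assumes a rectangular region crossed by vertical lines that run its full height, that every object within distance w of a line is seen, and that counts from several lines are pooled by dividing the total count by $2w$ times the total line length. The region, object positions and function name are hypothetical.

```python
import random


def strip_transect_density(objects, line_positions, line_length, w):
    """Strip-transect estimate of density from several parallel lines.

    Each line is vertical, at horizontal position x, and runs the full
    height of the region (line_length). Every object within distance w of
    a line is assumed to be seen. With n objects counted in total,
    D_hat = n / (2 * w * total_line_length).
    """
    n = 0
    for x in line_positions:
        n += sum(1 for (ox, _oy) in objects if abs(ox - x) <= w)
    return n / (2 * w * line_length * len(line_positions))


# Hypothetical 200 x 200 region containing 2000 randomly placed objects (D = 0.05)
rng = random.Random(2)
side = 200.0
objects = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(2000)]
w = 5.0
lines = [rng.uniform(w, side - w) for _ in range(5)]  # strips lie inside the region
print(strip_transect_density(objects, lines, side, w))
```

Each strip covers 2000 square units here, so about 100 objects are expected per line and the estimate should be close to the true density of 0.05.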

ADAPTIVE SAMPLING

• In a number of sampling situations, field researchers carrying out the survey may feel inclined to increase sampling effort adaptively in the vicinity of observed values that are high or otherwise interesting.

• Adaptive cluster sampling refers to designs in which an initial set of units is selected by some probability sampling procedure (with or without replacement), and, whenever the variable of interest of a selected unit satisfies a given criterion, additional units in the neighborhood of that unit are added to the sample.

• For the sorts of situations in which field researchers feel the inclination to depart from the preselected sample plan and add nearby or associated units to the sample, adaptive cluster sampling accommodates that inclination almost completely.

• Consider a survey of a rare and endangered bird species in which observers record the number of individuals of the species seen or heard at sites or units within the study area.

• At many of the sites selected for observation, zero abundance may be observed.

• But whenever substantial abundance is encountered, observation of neighboring sites is likely to reveal additional concentrations of individuals of the species.

• Such patterns of clustering are encountered with many types of animals from whales to insects, with vegetation types from trees to lichens, and with mineral and fossil fuel resources.

• A related pattern is found in epidemiological studies of rare, contagious diseases.

• Whenever an infected individual is encountered, adding closely associated individuals to the sample reveals a higher-than-expected rate of infection.
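The selection step of adaptive cluster sampling can be sketched on a grid of sites. This Python sketch makes several assumptions for illustration: the study area is a rectangular grid of counts, the initial sample is a simple random sample of cells without replacement, the criterion is "count at least some threshold", and the neighborhood of a cell is its four adjacent cells. The grid and function name are hypothetical.

```python
import random


def adaptive_cluster_sample(grid, n_initial, threshold, rng):
    """Adaptive cluster sampling on a rectangular grid of counts.

    An initial simple random sample of n_initial cells is drawn without
    replacement. Whenever a sampled cell's count is >= threshold, its four
    neighbours (up, down, left, right) are added to the sample, and the
    same rule is applied to every newly added cell.
    Returns (initial_cells, all_sampled_cells).
    """
    rows, cols = len(grid), len(grid[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    initial = rng.sample(cells, n_initial)
    sampled = set(initial)
    to_check = list(initial)
    while to_check:
        r, c = to_check.pop()
        if grid[r][c] >= threshold:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                    sampled.add((nr, nc))
                    to_check.append((nr, nc))
    return initial, sampled


# Hypothetical 20 x 20 study area: mostly empty, with one 3 x 3 cluster of birds
grid = [[0] * 20 for _ in range(20)]
for r in range(8, 11):
    for c in range(8, 11):
        grid[r][c] = 4

rng = random.Random(11)
initial, sampled = adaptive_cluster_sample(grid, n_initial=40, threshold=1, rng=rng)
print(len(initial), len(sampled))
```

If any initial cell falls in the cluster, the whole cluster and its empty boundary cells are added. The plain mean of the final sample is biased under this design; estimation needs the modified estimators developed for adaptive cluster sampling, which this sketch does not implement.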
