29
il United States (~UJ] Department of Agriculture National Agricultural Statistics Service Research Division STB Research Report Number STB-94-02 December 1994 Application Of Satellite Data To Crop Area Estimation At The County Level Michael E. Bellow

il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

il~·United States(~UJ]Department ofAgriculture

NationalAgriculturalStatisticsService

Research Division

STB Research ReportNumber STB-94-02

December 1994

Application Of SatelliteData To Crop AreaEstimation At The CountyLevel

Michael E. Bellow

Page 2: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

APPLICATION OF SATELLITE DATA TO CROP AREA ESTIMATION AT THECOUNTY LEVEL, by Michael E. Bellow, Research Division, National Agricultural StatisticsService, U.S. Department of Agriculture, Washington, D.C., December 1994. NASS ResearchReport No. STB-94-02.

ABSTRACT

This report describes the use of earth resources satellite data by the National Agricultural StatisticsService (NASS) to estimate crop planted area at the county or small domain level. The Battese-Fuller random effects model has recently been applied to obtain acreage indications submitted toNASS State Statistical Offices in the Mississippi Delta Region for input into their setting ofofficial county estimates. This method extends the regression approach used for crop areaestimation at the State level by incorporating an additional term that accounts for county effects.An alternative method, known as pixel count estimation, uses overall counts of satellite pixelsclassified to different crops and ground cover types within a county. The two methods aredescribed and compared for several crops using Landsat Thematic Mapper (TM) data from Iowa(1988), Mississippi (1991-92) and Louisiana (1992). The satellite based estimates are comparedwith corresponding NASS official estimates.

KEY WORDS

Regression, Battese-Fuller, ratio, county effect

IIThis paper was prepared for limited distribution to the research community outside the U.S.iDepartment of Agriculture. The views expressed herein are not necessarily those of NASS orIUSDA.

ACKNOWLEDGEMENTS

The author thanks Mitch Graham, Martin Ozga and Bill Mason for their assistance with dataprocessing and analysis. Appreciation is due to the State Statistical Office staff members,NASDA enumerators and Remote Sensing Section support staff who performed various tasksassociated with collection and preparation of the data. Thanks to Mike Craig, Paul Cook and BillIwig for their review and helpful suggestions. Finally, the author thanks Wendy Waters for herperserverance in the word processing of this report.

i

Page 3: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

TABLE OF CONTENTS

SUMMARY ill

INTRODUCTION 1

BATTESE-FULLEREST~TION 3

PIXEL COUNT EST~TORS 6

EMPIRICAL EVALUATION 8

DISCUSSION AND RECOMMENDATIONS 17

REFERENCES 18

APPENDIX A: 20HUDDLESTON-RAY AND CARDENAS ESTIMATORS

APPENDIX B: 21EST~TION OF BATTESE-FULLER VARIANCE COMPONENTS

APPENDIX C: 21DERIVATION OF VARIANCE FORMULA FOR SYNTHETIC ESTIMATOR

APPENDIX D: 22DERIVATION OF VARIANCE FORMULA FOR COMBINED RATIO EST~TOR

APPENDIX E: 23DISTRIBUTION-FREE PROCEDURES FOR COMPARISON OF TREATMENTS

ii

Page 4: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

SUMMARY

The focus of this report is the use of data from earth resources satellites to improve estimation ofcrop planted area at the county or small domain level. Large scale sample surveys conducted bythe National Agricultural Statistics Service (NASS) for national and State level estimation areoften inadequate for estimation at the county level due to insufficient sample sizes within counties.This fact has led to investigation and usage of auxiliary data sources such as list frame controldata, previous year estimates and satellite data. NASS has studied the application of satellite datato crop area estimation at the county level since the mid 1970' s.

The Battese-Fuller model, a random effects model, is currently used by NASS for small domaincrop area estimation with combined survey and satellite data. The approach is based on theregression methodology used for estimation at the State level, but incorporates an additional termthat models county effects. Alternative estimators based on overall counts of classified satellitepixels have recently begun to receive consideration. Two such estimators are the raw pixel countestimator (RPCE) and the combined ratio estimator (CRE).

An empirical study using data from Iowa, Mississippi and Louisiana compared the Battese-Fullerestimator (BFE) with the RPCE, CRE and a survey based synthetic estimator (SYN).Distribution-free test procedures were used to evaluate estimator accuracy. Overall, the BFEappeared to be the most accurate of the four estimators, but the CRE had lower variance. Theperformance of estimators was influenced by the specific crop and region. The BFE and CREtended to be biased downward and the RPCE biased upward.

iii

Page 5: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

INTRODUCTION

The Nationa! Agricultural Statistics Servicehas published county level estimates of cropacreage, crop production, crop yield andlivestock inventories since 1917. Theseestimates are used by government agenciesfor local economic decision making and byagribusinesses for marketing and salespurposes. The primary source of data foragricultural commodity estimates has alwaysbeen surveys of farmers, ranchers andagribusinesses who voluntarily provideinformation on a confidential basis.However, surveys designed and conducted atthe national and State levels are ofteninadequate for producing reliable informationat the county or small domain level. Themajor obstacle to obtaining accurate smallarea estimates from large scale samplesurveys is the fact that such areas usuallycontain very few sample units. Therefore,supplementary data sources such as NASSlist frame control data, previous yearestimates and Census of Agriculture data areoften used to improve county estimates. Inaddition, special supplemental surveys areconducted by NASS State Statistical Offices(SSO's) for county estimation purposes.Earth resources satellite data, e.g., from theU.S. Landsat or French SPOT satellites,represents a useful ancillary data source forcounty level estimation of area planted in acrop. The potential for improved estimationaccuracy using satellite data is based on thefact that, with adequate coverage, all areawithin a county can be classified to crops andother ground cover types. The accuracy ofthe estimates depends to a large degree uponhow accurately the satellite data areclassified to each crop.

Beginning in 1972, NASS has pursued theuse of satellite data to improve crop area

1

estimation in general. From 1978-87, theAgency used Landsat Multispectral Scanner(MSS) data to provide timely State levelestimates of major crops in eight midwesternand south central States to NASS'sAgricultural Statistics Board (ASB). From1991-93, NASS used Landsat ThematicMapper (TM) data to estimate cotton, riceand soybean planted acreage in theMississippi Delta region. The Statesinvolved were Arkansas (1991-93),Louisiana (1992) and Mississippi (1991-92).Estimation at the State or region level isaccomplished using a first order regressionmodel relating ground survey data to satellitedata. A limitation of the use of satellite datais difficulty obtaining cloud free imagery fora given region, in particular the Delta areawhich gets frequent rain.

The basic element of satellite spectral data isthe set of measurements taken by a sensor ofa square area on the earth's surface. Thesensor measures the amount of radiantenergy reflected from the surface in selectedbands of the electromagnetic spectrum. Theindividual imaged areas, known as pixels,are arrayed along east-west rows within thenorth-ta-south pass (swath) of the satellite.For purposes of easy storage, the data withina swath are subdivided into overlappingsquare blocks called scenes. The Landsatsatellites image a given point on the earth'ssurface once every 16 days. The LandsatMultispectral Scanner (MSS) sensor containsfour spectral bands with a spatial resolutionof 80 meters. The more advanced LandsatTM sensor has seven bands (three visible andfour infrared) with 30 meter resolution. TheFrench SPOT Multispectral Scanner hasthree bands (two visible and one infrared)with 20 meter resolution.The remote sensing approach to crop areaestimation at the State, region and county

Page 6: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

levels depends upon the area frame portionof NASS's annual June Agricultural Survey(JAS). NASS's Area Frame Section dividesthe area of each State except Alaska into landuse strata depending on degree of cultivationand other factors (Cotter and Nealon, 1987;Bush and House, 1993). During the JAS,enumerators interview farm operators withinrandomly selected area sampling units calledsegments and record the responses onquestionnaire forms. At State StatisticalOffices, the completed questionnaires arecompiled and edited. The summarizedresults are used to generate State levelestimates of crop acreage with measurableprecision. These estimates are transmitted tothe Agricultural Statistics Board inWashington, D.C., which is responsible forsetting official estimates.

The number of satellite scenes required tocover a region of interest within a Statedepends upon the image area of the satellite.One cannot always have the same imagedates for all scenes due to satellite overpassschedule, cloud cover and image qualityfactors. Consequently, following the JASand scene acquisition, NASS's RemoteSensing Section (RSS) divides a State intosmaller areas called analysis districts. Ananalysis district is either a collection ofcounties and parts of counties completelycontained in one or more scenes having thesame image date or an area for which usablesatellite data are not available. State level(frrst domain) crop area estimates areobtained by summing all analysis districtlevel (second domain) estimates within theState. County level (third domain) estimatescan be computed using one of the methodsdescribed in this report. The Battese-Fullerapproach is currently favored.

2

All satellite based county crop areaestimators use stratum level or overall countsof pixels within a county classified tospecific crops. Three regression based smalldomain estimation methods have beenapplied or considered by RSS. From 1976-82, the Huddleston-Ray estimator(Huddleston and Ray, 1976) was used. In1978, the Cardenas family of estimators(Cardenas, Blanchard and Craig, 1978) wasconsidered but not implemented. AppendixA provides a mathematical description of theHuddleston-Ray and Cardenas estimators.From 1982-87, the Battese-Fuller estimatorwas applied for county level estimation ofmajor crops in the central United Statesusing Landsat Multispectral Scanner (MSS)data. The same method was used tocalculate county level estimates of rice,cotton, soybean and sugar cane acreage inthe Mississippi Delta region in 1991-93 withLandsat TM .data. Recently, non-regressionestimators based on overall counts ofclassified pixels have begun to be studied.Two such estimators are discussed later.

Most data processing associated with satellitebased crop area estimation is done usingPEDITOR, a special purpose softwaresystem developed at NASS (Ozga, Masonand Craig, 1992). PEDITOR is writtenmainly in PASCAL and maintained on aMicrovax 3500 computer, with manymodules that also run on personal computers.A Cray supercomputer has been used forcomputationally intensive tasks.

For each analysis district having usablesatellite coverage, a separate regressionestimator (Cochran, 1977) is applied tocompute crop area estimates that are moreprecise than the direct expansion estimatesobtained from JAS data alone. Allen (1990)provides a detailed description of the

Page 7: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

methodology involved. The steps requiredare as follows:

1. Register each satellite scene to a mapbase (Cook. 1982).

2. Label each pixel within the samplesegments to a crop or other ground covertype using JAS reported data.

3. Cluster sets of pixels corresponding todistinct cover types to create coversignatures, represented by discriminantfunctions (Bellow and Ozga, 1991).

4. Classify all pixels within the samplesegments to cover types using coversignatures.

5. Develop regression relationships betweenJAS reported crop area (dependentvariable) and corresponding counts ofpixels classified to a given crop(regressor variable) on a per stratumbasis.

6. Classify all pixels within the analysisdistrict to cover types.

7. Generate stratum level crop areaestimates by substituting analysis districtlevel classified pixel averages intoregression equations.

8. Sum stratum level estimates to obtainoverall analysis district level crop areaestimates.

The procedure uses direct expansionestimation in analysis districts lackingadequate satellite coverage. The final Statelevel estimate for each crop is a composite ofthe regression and direct expansion estimateswithin the State.

In many States. counties typically containfewer than five sampled JAS segments andmay contain no segments at all. This factmakes it generally infeasible to defmeanalysis districts to be individual countiesand use the above procedure to obtain county

3

level estimates. Instead, indirect estimatorsthat utilize information from outside a countynave been studied and applied.

While the word "county" is used in theupcoming discussion, the concepts apply toany small domain, Le.• an area for whichinsufficient ground survey information isavailable.

BATTESE-FULLER ESTIMATION

The Battese-Fuller approach to crop areaestimation at the county level is based uponthe regression methodology used for Statelevel estimation (Allen, 1990; Graham.1993). The Battese-Fuller estimator (BFE)uses the analysis district level regression, butinvokes an additional term that accounts forcounty effects.

The Battese-Fullermodel was first developedin the general framework of linear modelswith nested error structure (Fuller andBattese, 1973), and later applied to countycrop area estimation (Battese, Harter andFuller, 1988). As mentioned earlier, a set ofcounties and subcounties covered by one ormore satellite scenes of the same date formsan analysis district, within which stratumlevel regression relationships between surveyreported crop area and counts of classifiedpixels are developed. The Battese-Fullermodel assumes that segments grouped bycounty have the same slope parameter as theanalysis district, but a different intercept isrequired. The model can be applied withinan analysis district for all strata whereclassification and regression have been done.The analyst computes stratum level Battese-Fuller crop area estimates for all countiesand subcounties within the boundaries ofeach analysis district. For land use stratawhere regression cannot be done due to lack

_._--~----~~~------ ~--- ~_~_u~ _

Page 8: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

of adequate satellite coverage or too fewsegments, a domain indirect syntheticestimator that depends only on the groundsurvey data is used.

For a given analysis district, the strata whereregression is done are referred to asregression strata and the remaining ones assynthetic strata. For convenience, theregression strata are labeled h = 1,... ,Hr andthe synthetic strata h = Hr+1,... ,H, where Hris the number of regression strata and H isthe total number of strata in the analysisdistrict. If any given county is partiallycontained in the analysis district, then theestimation formulas given below apply to theincluded portion.

For each sample segment within a givenstratum h in county c, the Battese-Fullermodel specifies the following relation:

where:

"lte - number of sample segments instratum h, county c

Y Itel - reported area in crop of interestin stratum h, county c, samplesegment i

x Itel - number of pixels classified tocrop of interest in stratum h,county c, sample segment i

Y Ite - county effect in stratum h, countyc

Eitel - random error in stratum h, countyc, sample segment i

(a) lJel - total error in stratum h, county c,sample segment i

P Olt'P lit = analysis district level regressionparameters for stratum h

4

This formulation is recognized as a randomeffects model. The county effect vbe andrandom error Ehci are assumed to beindependent and normal, with mean zero andvariances 0

2"", and (J2 ell' respectively. The

total error has covariance structure:

if c*c'if c=c', i*i'if c=c', i=i'

The parameter (Jvb 2 is both a within countycovariance and a between county componentof the variance of any residual, while (Jell 2 isthe within county variance component forstratum h. The county mean residuals areobservable and given by:

where:IIlJc

Yhe. - (lIn,)LYhei1.1

IIIJc

= (llnlle)L Xheii.l

~ OIJ' ~ lit - least squares regressionparameter estim~ors forstratum h

For a given county, the stratum level meancrop area per population unit (segment) isestimated by:

-(BF) A A-

Y lie. = POII+PlhXhe+aheUhe.

where:

XlJe - mean number of pixels perpopulation unit classified to cropin stratum h, county c

o :!:: a he :!:: 1

Page 9: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

The mean square error of this estimator is:

The (unadjusted) stratum level estimator oftotal crop area in the county is:

where:

- number of population units instratum h, county c

The range of allowed values of 51K: defines afamily of Battese-Fuller estimators. If 51K:=

0, then the estimate lies on the analysisdistrict regression line for the stratum. Thevalue that minimi7:es the mean square errorfor stratum h in county i is (Walker andSigman, 1982):

In general, the variance components OytJ2 andOch2 are unknown and must be estimated.The estimators given in Appendix Brepresent a special case of the more generalunbiased estimators derived by Fuller andBattese (1973), using the "fitting-of-constants" method. They require that astratum contain at least two sample segmentswithin the county in question. If there arefewer than two segments, then 5hc is set tozero in the computation of the countyestimate.

The unadjusted estimates of county totalsgenerally do not sum to the correspondinganalysis district totals obtained from largescale estimation. In order to get agreement,adjustment terms must be added to theestimates. The formula for the adjustedBattese- Fuller estimator is:

5

- number of population units instratum h

The adjusted Battese-Fuller estimator of totalcrop area in the regression strata of county cis:

H,f(aBF) = ~ f(aBF)

e L- heh.l

Estimation of the variance of the BFE isdiscussed by Walker and Sigman (1982).Their estimator of mean square error is usedto derive the variance estimator, but isknown to have a downward bias due toestimation of the variance components. Thisbias can be significant and a correction dueto Prasad and Rao (1990) may beimplemented in the future.

The presence of a county main effect acrossstrata introduces cross strata covariance andrequires revisions in both the mean squareerror formula and the choice of an optimalset of multipliers for the county meanresiduals. Walker and Sigman (1982)developed an extension of the above modelthat requires estimation of a vector of countyeffects by strata.

As mentioned previously, syntheticestimation is done in strata where regressionis not viable. A given county usuallycontains few segments in a given stratum, sothe stratum level mean crop acreage persegment for the analysis district is used tocompute the synthetic estimates. In syntheticstratum h, the crop area in county c isestimated by:

f (SYN)he

Page 10: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

where:

Y It •. - mean reported crop area persample segment in stratum h

An estimator of the variance, derived inAppendix C, is the following:

H

v(f(SYN» = L Nh/sYh2(Nh-nh)/Nhnhh.H r')

The domain indirect synthetic estimator oftotal crop area in the synthetic strata ofcounty c is then:

H

- L Nhe Yh ..h.Hr·l

data for a six county region in eastern SouthDakota. At that time, NASS was using theHuddleston-Ray estimator (Appendix A).They found a modest lack of fit of themodel, with larger model departurecorresponding to low correlations betweenclassified pixel counts and ground surveyobservations. The county effect parameterwas found to be highly significant for com,the most prevalent of the four cropsconsidered in the research study.Furthermore, this effect manifested itselfwithin several strata but was negligibleacross strata. The study nonetheless showedrobustness of the Battese-Fuller estimatorsagainst departure from certain modelassumptions. Two members of the Battese-Fuller family satisfied the criterion for smallrelative root mean square error; i.e., lessthan 20 percent of the estimate was due toroot mean square error. These estimatorswere the ones that minimi7.edmean squareerror and bias, respectively, under the modelassumptions. However, the Battese-Fullerestimate closest to the Huddleston-Rayestimate was far less satisfactory, failing tomeet the desired upper limits for meansquare error and bias. This study providedthe justification for adopting the Battese-Fuller estimator as a replacement for theHuddleston-Ray estimator.

1 ~~ - 2- -- ~~(y"ct - Y,,)

n.-l t.1 c.1

= number of sample segments instratum h

S 2111

where:

"(SYN)T e(3.1)

Synthetic estimation's use of prorated districtlevel averages to estimate county totalsignores county effects. Therefore, thesynthetic component of a county estimate canhave a significant bias. The bias is reducedif a synthetic district has homogeneousagricultural intensity for the crop of interest.

The final county estimate is the sum of theregression and synthetic components:

f = f (aBF) + f (SYN)e e e

The estimated variance of the final countyestimates is computed by summing thevariance estimates of the regression andsynthetic components.

Walker and Sigman (1982) studied theBattese-Fuller model using Landsat MSS

PIXEL COUNT ESTIMATORS

As improved satellite sensors enable higherclassification accuracy, the overall (acrossstrata) count of pixels within an areaclassified to a given crop has become a moreinteresting number. The overall pixel countrepresents a "census" of pixels covering thearea in question and therefore is not subjectto sampling error. However, there is anonsampling error due to pixelmisclassification. As a result, the estimator

6

Page 11: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

defined as the overall pixel count convertedto area units can have a significant bi¥.Bias reduction is achieved by using anadjustment factor based on sample levelinformation. Although a pixel countestimator could be a function of counts ofpixels classified to many different covertypes, this discussion is restricted toestimators based on the number of pixelsclassified to the crop of interest only. Ageneral expression for such an estimator is:

from the crop of interest being classified toanother cover type).

The combined ratio estimator (CRE) is basedon the estimator of the same name describedin Cochran (1977). This estimator isconceptually simple, uses stratum levelinformation to compute the adjustment termand has a simple formula for estimating thevariance. The CRE can be expressed asfollows:

f c

where:

Xc - number of pixels classified to crop ofinterest in county c

" - adjustment term

The adjustment term could be a function ofthe sample level classification data. Thechoice of adjustment term determines thespecific estimator to be used. The raw pixelcount estimator (RPCE), alluded to above, isobtained by setting the adjustment term to thearea on the ground corresponding to a singlepixel:

f (RPe) = AXc c

where).. is the conversion factor (area unitsper pixel) for the satellite sensor in use.

-Rx c

where:

- mean number of pixels persample segment classified to cropin stratum h

Since the adjustment term is based on samplelevel information, this estimator has apositive sampling error. An estimator forthe variance of the CRE, valid for largesamples, is the following (see Appendix Dfor derivation):

where:

By considering only the specific pixelclassification obtained and not thesuperpopulation consisting of all possibleclassifications for a given data set, one canassume that the RPCE has zero variance.The bias depends upon the differencebetween the theoretical commission error(probability of a pixel from another covertype being classified to the crop of interest)and omission error (probability of a pixel

7

S 2Jell

III

Page 12: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

x - total number of pixels classifiedto crop in analysis district

EMPIRICAL EVALUATION

The performance of the satellite based countycrop area estimators was evaluated usingdata from Iowa, Louisiana and Mississippi.The Iowa data, from 1988, bad been usedpreviously for a sensor comparison study(Bellow, 1991). The Mississippi andLouisiana data came from NASS's 1991-1992 operational projects in the MississippiDelta region (Craig, 1993). Table 1 givesinformation about the four regions used inthe study. The number of scene dates for agiven region indicates whether the analysisfor that region was unitemporal (one date) ormultitemporal (two dates).

Region A is a crop reporting district inwestern Iowa with a high concentration ofcom and soybeans. Parts of Calhoun,

Table 1: Regions in Empirical Study

Crawford and Ida counties were outside theTM scene used. Region B is comprised oftwo contiguous crop reporting districts innorthwest Mississippi which accounted formost of the rice and cotton produced in theState in 1991 and 1992. The same areasampling frame was in effect in both years,allowing for a direct year-to-year comparisonof county estimates. The slight increase innumber of segments from 1991 to 1992 wasdue to NASS's annual rotation of segmentsinto and out of a State's area sample. Allareas except for a small part of yazoo countywere covered by the TM scenes. Region Cis a crop reporting district in southwestLouisiana that accounted for 56 percent ofthe State's rice production in 1992. RegionD is a crop reporting district in south centralLouisiana that accounted for 56 percent ofthe State's sugar cane production in 1992.Part of Vermilion parish was outside the TMscenes.

--------------- Number of --------------Region Location Year Counties Strata Segments Scenes Scene Dates

A W Iowa 1988 9 2 30 1 7/25B NW Mississippi 1991 12 6 79 4 4/1, 8/23

1992 12 6 82 4 5/5,7/24C SW Louisiana 1992 7 5 51 2 4/26, 8/16D SC Louisiana 1992 6 4 21 1 5/5

In each region, all available TM spectralbands were used. For Iowa, the analysisused all 30 segments, with 28 coming fromstratum 14 (agricultural) and the remainingtwo from stratum 30 (agri-urban). Datafrom the segments in stratum 14 were usedfor regression and Battese-Fuller estimation.For the BFE, CRE and RPCE, syntheticestimation was used within stratum 14outside the scene. For the BFE, syntheticestimation was used in stratum 30 for allareas.

8

In Mississippi, Battese-Fuller estimation wasused for cotton in strata 11 (75 - 100%cultivated), 12 (51 - 75 % cultivated), 20 (15- 50%) cultivated) and 40 (0 - 15%cultivated). For rice, the BFE was computedonly in strata 11 and 12 due to an insufficientnumber of segments with positive reportedrice acreage in the other strata. Syntheticestimation was used in the remaining stratafor each crop. In Louisiana, Battese-Fullerestimation was used in strata 13 (50 - 100%cultivated) and 20 (15 - 50% cultivated) forboth rice and sugar cane.

Page 13: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Tables 2-5 give the computed values of thesatellite based BFE, CRE and RPCE forIowa, Mississippi and Louisiana,respectively. For comparison, values of thepure synthetic estimator (SYN) are alsoshown. This survey based estimator usessynthetic estimation in all strata and isdefined by equation (3.1) with Hr=O.Estimated standard deviations are shown forthe SYN, BFE and CRE. The official

county or parish acreage estimates (OFF)issued by NASS's Iowa, Mississippi andLouisiana State Statistical Offices are alsoshown. These published estimates arebenchmark data against which the accuracyof satellite based estimates can be assessed.Rice figures are given only for 1992 inIssaquena and Yazoo counties sinceMississippi did not issue official 1991 riceestimates for those two counties.

Table 2: County Estimates for Iowa 1988 (1000 Acres)

CORN:

County OFF SYN SD BFE SD CRE SD RPCEAudubon 100.0 112.4 6.5 92.2 3.2 93.6 2.1 100.6Calhoun* 133.0 144.9 8.3 133.2 3.9 134.4 2.9 144.2Carroll 141.0 146.2 8.4 141.4 4.5 142.1 3.1 152.6Crawford* 147.0 183.2 10.6 152.7 4.7 155.1 3.2 164.9Greene 125.0 145.9 8.4 130.0 3.9 132.8 2.9 142.7Guthrie 98.0 151.3 8.7 106.3 5.2 107.8 2.4 115.8lda* 112.0 111.4 6.4 107.0 4.0 107.0 3.8 110.3Sac 136.0 148.1 8.5 138.3 4.0 139.6 3.1 150.0Shelby 155.0 149.4 8.6 140.7 4.0 141.5 3.1 152.1

SOYBEANS:

County OFF SYN SD BFE SD CRE SD RPCEAudubon 70.7 74.0 7.5 69.9 4.6 70.4 2.1 74.8Calhoun* 150.0 95.4 9.6 145.0 5.8 136.9 4.0 145.2Carroll 117.0 96.1 9.7 106.7 9.7 106.4 3.1 113.0Crawford* 106.0 120.4 12.1 106.9 5.8 108.1 3.1 113.8Greene 143.0 96.1 9.7 117.5 5.4 109.6 3.2 116.3Guthrie 77.5 99.5 10.0 64.4 7.0 78.8 2.3 83.7lda* 75.2 73.3 7.4 76.4 5.3 76.1 4.3 78.2Sac 124.0 97.3 9.8 112.9 5.5 108.8 3.2 115.5Shelby 94.9 98.3 9.9 81.0 6.0 91.1 2.7 96.7

* - incomplete satellite coverage

9

_~ ~_. .~._m ~~~~~~~~~~~~~~~ __ ~ __ ~~ _

Page 14: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Table 3: County Estimates for Mississippi 1991 (1000 Acres)

COTrON:

County OFF SYN SD BFE SD CRE SD RPCEBolivar 65.5 106.2 15.4 61.6 6.1 60.6 3.9 80.6Coahoma 105.7 59.2 8.4 88.3 4.2 82.6 5.2 109.8Humphreys 61.6 53.2 7.2 57.3 3.4 54.2 3.4 72.1Issaquena 38.0 42.6 8.6 34.6 3.9 27.5 1.8 36.6Leflore 79.2 68.8 9.6 87.8 3.5 83.4 5.3 111.0Quitman 31.0 48.1 7.2 46.4 4.0 44.5 2.8 59.3Sharkey 47.0 43.2 6.9 48.6 3.4 42.5 2.7 56.6Sunflower 100.0 95.6 15.0 79.3 5.5 73.9 4.7 98.3Tallahatchie 64.2 68.9 10.5 67.9 4.9 60.3 3.8 80.3Tunica 45.6 47.1 6.9 38.0 2.5 36.5 2.3 48.6Washington 95.7 84.4 11.6 102.4 4.0 93.2 5.9 124.1yazoo* 94.5 89.3 23.4 93.9 7.5 82.6 5.2 109.6

RICE:

County OFF SYN SD BFE SD CRE SD RPCEBolivar 74.0 50.8 11.9 66.2 3.6 66.9 6.1 60.9Coahoma 15.8 20.3 4.7 10.4 2.5 10.7 1.0 9.7Humphreys 3.6 22.8 5.2 7.1 2.3 4.7 0.4 4.3Leflore 16.6 30.7 7.1 19.4 3.6 17.3 1.6 15.8Quitman 9.6 24.4 5.6 9.3 2.8 6.2 0.6 5.6Sharkey 5.0 18.0 4.1 7.8 1.7 6.5 0.6 5.9Sunflower 36.0 51.1 12.0 37.8 3.5 36.7 3.4 33.4Tallahatchie 9.6 20.9 5.1 8.5 3.0 8.1 0.7 7.4Tunica 17.5 17.6 4.3 9.9 2.6 13.0 1.2 11.9Washington 30.5 39.6 9.0 22.6 3.5 28.0 2.6 25.4

* - incomplete satellite coverage

10

Page 15: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Table 4: County Estimates for Mississippi 1992 (1000 Acres)

corrON:

County OFF SYN SD BFE SD CRE SD RPCEBolivar 66.0 117.8 15.5 81.6 7.1 75.8 4.1 95.3Coahoma 106.2 61.1 8.6 70.5 6.1 72.6 4.5 79.6Humphreys 63.5 58.2 7.6 43.4 4.8 44.9 3.0 58.5Issaquena 38.2 40.6 6.9 35.8 3.0 34.5 2.3 45.0Leflore 95.0 74.8 9.8 86.0 5.5 82.1 5.5 107.1Quitman 34.0 52.8 7.3 40.2 7.4 37.9 2.4 41.6Sharkey 53.0 44.2 5.9 54.2 3.9 50.4 3.4 65.7Sunflower 103.3 109.6 15.3 62.9 7.8 62.8 4.0 81.1Tallahatchie 73.0 69.7 10.7 57.3 6.6 58.5 3.3 65.3Tunica 44.0 47.8 6.6 31.9 4.3 33.6 2.1 36.9Washington 95.0 92.4 11.8 92.8 6.1 88.2 5.9 115.0yazoo* 98.5 82.8 23.5 88.0 4.1 83.3 5.7 106.7

RICE:

County OFF SYN SD BFE SD CRE SD RPCEBolivar 82.0 31.4 7.1 50.1 8.1 49.8 5.0 69.5Coahoma 18.0 15.5 4.0 6.5 2.5 6.4 0.2 7.7Humphreys 7.6 15.8 3.6 12.2 6.2 14.2 1.6 20.1Issaquena 4.1 8.8 2.4 7.2 4.5 17.0 2.0 24.2Leflore 18.0 21.2 5.0 17.2 7.9 24.4 2.8 34.7Quitman 18.5 15.8 3.8 6.6 2.6 3.8 0.1 4.6Sharkey 10.6 11.7 2.6 12.0 5.6 12.8 1.5 18.2Sunflower 44.0 30.5 7.0 37.4 7.8 35.0 3.7 49.1Tallahatchie 14.0 17.1 4.8 7.3 2.9 5.9 0.2 7.4Tunica 19.5 12.9 3.3 7.7 2.7 8.9 0.2 10.7Washington 30.5 25.7 5.7 28.1 8.0 39.7 4.6 56.4yazoo* 1.2 13.1 8.5 1.0 2.1 8.3 1.2 11.4

* - incomplete satellite coverage

11

Page 16: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Table 5: Parish Estimates for Louisiana 1992 (1000 Acres)

RICE:

CountyAcadiaAllenBeauregardCalcasieuCameronJeff. DavisVermiliona

SUGAR CANE:

CountyAssumptionIberiaIbervilleLafayetteSt. MartinSt. Mary

OFF96.028.04.0

36.015.097.0

102.0

OFF35.053.731.4

8.929.542.0

SYN53.911.87.0

46.612.356.561.1

SYN7.7

12.89.2

13.115.211.2

SD8.71.91.47.42.09.3

10.0

SD1.83.02.13.03.62.6

BFE87.516.13.6

28.711.886.896.8

BFE24.454.828.612.321.351.4

SD1.70.40.51.10.31.33.5

SD2.02.01.31.82.61.5

CRE80.420.38.3

34.961.883.2

149.6

CRE21.545.623.710.325.539.4

SD3.00.80.31.42.63.59.9

SD2.24.62.41.02.64.0

RPCE92.324.710.142.475.1

101.1152.4

RPCE29.663.032.714.235.254.4

a - incomplete satellite coverageb - harvested area

Figures 1 and 2 show measured bias of theBFE, CRE, RPCE and SYN for com andsoybeans in Region A. The biasmeasurements are simply the county leveldifferences between each crop area estimateand the official estimate.

To determine if any of the four estimatorswere significantly different from the officialcounty estimates for these data sets, adistribution-free Friedman test was applied(Hollander and Wolfe, 1973). Appendix Egives a description of the procedure. Thesubjects were counties in a given data set,while the treatments were the five types of

estimate (OFF, BFE, RPCE, CRE, SYN).

The null hypothesis states that the treatmenteffects are equal, i.e., the five types ofestimate are not significantly different. Thishypothesis is rejected if the test statistic S issufficiently large. Table 6 gives therank sums and values of the Friedman Sstatistic for each of the eight data sets.Upper bounds on p-values, computed usinga X2

k:_l approximation to the null distributionof S, are also shown. The upper boundsshow that the null hypothesis of equaltreatment effects is rejected at the a = .1level for all eight data sets.

12

Page 17: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Figure 1: Measured Bias For Iowa 1988 Corn60

(20)Audubon Calhoun Carroll Crawford Greene Guthrie

County

BFE CRE RPCE SYN___...•....• -B-(Based on official county estimates)

Ida Sac Shelby

Figure 2: Measured Bias For Iowa 1988 Soybeans

(60)Audubon Calhoun Carroll Crawford Greene Guthrie

County

I.::~~I(Based on official county estimates)

13

Ida Sac Shelby

Page 18: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Table 6: Rank Sums and Friedman S Values

---------------- Rank Sums ---------------Stau Year Crop OFF BFE RPCE CRE SYN S p-valueIA 1988 Com 19 15 37 24 40 21.6 <.005IA 1988 Soybeans 31 22 37 21 24 8.27 <.1MS 1991 Cotton 38 35 56 18 33 24.6 <.005MS 1991 Rice 31 29 15 29 46 19.36 <.005MS 1992 Cotton 43 28 50 22 37 16.87 <.005MS 1992 Rice 38 25 49 31 37 10.67 <.05LA 1992 Rice 25 13 32 21 14 14.29 <.01LA 1992 Sugar Cane 20 19 29 13 9 15.47 <.005

Since the Friedman tests concludedsignificant differences for all eight data sets,distribution-free multiple comparison testswere performed to determine whichestimators, if any, were significantlydifferent from the official estimates. Thetreatments vs. control procedure described

in Appendix E was used, with the officialestimates playing the role of controltreatment. Table 7 gives the absolutedifferences between rank sums. The criticaltest values M(.05) and M(.01) at the .05 and.01 levels, respectively, are also shown.

Table 7: Rank Sum Absolute Differences (Four Estimators vs. OFF) and Critical TestValues

Rank SumAbsolute Differences

State Year Crop BFE RPCE CRE SYN M(.05) M(.Ol)IA 1988 Com 4 18 5 21 16.4 20.1IA 1988 Soybeans 9 6 10 7 16.4 20.1MS 1991 Cotton 3 18 20 5 18.9 23.2MS 1991 Rice 2 16 2 15 17.3 21.2MS 1992 Cotton 15 7 21 6 18.9 23.2MS 1992 Rice 13 11 7 1 18.9 23.2LA 1992 Rice 12 7 4 11 14.4 17.7LA 1992 Sugar Cane 1 9 7 11 13.4 16.4

The table shows that the RPCE wassignificantly different from OFF at the .05level for Iowa 1988 com, while the CRE wassignificantly different at the .05 level forMississippi cotton in both 1991 and 1992.The survey based estimator SYN wassignificantly different from OFF at the .01level for Iowa 1988 com. Of the fourestimators, the BFE was the only one foundnot significantly different from OFF for alleight data sets.

14

To quantify the differences betweenestimators and gain some insight into bias,Doksum contrast estimates were generated.These contrasts between treatments werecomputed using medians of pairwisedifferences between estimates. Theprocedure is described in Appendix E. Foreach data set, Table 8 gives the contrastestimates corresponding to differencesbetween treatment effects of the fourestimators and the treatment effect of the

Page 19: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

official estimates. The signs of the contrastestimates indicate direction of bias, i.e.,positive values signify overestimation andnegative values underestimation. The

magnitudes of the contrast estimates generatean accuracy ranking of the four estimatorsfor a given data set. These ranks are shownin parentheses in the table.

Table 8: Doksum Contrast Estimates - Four Estimators vs. OFF (rankings of magnitudesin parentheses)

State Year Crop BFEIA 1988 Com .34 (1)IA 1988 Soybeans -5.2 (3)MS 1991 Cotton -1.92 (1)MS 1991 Rice -.66 (1)MS 1992 Cotton -9.26 (3)MS 1992 Rice -4.05 (3)LA 1992 Rice -7.98 (2)LA 1992 Sugar Cane -1.9 (1)

The table shows that the contrast estimate ofthe BFE had the lowest magnitude for four ofthe eight data sets, and the second lowestmagnitude for one other data set. The CREhad the lowest magnitude contrast estimatefor one data set and the second lowest forfour others. SYN had the highest magnitudecontrast estimate for five data sets. Theseresults provide further evidence that none ofthe other three estimators is more accuratethat the BFE. The contrast estimate for theBFE was negative for seven data sets andthat of SYN for six, while the contrast

Contrast EstimatesRPCE CRE SYN

11.4 (3) 1.58 (2) 12.18 (4).62 (1) -4.24 (2) -5.38 (4)

12.2 (4) -7.26 (3) -2.47 (2)-2.88 (3) -1.55 (2) 11.24 (4)3.89 (2) -10.74 (4) -1.04 (1)4.95 (4) -1.93 (2) -.77 (1)9.44 (3) .82 (1) -20.78 (4)3.25 (2) -6.18 (3) -21.12 (4)

estimate for the RPCE was positive for sevendata sets. These observations suggestnegative bias tendencies for the BFE andSYN, and a positive bias tendency for theRPCE.

Tables 9-12 give four measures of estimatoraccuracy for all data sets, computed based onthe final official figures. The measures aremean deviation from official estimates (MD),root mean square deviation (RMSD), meanabsolute deviation (MAD) and largestabsolute deviation (LAD).

Table 9: Iowa 1988 Estimator Accuracy (1000 Acres)

Com SoybeansEST MD RMSD MAD LAD MD RMSD MAD LADBFE -0.6 6.8 5.4 14.3 -8.6 11.9 9.1 25.5RPCE 9.6 12.6 10.6 17.9 -2.3 10.3 7.4 26.7CRE 0.8 7.4 6.3 13.5 -8.0 13.5 9.0 33.4SYN 16.2 23.8 17.6 53.3 -12.0 28.0 21.6 54.6

15

Page 20: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Table 10: Mississippi 1991 Estimator Accuracy (1000 Acres)

Cotton RiceEST MD RMSD MAD LAD MD RMSD MAD LADBFE -1.8 10.0 7.8 20.7 1.9 4.9 4.1 7.9RPCE 13.2 17.2 13.8 31.8 -3.8 5.4 4.1 13.1CRE -7.2 12.5 10.1 26.1 -2.0 3.5 2.8 7.1SYN -1.8 19.4 13.2 46.5 7.8 14.0 12.4 23.2

Table 11: Mississippi 1992 Estimator Accuracy (1000 Acres)

Cotton RiceEST MD RMSD MAD LAD MD RMSD MAD LADBFE -10.4 18.7 14.3 40.4 -6.2 11.4 7.7 31.9RPCE 2.3 16.0 13.8 29.3 3.8 13.8 12.5 25.9CRE -12.1 18.3 14.4 40.5 -3.5 13.0 10.9 32.2SYN -1.5 22.2 15.3 51.8 -4.0 16.0 9.4 50.6

Table 12: Louisiana 1992 Estimator Accuracy (1000 Acres)

Sugar Cane RiceEST MD RMSD MAD LAD MD RMSD MAD LADBFE -1.3 6.9 5.9 10.6 -6.7 7.6 6.7 11.9RPCE 4.8 7.4 6.6 12.4 17.2 29.9 19.2 60.1CRE -5.7 7.4 6.2 13.5 8.6 26.6 19.6 47.6SYN -21.9 26.1 23.3 40.9 -18.4 28.0 22.3 42.1

These figures tend to support the previousconjecture that none of the other threeestimators is more accurate than the BPE.For example, the root mean square deviationof the BFE was lower than that of the otherthree estimators for five of the eight datasets. The results also suggest region andcrop specific aspects to performance of thethree satellite based estimators. Toillustrate, the CRE showed much lowervalues of RMSD and MAD for rice inMississippi than for the same crop inLouisiana, while both the BFE and CREshowed noticeably lower values of the sametwo measures for com in Iowa than forsoybeans in the same State.

16

Table 5 shows that in the Louisiana parishesof Cameron and Vermilion, the RPCE andCRE severely overestimated the official riceacreage while the BFE was much moreaccurate. The reason for this anomaly is thefact that in stratum 40 (less than 15 percentcultivated) some wetland area wasmisclassified as rice in both parishes. Sincethe RPCE and CRE are computed from thepixel count in all strata combined, they wereboth grossly high for rice. The Battese-Fuller approach used synthetic estimation instratum 40 and thus circumvented thedifficulty. This situation indicates that onemust use caution when applying pixel basedestimation. Table 8 shows that the CRE hadthe lowest magnitude contrast estimate for

..~~---_._-~-_._------------------------------------

Page 21: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

the same data set. This result is misleadingsince the medians of paired differences werenot greatly influenced by the extremeoutliers. Table 12 reflects the situation moreaccurately, showing the BFE with a rootmean square deviation of 7,600 acrescompared with 29,900 acres for the RPCEand 26,600 acres for the CRE.

The mean deviation of the BFE was negativefor seven of the eight data sets, supportingthe earlier conjecture of negative bias. Table5 shows that for both rice in Region C andsugar cane in Region D, the BFE had alower variance than the CRE in mostparishes. However, that was not the case forthe Region A and B data sets.

The RPCE' s mean deviation was positive forsix of the eight data sets, supporting apositive bias. Surprisingly, this estimatorshowed the lowest RMSD and MAD forsoybeans in Region A and cotton in RegionB (1992).

There is insufficient evidence to make anystatement regarding direction of bias for theCRE. From Table 1, the CRE had lowervariance than the BFE for both com andsoybeans in all Region A counties.Furthermore, in most counties the CREshowed lower variance than the BFE for riceand cotton in Region B (1991 - 92).

The mean deviation of SYN was negative forsix data sets, supporting negative bias. Forboth cotton and rice in Region B (1991),SYN had higher variance than the BFE andCRE in each county. SYN did display lowervariance than the BFE in seven of twelvecounties for rice in Region B (1992).

17

DISCUSSION ANDRECOMMENDATIONS

Based on the results of the empirical study,the Battese-Fuller method should continue tobe used for county level crop area estimationwith satellite data. However, research onother estimators will continue. Estimatorsnot considered in this report, such as anindirect separate ratio estimator, could beinvestigated.

The future usage of satellite data for countycrop area estimation at NASS must beevaluated in the context of the Agency'soverall remote sensing and county estimationprograms. Remote sensing research willcontinue to focus on identifying newgeographic areas and crops where themethodology would be beneficial. Inaddition, benefits of other sources ofremotely sensed data, such as radar satellites,will be studied as data become available.From 1991-93, Landsat TM data were usedto produce State and county level crop areaestimates in the Delta region. In 1993,satellite data were used only for Arkansasdue to budgetary constraints. The RemoteSensing Section is using both French SPOTmultispectral data and Landsat TM data forcrop area estimation in Arkansas in 1994.This effort enables a comparison to be madebetween utility of the two sensors for thisapplication. As the prototype field office forremote sensing, the Arkansas SSO isreceiving special computing capabilities andtraining related to processing of earthresources satellite data. An empirical studycomparing Landsat based Arkansas countyestimates with various county indicationsused by the SSO is planned. There are 3-year (1991-93) time series available forremote sensing estimates and the other datasources. Furthermore, in 1995 there will beno full State project in Arkansas. Instead,NASS plans to perform crop acreage

Page 22: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

verification and crop condition/yieldassessment using extra area frame samplesegments in Craighead County. Multiplesatellite data sources and acquisition dateswill be utilized.

NASS State Statistical Offices rely on anumber of data series to help set officialcounty estimates of crop acreage andproduction (Iwig, 1993). Fairly reliableadministrative data sources are available formany commodities. The new NASS CountyEstimate System, developed in an effort tostandardize county estimation methods acrossStates, uses a combination of scaling andcompositing techniques to provide a countylevel total estimate for any agriculturalcommodity (Bass et al., 1989). Separateestimates that can be composited togetherinclude previous year official estimates,current year direct expansion and ratioestimates, Agricultural Stabilization andConservation Service (ASCS) figures andsatellite based estimates. The SSO' sconduct a large non-probability countyestimates survey that also provides updatedcontrol data for NASS's list sampling frame.This survey is an integral part of theAgency's overall survey program and willcontinue in some form in the foreseeablefuture.

There is an ongoing NASS cooperativeresearch project in small area estimation notinvolving satellite data. That project isevaluating various mixed effects models forestimating county level crop production fromthe State-wide non-probability sample offarms (Stasny, Goel and Rumsey, 1991). Acurrent research effort compares pixelestimators with stratum level regressionestimators for State and region level croparea estimation (Bellow, 1994). In additionto large domain versions of the RPCE andCRE, combined regression and separate ratioestimators are considered.

18

A byproduct of satellite data that is useful toState Statistical Offices is a set of colorcoded land use maps at the county level.These maps provide a pictorial view of thedistribution of crops within each county,based on the satellite pixel classifications.Recently, a capability to display roads andwater bodies on the maps using digital linegraph (DLG) data has been added. TheArkansas SSO provides feedback to NASS'sRemote Sensing Section concerning thesatellite based county estimates and maps.

REFERENCES

Allen, J. D. (1990), "A Look at the RemoteSensing Applications Program of theNational Agricultural Statistics Service, II

Journal of Official Statistics, 6(4), 393-409.

Bass, J., B. Guinn, B. Klugh, C. Ruckman,J. Thorson and J. Waldrop (1989),Re.port of the Task Group for Reviewand Recommendations on CountyEstimates, National AgriculturalStatistics Service, U.S. Department ofAgriculture.

Battese, G.E., R.M. Harter and W.A. Fuller(1988), IIAn Error-Components Modelfor Prediction of County Crop Areasusing Survey and Satellite Data," Journalof the American Statistical Association,83(401), 28-36.

Bellow, M.E. (1991), Comparison ofSensors for Com and Soybean PlantedArea Estimation, U.S. Department ofAgriculture, NASS Research Report No.SRB-91-Q2.

Bellow, M.E. (1994), "Large DomainSatellite Based Estimators of CropPlanted Area," to appear in AmericanStatistical Association 1994 Proceedin~s

Page 23: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

of the Section on Survey ResearchMethods.

Bellow, M.E. and M. Ozga (1991),"Evaluation of Clustering Techniques forCrop Area Estimation using RemotelySensed Data, " American StatisticalAssociation 1991 Proceedin~s of theSection on Survey Research Methods,pp. 466-471.

Bush, J. and C. House, (1993), "The AreaFrame: A Sampling Base forEstablishment Surveys," Proceedin~s ofthe International Conference onEstablishment Surveys, pp. 335-344.

Cardenas, M., M.M. Blanchard and M.E.Craig (1978), On the Development ofSmall Area Estimators Using LANDSATData as Auxiliary Information,Economics, Statistics and CooperativesServices, U.S. Department ofAgriculture.

Cochran, W.G. (1977), SamplingTechniques, New York, N.Y.: JohnWiley & Sons, pp. 26, 150 -204.

Cook, P.W. (1982), Landsat RegistrationMethodolo2V Used by U.S. Departmentof Al¢culture' s Statistical Reportin~Service. 1972-1982, Statistical ReportingService, U.S. Department of Agriculture.

Cotter, J. and J. Nealon, (1987), AreaFrame Design for Agricultural Surveys,National Agricultural Statistics Service,U.S. Department of Agriculture.

Craig, M.E. (1993), "The NASS DeltaProject: 1991-92 Rice and CottonAcreage, " ACSMI ASPRS AnnualConvention and Exposition.

19

Fuller, W.A. and G.E. Battese (1973),"Transformations for Estimation ofLinear Models with Nested-ErrorStructure," Journal of the AmericanStatistical Association, 68(343), 626-632.

Graham, M.L. (1993), "State Level CropArea Estimation using Satellite Data in aRegression Estimator," Proceedin~s ofthe International Conference onEstablishment Survevs, pp. 802-806.

Hollander, M. and D.A. Wolfe (1973),Nonparametric Statistical Methods, NewYork, N.Y.: John Wiley & Sons, pp.138- 162.

Huddleston, H.F. and R. Ray (1976), "ANew Approach to Small Area CropAcreage Estimation," Proceedings of theAnnual Meeting of the AmericanAgricultural Economics Association.

Iwig, W.C. (1993), "The NationalAgricultural Statistics Service CountyEstimates Program," Indirect Estimatorsin Federal Promuns, Statistical PolicyWorking Paper 21, Federal Committeeon Statistical Methodology,Subcommittee on Small Area Estimation,Ch.7.

Ozga, M, W.W. Mason and M.E. Craig(1992), "PEDITOR - Current Status andImprovements, " Proceedin2s of theASPRSI ACSM/RT 92 Convention, pp.175-183.

Prasad, N.G.N andJ.N.K. Rao (1990), "TheEstimation of the Mean Squared Error ofSmall-Area Estimators," Journal of theAmerican Statistical Association,85(409), 163-171.

----~-------~~~~~~~~~~~~~~~-~----------

Page 24: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Stasny, E.A., P.K. Goel and 0.1. Rumsey(1991), "County Estimates of WheatProduction, " Survey Methodology,17(2), 211-225.

u.S. Department of Agriculture (1993),Area Frame Design Information, AreaFrame Section, NASS.

- mean number of pixels perpopulation unit classified to cropin stratum h, county c

- least squares regression slopeparameter estimator

The Huddleston-Ray estimator of total croparea in the regression strata of county c isthen:

where:

NAc = number of population units instratum h, county c

H, - number of regression strata

-= Yh.:Bh(Xhc-Xh)

where:

- (CAR)Yhc.

The Cardenas family of estimators(Cardenas, Blanchard and Craig, 1978) usesthe stratum level differences between meannumber of pixels classified to the crop ofinterest in the county and analysis district,respectively, to adjust the mean reportedcrop area per sample segment. Withinregression stratum h, the estimate of meancrop area per population unit for county c is:

Walker, G. and R. Sigman (1982), The Useof LANDSAT for County Estimates ofCrop Areas - Evaluation of theHuddleston-Ray and Battese- FullerEstimators, U.S. Department ofAgriculture, SRS Staff Report No. AGES820909 .

The Huddleston-Ray estimator (Huddlestonand Ray, 1976) replaces the classified pixelaverage for the analysis district with theclassified pixel average for a county whenestimating the county mean crop area perpopulation unit. Within an analysis district,the overall mean crop area in regressionstratum h is estimated by:

APPENDIX A: HUDDLESTON-RAYAND CARDENAS ESTIMATORS

- (HR)Yh ..

and the stratum level mean crop area forcounty c is estimated by:

where:

= parameter relating Classified pixelcounts to reported crop area

The estimate of total crop area m theregression strata of county c is:

- mean reported crop area persample segment in stratum h

- mean number of pixels persample segment classified to cropin stratum h

- mean number of pixels perpopulation unit classified to cropin stratum h

There are three alternative estimators of BA :

1. Ratio estimator

20

Page 25: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

2. Separate regression estimator

c

NhE nhC(Xhc-Xh)YhC.c.1

C

nhE Nhc(Xhc·X h)2e.1

3. Combined regression estimator

where:

ci" -

c ".LE (%lat:rXIu:)(Y1lt:/-YIu:)1:-1 Ll

C ".E E (%lat:rXlat:lc.l /.1

Ii" -

where:

Hr C

E (N,,1/n,/£ n"c(XIu:-X")Y,,c.11-1 c.l

Hr CEN"E N"c(X"c-X,,)111-1 c.l (The Section on Battese-Fuller estimation

defines all remaining terms used above).

N,. - number of population units instratum h

"11 - number of sample segments instratum h

"lIc = number of sample segments instratum h, county c

Yllc. - mean reported crop area persample segment in stratum h,countyc

The combined regression estimator isapplicable only if the B" 's are assumed to beconstant across strata.

APPENDIX B: ESTIl\.fATION OFBATTESE-FULLER VARIANCE

COMPONENTS

The estimators of the Battese-Fuller variancecomponents at the analysis district levelrepresent a special case of the more generalunbiased estimators derived by Fuller andBattese (1973). The estimators are given by:

c ".Ge/ = [l/(",..C-l)]L L lYIat:FYIat:.-cill(%flt:FXllc)t

c.l /.1

21

The value of 6f1t: that minimizes the meansquare of the Battese-Fuller estimator can beestimated by:

The stratum level mean square error of theestimator of mean crop area is estimated by:

- A. 1 2 A. 2 2mse(Y(BF),IIc) = (1-6 11) 0,,"'& IIc (0 ell 1"11)

Walker and Sigman (1982) provideexpressions for the mean square error andmean square conditional bias of the stratumlevel Battese-Fuller estimator. Separateformulas are required depending uponwhether the regression parameters are knownor estimated. Variance estimators arederived from these formulas.

APPENDIX C: DERIVATION OFVARIANCE FORMULA FORSYNTHETIC ESTIMATOR

In the Section on Battese-Fuller estimation,the strata where synthetic estimation wasdone were labeled h . H,.' 1,...,H, where H,.is the number of regression strata. The

--_ .._~-~---------------------------------------------

Page 26: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

2 -- Nlte V(YIt)

From Cochran (1977, p. 26), an estimator ofthe variance of the stratum level mean croparea per sample segment is:

domain indirect synthetic estimator of areaplanted in a given crop in the synthetic strataof county cis:

Hf(SYN) = ~

e L..J Nit; It ..It.Hr' I

The pure synthetic estimator corresponds toHr· o. The true variance of the estimatorfor a given stratum can be expressed as:

v(f(SYN) )Ite.

-- V(N lteY It)

H Hf(CR) = [(LN~It)l(LNh;It.)]X

It.I It.I

where:

Yit. - sample mean of main variable instratum h

xit. - sample mean of auxiliary variablein stratum h

X - overall population total ofauxiliary variable

For the combined ratio pixel COuntestimatorof small domain crop area, X is replaced byxc, the population total of the auxiliaryvariable in county c. Cochran's approximateformula, valid for large samples, is asfollows:

-v(yh)

where: where:

S 2ylt fit

Hence the variance hf the stratum levelsynthetic estimator can be estimated by:

v [f(SYN)lte.] - Nlte2vGh)

- NIt/SYh2(NIt - nlt)/Nltnlt

Summing over synthetic strata gives theresult:

S 2zit

S 2ylt

Szylt

APPENDIX D: DERIVATION OFVARIANCE FORMULA FOR

COMBINED RATIO ESTIMATOR

Cochran (1977, p. 166) gives an approximateformula for the variance of a combined ratioestimator of the form:

22

R = YIX

Xlti - true value of auxiliary variable instratum h, population unit i

Ylti = true value of main variable instratum h, population unit i

Y = overall population total of mainvariable

Xh = population mean of auxiliaryvariable in stratum h

Yit = population mean of main variablein stratum h

---------------------------------------------

Page 27: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

Replacing the true variances by their sampleestimators gives an estimator for the varianceof the domain direct combined ratioestimator:

where 8y1t2, 81111

2, and s. are defined in theSection on pixel count estimators. Thedomain indirect combined ratio estimator isobtained by multiplying the domain directestimator by (x/X), the county-to-regionratio of pixels classified to the crop ofinterest. Since x" and X are known, thisratio is a constant and the variance of theCRE is estimated by:

APPENDIX E: DISTRIBUTION-FREEPROCEDURES FOR COMPARISON

OF TREATMENTS

I. Friedman Rank Sum Test

For a two-way layout of n subjects and ktreatments, the Friedman rank sum testassumes the following model:

where xI} is the data value for subject i andtreatmentj, CI) is the unknown overall mean,PI is the subject i effect, 'CJ is the unknowntreatment j effect and e'l is a random error.The 'CJ 's are assumed to sum to zero. Thenull hypothesis of the Friedman rank sumtest is:

H (equaltreatmenteffects)0'"1 "2'" "k

Each subject's k data values are ranked fromsmallest to largest and the resulting ranks areaveraged within treatments. Friedman's Sstatistic is defined as:

k12n E (R .-R )2

k(k.l) j.l J ..

RI = sum of ranks for treatment jR J = ~/n(average rank for treatment j)R = (k+ 1)/2(average within-subject rank)

Table El shows the estimates, within-subjectrankings and rank sums for the Iowa 1988com data set.

Table El: Estimates and Within-Block Ranks (in Parentheses) for Iowa 1988 Com

COUNTY OFF BFE RPCE CRE 8YNAudubon 100 (3) 92.2 (1) 100.6 (4) 93.6 (2) 112.2 (5)Calhoun 133 (1) 133.2 (2) 144.2 (4) 134.4 (3) 144.9 (5)Carroll 141 (1) 141.4 (2) 152.6 (5) 142.1 (3) 146.2 (4)Crawford 147 (1) 152.7 (2) 164.9 (4) 155.1 (3) 183.2 (5)Greene 125 (1) 130.0 (2) 142.7 (4) 132.8 (3) 145.9 (5)Guthrie 98 (1) 106.3 (2) 115.8 (4) 107.8 (3) 151.3 (5)Ida 112 (5) 107.0 (1) 110.3 (3) 107.1 (2) 111.4 (4)Sac 136 (1) 138.3 (2) 150.0 (5) 139.6 (3) 148.1 (4)Shelby 155 ill 140.7 ill 152.1 ffi 141.5 m 149.4 illRank Sums 19 15 37 24 40

23

Page 28: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

For large n, the null distribution of S can beapproximated by the X2i-l distribution. Thenull hypothesis is rejected at level ex if S islarger than the upper a percentile point ofthe x 2

i-l distribution .

II. Multiple Comparison Test for Treatmentsvs. Control

The multiple comparison test of treatmentsvs. control computes the absolute differencesbetween the rank sums of the k-I treatmentsbeing evaluated (treatments 2, ... , k) and onecontrol treatment (treatment I). Anapproximate two-sided test is to concludethat a given treatment u is different from thecontrol treatment at the a level ofsignificance if:

IR. - R11 > Iml(u,k - 1,.S)[nk(k • 1)/6]112

where R. is the Friedman rank sum fortreatment u (u=l, ... ,k) and Iml (a, k-I,.5)is the upper a percentile point of thedistribution of the maximum absolute value

of k-I standard normal random variableswith common correlation 1/2. Table A.14 inHollander and Wolfe (1973) gives criticalvalues of this distribution for a = .01 and.05.

m. Doksum Estimates of TreatmentEffects and Differences

Doksum estimates of treatment effects andpairwise differences between treatmenteffects are computed as follows. The firststep is to generate k tables, with table jcontaining the within-subject differences IYjp= ~r ~p(i=I,. .. ,n; p=I, ... ,k). For eachtreatment, the median of pairwise differencesis then computed:

Z. = median (D 1. ,D 2. , •.•,D ft, )lP lP lP lP

Table E2 gives the pairwise differences andmedian pairwise differences associated withthe official estimates for the Iowa 1988 comdata set.

Table E2: Pairwise Differences from Official Estimates (Iowa 1988 Corn)

Difference from OFFCountyAudubonCalhounCarrollCrawfordGreeneGuthrieIdaSacShelbyMedian

OFF (1)0.00.00.00.00.00.00.00.00.00.0

BFE (2)7.8

-0.2-0.4-5.7-5.0-8.35.0

-2.314.3-0.4

RPCE (3)-0.6

-11.2-11.6-17.9-17.7-17.8

1.7-14.0

2.9-11.6

CRE (4)6.4

-1.4-1.1-8.1-7.8-9.85.0

-3.613.5-1.4

SYN (5)-12.4-11.9-5.2

-36.2-20.9-53.3

0.6-12.1

5.6-12.1

Estimates of treatment effects are thengenerated by averaging the mediandifferences associated with each treatment:

24

Ie

if = (l/k)LZfP' j.I, ...,kp.I

The contrasts or differences betweentreatment effects are estimated by:

Page 29: il~· - USDA · data, previous year estimates and satellite data. NASS has studied the application of satellite data to crop area estimation at the county level since the mid 1970

= i Ii p' j.1 ,...,k; p.1,...,k

Table E3 gives the estimated treatment

effects and contrasts for the Iowa 1988 comdata set.

Table E3: Estimated Treatment Effects and Constrasts for Iowa 1988 Com

Contrasts TreatmentTreatment OFF (1) BFE (2) RPCE (3) CRE (4) SYN (5) EffectOFF (1) 0.0 -0.34 -11.4 -1.58 -12.18 -5.1BFE (2) 0.34 0.0 -11.06 -1.24 -11.84 -4.76RPCE (3) 11.4 11.06 0.0 9.82 -0.78 6.3CRE (4) 1.58 1.24 -9.82 0.0 -10.6 -3.52SYN (5) 12.18 11.84 0.78 10.6 0.0 7.08

25