Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Ecology and Evolution 20199653ndash663 emsp|emsp653wwwecolevolorg
Received25April2018emsp |emsp Revised25September2018emsp |emsp Accepted2October2018DOI101002ece34789
O R I G I N A L R E S E A R C H
Accounting for preferential sampling in species distribution models
Maria Grazia Pennino1emsp|emspIosu Paradinas23emsp|emspJanine B Illian4emsp|emspFacundo Muntildeoz2emsp|emsp Joseacute Mariacutea Bellido5emsp|emspAntonio Loacutepez‐Quiacutelez2emsp|emspDavid Conesa2
ThisisanopenaccessarticleunderthetermsoftheCreativeCommonsAttributionLicensewhichpermitsusedistributionandreproductioninanymediumprovidedtheoriginalworkisproperlycitedcopy2018TheAuthorsEcology and EvolutionpublishedbyJohnWileyampSonsLtd
1InstitutoEspantildeoldeOceanografiacuteaCentroOceanograacuteficodeVigoVigoSpain2DepartamentďEstadiacutesticaiInvestigacioacuteOperativaUniversitatdeValegravenciaValenciaSpain3IparPerspectiveAsociacioacutenSopelaSpain4SchoolofMathematicsandStatisticsCentreforResearchintoEcologicalandEnvironmentalModelling(CREEM)UniversityofStAndrewsStAndrewsUK5InstitutoEspantildeoldeOceanografiacuteaCentroOceanograacuteficodeMurciaMurciaSpain
CorrespondenceMariaGraziaPenninoInstitutoEspantildeoldeOceanografiacuteaCentroOceanograacuteficodeVigoVigoSpainEmailgraziapenninoieoes
Funding informationEuropeanRegionalDevelopmentFundGrantAwardNumberMTM2013‐42323‐PMTM2016‐77501‐PandACOMP2015202
AbstractSpeciesdistributionmodels(SDMs)arenowbeingwidelyusedinecologyforman‐agementandconservationpurposesacrossterrestrialfreshwaterandmarinerealmsTheincreasinginterestinSDMshasdrawntheattentionofecologiststospatialmod‐elsandinparticulartogeostatisticalmodelswhichareusedtoassociateobserva‐tionsofspeciesoccurrenceorabundancewithenvironmentalcovariatesinafinitenumberoflocationsinordertopredictwhere(andhowmuchof)aspeciesislikelytobe present in unsampled locations Standard geostatisticalmethodology assumesthatthechoiceofsamplinglocationsisindependentofthevaluesofthevariableofinterestHowever in natural environments due to practical limitations related totimeandfinancialconstraintsthistheoreticalassumptionisoftenviolatedInfactdatacommonlyderivefromopportunisticsampling(egwhaleorbirdwatching)inwhichobserverstendtolookforaspecificspeciesinareaswheretheyexpecttofinditTheseareexamplesofwhatisreferredtoaspreferential samplingwhichcanleadtobiasedpredictionsofthedistributionofthespeciesTheaimofthisstudy istodiscussaSDMthataddressesthisproblemandthatitismorecomputationallyeffi‐cientthanexistingMCMCmethodsFromastatisticalpointofviewweinterpretthedataasamarkedpointpatternwherethesamplinglocationsformapointpatternand themeasurements taken in those locations (ie species abundanceoroccur‐rence)aretheassociatedmarksInferenceandpredictionofspeciesdistributionisperformedusingaBayesianapproachandintegratednestedLaplaceapproximation(INLA)methodologyandsoftwareareusedformodelfittingtominimizethecompu‐tationalburdenWeshowthatabundanceishighlyoverestimatedatlowabundancelocationswhenpreferentialsamplingeffectsnotaccountedforinbothasimulatedexampleandapracticalapplicationusingfisherydataThishighlightsthatecologistsshouldbeawareof thepotentialbias resulting frompreferential samplingandac‐countforitinamodelwhenasurveyisbasedonnon‐randomizedandornon‐sys‐tematicsampling
654emsp |emsp emspensp PENNINO Et al
1emsp |emspINTRODUC TION
An increasing interest in Species distribution models (SDMs) formanagementandconservationpurposeshasdrawntheattentionofecologiststospatialmodels(Dormannetal2007)SDMsarerele‐vantintheoreticalandpracticalcontextswherethereisaninterestinforexampleassessingtherelationshipbetweenspeciesandtheirenvironmentidentifyingandmanagingprotectedareasandpredict‐ingaspeciesrsquoresponsetoecologicalchanges(LatimerWuGelfandampSilander2006)Inallthesecontextsthemainissueistolinkinfor‐mationontheabundancepresenceabsenceorpresenceonlyofaspeciestoenvironmentalvariablestopredictwhere(andhowmuchof)aspeciesislikelytobepresentinunsampledlocationselsewhereinspace
Instudiesofspeciesdistributioncollectingdataonthespeciesofinterestisnotatrivialproblem([Keryetal2010)Withtheexcep‐tionofafewstudies(ThogmartinKnutsonampSauer2006)SDMsfrequentlyrelyonopportunisticdatacollectionduetothehighcostandtimeconsumingnatureofsamplingdatainthefieldespeciallyona largespatialscale(Keryetal2010) Indeed it isofteninfea‐sibletocollectdatabasedonawell‐designedrandomizedandorsystematicsamplingschemetoestimatethedistributionofaspecificspeciesover theentireareaof interest (BrotonsHerrandoampPla2007)HencevarioustypesofopportunisticsamplingschemesarecommonlyusedAsanexamplestudiesonseamammalscommonlyresorttotheaffordablepracticeofsamplingfromrecreationalboats(so‐called platforms of opportunity) whose bearings are neitherrandomnorsystematic(RodriacuteguezBrotonsBustamanteampSeoane2007)SimilarlybirddataareoftenderivedfromonlinedatabasessuchaseBirdwhichmakeavailablelocationsofbirdssightedbybird‐watcherswhotendtovisithabitatssuitableforinterestingspecies(httpsebirdorg)Alsointhecontextoffisheryecologyfishery‐dependent survey data are often derived from commercial fleetstendtobereadilyavailableforanalysisHoweverthefishingboatsnaturallytendtofishinlocationswheretheyexpectahighconcen‐trationoftheirtargetspecies(VasconcellosampCochrane2005)
AllthesetypesofopportunisticallycollecteddatatendtosufferfromaspecificcomplicationThesamplingschemethatdeterminessamplinglocationsisnotrandomandhencenotindependentoftheresponsevariableofinterestforexamplespeciesabundance(ConnThorsonampJohnson2017DiggleMenezesampSu2010)HoweverSDMstypicallyassumeifonlyimplicitlythatsamplinglocationsarenot informativeand that theyhavebeenchosen independentlyofwhatvaluesareexpectedtobeobservedinaspecificlocationThisassumption is typically violated for opportunistic data resulting inpreferentiallysampleddatacollectedinlocationsthatweredeliber‐atelychoseninareaswheretheabundanceofthespeciesofinterest
isthoughttobeparticularlyhighor lowThisviolationleadstobi‐asedestimatesandpredictions(Diggleetal2010)
Consequentlybiasedestimationandpredictionsofspeciesdis‐tributionleadtobadlyinformeddecisionmakingandtoinefficientorinappropriatemanagementofnaturalresources(Connetal2017Diggle etal 2010DinsdaleampSalibian‐Barrera 2018) This paperseeks to address preferential sampling in the context of fisheriesecologywherethis issue isparticularlyrelevantsincethe identifi‐cationandmanagementof sensitivehabitats (eg throughmarineprotected areas nurseries high‐discard locations) is a commonconservationtoolusedtosustainthelong‐termviabilityofspeciespopulations
Diggleetal (2010)suggestamodelingapproachthataccountsfor preferential sampling using likelihood‐based inference withMonteCarlomethodsHowevertheresultingapproachcanbecom‐putationallyintensiveastheauthorsrecognizeintheirreplytotheissuesraisedonthediscussionoftheirpaperwhichimpliesthatitisquitedifficult touse inpractical situations especiallywhen theobjectiveof theanalysis is topredict into inextendedareasThisisofsignificantconcernaspredictionisoftenthemainobjectiveofSDMsandpreferentialsamplingissuesarebytheirverydefinitionapracticalproblemthatneedstobeaddressedinaformthatmakesthemaccessibletousers
AsRueMartinoMondalandChopin(2010)indicateinthedis‐cussiononDiggleetalrsquospaperpreferentialsamplingmaybeseenasamarkedspatialpointprocessmodelinparticularamarkedlog‐GaussianCoxprocessThesemodelscanbefittedinacomputation‐ally efficient way using integrated nested Laplace approximation(INLA)andassociatedsoftwareRueMartinoandChopin(2009)inafastcomputationalway
In this studywe thoroughly explain themethodology for per‐formingpreferentialsamplingmodelsusingtheapproachproposedbyRueetal(2010)withinthecontextofSDMswiththefinalaimtoprovideguidanceontheappropriateuseandinterpretationofthefittedmodelsApracticalapplicationassessingthespatialdistribu‐tion of blue and red shrimp (Aristeus antennatus Risso 1816) fromfishingdataintheGulfofAlicante(Spain)isprovidedasatutorialwhilesimulateddataareusedtodemonstrateperformanceissuesofstandardmethodswhichdonotaccountforpreferentialsampling
2emsp |emspMATERIAL AND METHODS
Preferentiallycollecteddataconsistoftwopiecesofinformation(a)samplinglocationsand(b)measuredabundance(oroccurrence)oftargetspeciesintheselocationswheretheintensityofsamplinglo‐cationsispositivelyornegativelycorrelatedwithabundancethatis
K E Y W O R D S
BayesianmodellingintegratednestedLaplaceapproximationpointprocessesspeciesdistributionmodelsstochasticpartialdifferentialequation
emspensp emsp | emsp655PENNINO Et al
(a)and(b)arenotindependentPredictingthedistributionoftargetspeciesusingthistypeofdataimpliesthatthesamplingdistributionisnotuniformlyrandomasittendstohavemoreobservationswhereabundance ishigher and thusbasic statisticalmodelassumptionsareviolated
In order to overcome this problem Diggle etal (2010) pro‐posedto interpretthepreferentialsamplingprocessasamarked point processes modelThisapproachusestheinformationonsam‐plinglocationsandmodelsthemalongwiththemeasuredvaluesofthevariableofinterestinajointmodelInparticularsamplinglocationsareinterpretedasaspatial point patternaccountingfora higher point intensitywheremeasured values are higher Themeasurementstakenineachofthepoints(themarkinpointpro‐cessterminology)aremodeledalongwithandassumedtobepo‐tentiallydependentonthepoint‐patternintensityinajointmodelThisapproachaccountsforpreferentialsamplingwhilestillmak‐ingpredictionsforthevariableof interest inspace Inparticularinthefisheryexamplediscussedherefishinglocationsareinter‐pretedasapointpatternwhilethespeciescatchateachlocationisinterpretedasamark
21emsp|emspStatistical model
Formallyaspatialpointpatternconsistsofthespatial locationsofeventsorobjectsinadefinedstudyregionIllianPenttinenStoyanandStoyan (2008)Examples include locationsofspecies inapar‐ticularareaorparasitesinamicrobiologycultureSpatialpointpro‐cessesaremathematicalmodels(randomvariables)usedtodescribeandanalyze these spatialpatternsA simple theoreticalmodel foraspatialpointpattern is thePoissonprocessusuallydescribed intermsofitsintensityfunctionΛxThisintensityfunctionrepresentsthedistributionoflocations(ldquopointsrdquo)inspaceInaPoissonprocessthenumberofpointsfollowsaPoissondistributionandthelocationsofthesepointsindependentofanyoftheotherpointsThehomoge‐neousPoissonprocessrepresentscompletespatialrandomnessandservesasareferenceornullmodelinmanyapplications
Theintensityofapointprocessthatisthenumberofpointsperunitareamayeitherbeconstantoverspaceresultinginahomoge‐neousorstationarypatternorvaryinspacewithaspatialtrendre‐sultinginanon‐homogeneouspatternHowevertheassumptionofstationarityisgenerallyunrealisticinmostSDMapplicationsastheintensityfunctionvarieswiththeenvironmentmakingnon‐homo‐geneousPoissonprocessespotentiallyabetterchoice todescribespeciesdistributionbasedonatrendfunctionthatmaydependoncovariatesNonethelessinapplicationscovariatesmaynotexplaintheentirespatialstructureinaspatialpatternIncontrasttheclassofCoxprocessesprovidestheflexibilitytomodelaggregatedpointpatterns relative to observed and unobserved abiotic and bioticmechanismsHere spatial structures inanobservedpointpatternmayreflectdependenceonknownandmeasuredcovariatesaswellas on unknown or unmeasurable covariates or bioticmechanismsthat cannotbe readily representedby a covariate suchasdisper‐sallimitationIndeedthespatialstructureofbothabioticandbiotic
variablescan impactonecologicalprocessesandconsequentlybereflectedinthespeciesdistribution
Log‐GaussianCoxprocesses(LGCPs)areaspecificclassofCoxprocessmodelsinwhichthelogarithmoftheintensitysurfaceisaGaussianrandomfieldGiventherandomfieldMoreformally
where Vs is aGaussian random fieldGiven the random field theobservedlocationss = (s1hellipsn)areindependentandformaPoissonprocess
Inthecaseofpreferentiallysampledspeciestheobservedabun‐danceY = (y1hellipyn)isalsolinkedtotheintensityoftheunderlyingspatial field Inpracticeonlyvery fewsamplesmightbeavailableinsomeareasif it isassumedthatabundanceisparticularlylowinthese areas The LGCPmodel fitted to the sampling locations re‐flectsareaswithlowspeciesabundancethathaveresultedinareaswith fewer sampling locationsTo incorporate such information inthe SDM abundance model we apply joint modeling techniqueswhichallowfittingsharedmodelcomponentsinmodelswithtwoormorelinearpredictorsHereweconsidertwodependentpredictorswithtworesponsesthatistheobservedspeciesabundances(themarks)andtheintensityofthepointprocessreflectingsamplingin‐tensitythroughspace
This results in apreferential samplingmodel that consistsoftwo levelswhere information is sharedbetween the two levelsthemarkmodelandthepatternmodel Inparticular themarkY isassumedtofollowanexponentialfamilydistributionsuchasaGaussian lognormal or gamma distribution for continuous vari‐ablesoraPoissondistribution forcountdata Inall thesecasesthemeanμs isrelatedtothespatialtermthroughanappropriatelinkfunctionη
where prime0 istheinterceptofthemodelthecoefficientsprime
nquantify
theeffectofsomecovariatesXnontheresponseandWsisthespa‐tialeffectofthemodelthatistheGaussianrandomfieldTheco‐variatesintheadditivepredictorareenvironmentalfeatureslinkedtohabitatpreferencesofaspecies
InthesecondpartofthemodelanLGCPmodelwithintensityfunctionΛsreflectingthesamplinglocations
where β0istheinterceptoftheLGCPthecoefficientsβnquantifytheeffectofsomecovariatesXnontheintensityfunctionandWsisthespatialtermsharedwiththeLGCPbutscaledbyαtoallowforthedif‐ferencesinscalebetweenthemarkvaluesandtheLGCPintensities
Bayesianinferenceturnsouttobeagoodoptiontofitspatialhierarchicalmodelsbecauseitallowsboththeobserveddataandmodel parameters to be random variables ([Banerjee Carlin ampGelfand2004)resultinginamorerealisticandaccurateestima‐tionofuncertaintyAnimportantissueoftheBayesianapproachisthatpriordistributionsmustbeassignedtotheparameters(inourcaseβ0and
prime
0)andhyperparameters(inourcasethoseofthe
(1)log(Λs)=Vs
(2)(s)=0+
nXn+Ws
(3)Λs=exp
0+nXn+Ws
656emsp |emsp emspensp PENNINO Et al
spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded
22emsp|emspFitting models with INLA
Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)
23emsp|emspPreferential and non‐preferential models
Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+Ws
wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields
wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)
Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel
Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA
Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused
24emsp|emspSimulated example
In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)
Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment
Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing
(4)
YssimGa (s)
log (s)=0+Ws
WsimN (0Q ())
(2 log log )simMN(ww)
(5)
YssimGa(s)
log(s)=0+Zs
ZsimN(0Q())
(2 log log )simMN(zz)
F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated
08
08
06
06
04
04
02
02
000000
000002
000004
000006
000008
000010
000012
000014
000016
emspensp emsp | emsp657PENNINO Et al
three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield
25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea
In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies
Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)
Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+df(d)+wWs
wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry
where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)
WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel
Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected
Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities
Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements
ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes
(6)
YssimGa(s)
log(s)=0+ f(d)+Ws
WsimN(0Q())
(2 log log )simMN(ww)
Δdj=djminusdj+1simN(0d) j=1hellip m
dsimLogGamma(400001)
F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations
50
100
200
200
200
400 400
700
700
700
700
1000
1500
1500
2000
2000
200
0
200
0
658emsp |emsp emspensp PENNINO Et al
fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)
3emsp |emspRESULTS
31emsp|emspSimulated example
Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas
Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels
In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas
able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse
Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas
FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels
32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea
All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the
F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa
minus40
minus20
0
Preferential data Random data
Type of data
Impr
ovem
ent i
n ab
unda
nce
mod
elDIC
minus03
minus02
minus01
00
01
Preferential data Random data
Type of data
LCPO
minus25
minus20
minus15
minus10
minus05
00
Preferential data Random data
Type of data
MAE
emspensp emsp | emsp659PENNINO Et al
point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect
Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe
F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas
0 1 2 3 4 5
02
46
810
Simulated field
Non
minuspr
efer
entia
l pre
dict
ion
0 1 2 3 4 50
24
68
10
Simulated field
Pre
fere
ntia
l pre
dict
ion
F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
60
(a) (b)
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
654emsp |emsp emspensp PENNINO Et al
1emsp |emspINTRODUC TION
An increasing interest in Species distribution models (SDMs) formanagementandconservationpurposeshasdrawntheattentionofecologiststospatialmodels(Dormannetal2007)SDMsarerele‐vantintheoreticalandpracticalcontextswherethereisaninterestinforexampleassessingtherelationshipbetweenspeciesandtheirenvironmentidentifyingandmanagingprotectedareasandpredict‐ingaspeciesrsquoresponsetoecologicalchanges(LatimerWuGelfandampSilander2006)Inallthesecontextsthemainissueistolinkinfor‐mationontheabundancepresenceabsenceorpresenceonlyofaspeciestoenvironmentalvariablestopredictwhere(andhowmuchof)aspeciesislikelytobepresentinunsampledlocationselsewhereinspace
Instudiesofspeciesdistributioncollectingdataonthespeciesofinterestisnotatrivialproblem([Keryetal2010)Withtheexcep‐tionofafewstudies(ThogmartinKnutsonampSauer2006)SDMsfrequentlyrelyonopportunisticdatacollectionduetothehighcostandtimeconsumingnatureofsamplingdatainthefieldespeciallyona largespatialscale(Keryetal2010) Indeed it isofteninfea‐sibletocollectdatabasedonawell‐designedrandomizedandorsystematicsamplingschemetoestimatethedistributionofaspecificspeciesover theentireareaof interest (BrotonsHerrandoampPla2007)HencevarioustypesofopportunisticsamplingschemesarecommonlyusedAsanexamplestudiesonseamammalscommonlyresorttotheaffordablepracticeofsamplingfromrecreationalboats(so‐called platforms of opportunity) whose bearings are neitherrandomnorsystematic(RodriacuteguezBrotonsBustamanteampSeoane2007)SimilarlybirddataareoftenderivedfromonlinedatabasessuchaseBirdwhichmakeavailablelocationsofbirdssightedbybird‐watcherswhotendtovisithabitatssuitableforinterestingspecies(httpsebirdorg)Alsointhecontextoffisheryecologyfishery‐dependent survey data are often derived from commercial fleetstendtobereadilyavailableforanalysisHoweverthefishingboatsnaturallytendtofishinlocationswheretheyexpectahighconcen‐trationoftheirtargetspecies(VasconcellosampCochrane2005)
AllthesetypesofopportunisticallycollecteddatatendtosufferfromaspecificcomplicationThesamplingschemethatdeterminessamplinglocationsisnotrandomandhencenotindependentoftheresponsevariableofinterestforexamplespeciesabundance(ConnThorsonampJohnson2017DiggleMenezesampSu2010)HoweverSDMstypicallyassumeifonlyimplicitlythatsamplinglocationsarenot informativeand that theyhavebeenchosen independentlyofwhatvaluesareexpectedtobeobservedinaspecificlocationThisassumption is typically violated for opportunistic data resulting inpreferentiallysampleddatacollectedinlocationsthatweredeliber‐atelychoseninareaswheretheabundanceofthespeciesofinterest
isthoughttobeparticularlyhighor lowThisviolationleadstobi‐asedestimatesandpredictions(Diggleetal2010)
Consequentlybiasedestimationandpredictionsofspeciesdis‐tributionleadtobadlyinformeddecisionmakingandtoinefficientorinappropriatemanagementofnaturalresources(Connetal2017Diggle etal 2010DinsdaleampSalibian‐Barrera 2018) This paperseeks to address preferential sampling in the context of fisheriesecologywherethis issue isparticularlyrelevantsincethe identifi‐cationandmanagementof sensitivehabitats (eg throughmarineprotected areas nurseries high‐discard locations) is a commonconservationtoolusedtosustainthelong‐termviabilityofspeciespopulations
Diggleetal (2010)suggestamodelingapproachthataccountsfor preferential sampling using likelihood‐based inference withMonteCarlomethodsHowevertheresultingapproachcanbecom‐putationallyintensiveastheauthorsrecognizeintheirreplytotheissuesraisedonthediscussionoftheirpaperwhichimpliesthatitisquitedifficult touse inpractical situations especiallywhen theobjectiveof theanalysis is topredict into inextendedareasThisisofsignificantconcernaspredictionisoftenthemainobjectiveofSDMsandpreferentialsamplingissuesarebytheirverydefinitionapracticalproblemthatneedstobeaddressedinaformthatmakesthemaccessibletousers
AsRueMartinoMondalandChopin(2010)indicateinthedis‐cussiononDiggleetalrsquospaperpreferentialsamplingmaybeseenasamarkedspatialpointprocessmodelinparticularamarkedlog‐GaussianCoxprocessThesemodelscanbefittedinacomputation‐ally efficient way using integrated nested Laplace approximation(INLA)andassociatedsoftwareRueMartinoandChopin(2009)inafastcomputationalway
In this studywe thoroughly explain themethodology for per‐formingpreferentialsamplingmodelsusingtheapproachproposedbyRueetal(2010)withinthecontextofSDMswiththefinalaimtoprovideguidanceontheappropriateuseandinterpretationofthefittedmodelsApracticalapplicationassessingthespatialdistribu‐tion of blue and red shrimp (Aristeus antennatus Risso 1816) fromfishingdataintheGulfofAlicante(Spain)isprovidedasatutorialwhilesimulateddataareusedtodemonstrateperformanceissuesofstandardmethodswhichdonotaccountforpreferentialsampling
2emsp |emspMATERIAL AND METHODS
Preferentiallycollecteddataconsistoftwopiecesofinformation(a)samplinglocationsand(b)measuredabundance(oroccurrence)oftargetspeciesintheselocationswheretheintensityofsamplinglo‐cationsispositivelyornegativelycorrelatedwithabundancethatis
K E Y W O R D S
BayesianmodellingintegratednestedLaplaceapproximationpointprocessesspeciesdistributionmodelsstochasticpartialdifferentialequation
emspensp emsp | emsp655PENNINO Et al
(a)and(b)arenotindependentPredictingthedistributionoftargetspeciesusingthistypeofdataimpliesthatthesamplingdistributionisnotuniformlyrandomasittendstohavemoreobservationswhereabundance ishigher and thusbasic statisticalmodelassumptionsareviolated
In order to overcome this problem Diggle etal (2010) pro‐posedto interpretthepreferentialsamplingprocessasamarked point processes modelThisapproachusestheinformationonsam‐plinglocationsandmodelsthemalongwiththemeasuredvaluesofthevariableofinterestinajointmodelInparticularsamplinglocationsareinterpretedasaspatial point patternaccountingfora higher point intensitywheremeasured values are higher Themeasurementstakenineachofthepoints(themarkinpointpro‐cessterminology)aremodeledalongwithandassumedtobepo‐tentiallydependentonthepoint‐patternintensityinajointmodelThisapproachaccountsforpreferentialsamplingwhilestillmak‐ingpredictionsforthevariableof interest inspace Inparticularinthefisheryexamplediscussedherefishinglocationsareinter‐pretedasapointpatternwhilethespeciescatchateachlocationisinterpretedasamark
21emsp|emspStatistical model
Formallyaspatialpointpatternconsistsofthespatial locationsofeventsorobjectsinadefinedstudyregionIllianPenttinenStoyanandStoyan (2008)Examples include locationsofspecies inapar‐ticularareaorparasitesinamicrobiologycultureSpatialpointpro‐cessesaremathematicalmodels(randomvariables)usedtodescribeandanalyze these spatialpatternsA simple theoreticalmodel foraspatialpointpattern is thePoissonprocessusuallydescribed intermsofitsintensityfunctionΛxThisintensityfunctionrepresentsthedistributionoflocations(ldquopointsrdquo)inspaceInaPoissonprocessthenumberofpointsfollowsaPoissondistributionandthelocationsofthesepointsindependentofanyoftheotherpointsThehomoge‐neousPoissonprocessrepresentscompletespatialrandomnessandservesasareferenceornullmodelinmanyapplications
Theintensityofapointprocessthatisthenumberofpointsperunitareamayeitherbeconstantoverspaceresultinginahomoge‐neousorstationarypatternorvaryinspacewithaspatialtrendre‐sultinginanon‐homogeneouspatternHowevertheassumptionofstationarityisgenerallyunrealisticinmostSDMapplicationsastheintensityfunctionvarieswiththeenvironmentmakingnon‐homo‐geneousPoissonprocessespotentiallyabetterchoice todescribespeciesdistributionbasedonatrendfunctionthatmaydependoncovariatesNonethelessinapplicationscovariatesmaynotexplaintheentirespatialstructureinaspatialpatternIncontrasttheclassofCoxprocessesprovidestheflexibilitytomodelaggregatedpointpatterns relative to observed and unobserved abiotic and bioticmechanismsHere spatial structures inanobservedpointpatternmayreflectdependenceonknownandmeasuredcovariatesaswellas on unknown or unmeasurable covariates or bioticmechanismsthat cannotbe readily representedby a covariate suchasdisper‐sallimitationIndeedthespatialstructureofbothabioticandbiotic
variablescan impactonecologicalprocessesandconsequentlybereflectedinthespeciesdistribution
Log‐GaussianCoxprocesses(LGCPs)areaspecificclassofCoxprocessmodelsinwhichthelogarithmoftheintensitysurfaceisaGaussianrandomfieldGiventherandomfieldMoreformally
where Vs is aGaussian random fieldGiven the random field theobservedlocationss = (s1hellipsn)areindependentandformaPoissonprocess
Inthecaseofpreferentiallysampledspeciestheobservedabun‐danceY = (y1hellipyn)isalsolinkedtotheintensityoftheunderlyingspatial field Inpracticeonlyvery fewsamplesmightbeavailableinsomeareasif it isassumedthatabundanceisparticularlylowinthese areas The LGCPmodel fitted to the sampling locations re‐flectsareaswithlowspeciesabundancethathaveresultedinareaswith fewer sampling locationsTo incorporate such information inthe SDM abundance model we apply joint modeling techniqueswhichallowfittingsharedmodelcomponentsinmodelswithtwoormorelinearpredictorsHereweconsidertwodependentpredictorswithtworesponsesthatistheobservedspeciesabundances(themarks)andtheintensityofthepointprocessreflectingsamplingin‐tensitythroughspace
This results in apreferential samplingmodel that consistsoftwo levelswhere information is sharedbetween the two levelsthemarkmodelandthepatternmodel Inparticular themarkY isassumedtofollowanexponentialfamilydistributionsuchasaGaussian lognormal or gamma distribution for continuous vari‐ablesoraPoissondistribution forcountdata Inall thesecasesthemeanμs isrelatedtothespatialtermthroughanappropriatelinkfunctionη
where prime0 istheinterceptofthemodelthecoefficientsprime
nquantify
theeffectofsomecovariatesXnontheresponseandWsisthespa‐tialeffectofthemodelthatistheGaussianrandomfieldTheco‐variatesintheadditivepredictorareenvironmentalfeatureslinkedtohabitatpreferencesofaspecies
InthesecondpartofthemodelanLGCPmodelwithintensityfunctionΛsreflectingthesamplinglocations
where β0istheinterceptoftheLGCPthecoefficientsβnquantifytheeffectofsomecovariatesXnontheintensityfunctionandWsisthespatialtermsharedwiththeLGCPbutscaledbyαtoallowforthedif‐ferencesinscalebetweenthemarkvaluesandtheLGCPintensities
Bayesianinferenceturnsouttobeagoodoptiontofitspatialhierarchicalmodelsbecauseitallowsboththeobserveddataandmodel parameters to be random variables ([Banerjee Carlin ampGelfand2004)resultinginamorerealisticandaccurateestima‐tionofuncertaintyAnimportantissueoftheBayesianapproachisthatpriordistributionsmustbeassignedtotheparameters(inourcaseβ0and
prime
0)andhyperparameters(inourcasethoseofthe
(1)log(Λs)=Vs
(2)(s)=0+
nXn+Ws
(3)Λs=exp
0+nXn+Ws
656emsp |emsp emspensp PENNINO Et al
spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded
22emsp|emspFitting models with INLA
Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)
23emsp|emspPreferential and non‐preferential models
Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+Ws
wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields
wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)
Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel
Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA
Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused
24emsp|emspSimulated example
In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)
Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment
Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing
(4)
YssimGa (s)
log (s)=0+Ws
WsimN (0Q ())
(2 log log )simMN(ww)
(5)
YssimGa(s)
log(s)=0+Zs
ZsimN(0Q())
(2 log log )simMN(zz)
F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated
08
08
06
06
04
04
02
02
000000
000002
000004
000006
000008
000010
000012
000014
000016
emspensp emsp | emsp657PENNINO Et al
three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield
25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea
In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies
Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)
Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+df(d)+wWs
wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry
where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)
WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel
Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected
Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities
Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements
ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes
(6)
YssimGa(s)
log(s)=0+ f(d)+Ws
WsimN(0Q())
(2 log log )simMN(ww)
Δdj=djminusdj+1simN(0d) j=1hellip m
dsimLogGamma(400001)
F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations
50
100
200
200
200
400 400
700
700
700
700
1000
1500
1500
2000
2000
200
0
200
0
658emsp |emsp emspensp PENNINO Et al
fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)
3emsp |emspRESULTS
31emsp|emspSimulated example
Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas
Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels
In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas
able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse
Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas
FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels
32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea
All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the
F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa
minus40
minus20
0
Preferential data Random data
Type of data
Impr
ovem
ent i
n ab
unda
nce
mod
elDIC
minus03
minus02
minus01
00
01
Preferential data Random data
Type of data
LCPO
minus25
minus20
minus15
minus10
minus05
00
Preferential data Random data
Type of data
MAE
emspensp emsp | emsp659PENNINO Et al
point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect
Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe
F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas
0 1 2 3 4 5
02
46
810
Simulated field
Non
minuspr
efer
entia
l pre
dict
ion
0 1 2 3 4 50
24
68
10
Simulated field
Pre
fere
ntia
l pre
dict
ion
F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
60
(a) (b)
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
emspensp emsp | emsp655PENNINO Et al
(a)and(b)arenotindependentPredictingthedistributionoftargetspeciesusingthistypeofdataimpliesthatthesamplingdistributionisnotuniformlyrandomasittendstohavemoreobservationswhereabundance ishigher and thusbasic statisticalmodelassumptionsareviolated
In order to overcome this problem Diggle etal (2010) pro‐posedto interpretthepreferentialsamplingprocessasamarked point processes modelThisapproachusestheinformationonsam‐plinglocationsandmodelsthemalongwiththemeasuredvaluesofthevariableofinterestinajointmodelInparticularsamplinglocationsareinterpretedasaspatial point patternaccountingfora higher point intensitywheremeasured values are higher Themeasurementstakenineachofthepoints(themarkinpointpro‐cessterminology)aremodeledalongwithandassumedtobepo‐tentiallydependentonthepoint‐patternintensityinajointmodelThisapproachaccountsforpreferentialsamplingwhilestillmak‐ingpredictionsforthevariableof interest inspace Inparticularinthefisheryexamplediscussedherefishinglocationsareinter‐pretedasapointpatternwhilethespeciescatchateachlocationisinterpretedasamark
21emsp|emspStatistical model
Formallyaspatialpointpatternconsistsofthespatial locationsofeventsorobjectsinadefinedstudyregionIllianPenttinenStoyanandStoyan (2008)Examples include locationsofspecies inapar‐ticularareaorparasitesinamicrobiologycultureSpatialpointpro‐cessesaremathematicalmodels(randomvariables)usedtodescribeandanalyze these spatialpatternsA simple theoreticalmodel foraspatialpointpattern is thePoissonprocessusuallydescribed intermsofitsintensityfunctionΛxThisintensityfunctionrepresentsthedistributionoflocations(ldquopointsrdquo)inspaceInaPoissonprocessthenumberofpointsfollowsaPoissondistributionandthelocationsofthesepointsindependentofanyoftheotherpointsThehomoge‐neousPoissonprocessrepresentscompletespatialrandomnessandservesasareferenceornullmodelinmanyapplications
Theintensityofapointprocessthatisthenumberofpointsperunitareamayeitherbeconstantoverspaceresultinginahomoge‐neousorstationarypatternorvaryinspacewithaspatialtrendre‐sultinginanon‐homogeneouspatternHowevertheassumptionofstationarityisgenerallyunrealisticinmostSDMapplicationsastheintensityfunctionvarieswiththeenvironmentmakingnon‐homo‐geneousPoissonprocessespotentiallyabetterchoice todescribespeciesdistributionbasedonatrendfunctionthatmaydependoncovariatesNonethelessinapplicationscovariatesmaynotexplaintheentirespatialstructureinaspatialpatternIncontrasttheclassofCoxprocessesprovidestheflexibilitytomodelaggregatedpointpatterns relative to observed and unobserved abiotic and bioticmechanismsHere spatial structures inanobservedpointpatternmayreflectdependenceonknownandmeasuredcovariatesaswellas on unknown or unmeasurable covariates or bioticmechanismsthat cannotbe readily representedby a covariate suchasdisper‐sallimitationIndeedthespatialstructureofbothabioticandbiotic
variablescan impactonecologicalprocessesandconsequentlybereflectedinthespeciesdistribution
Log‐GaussianCoxprocesses(LGCPs)areaspecificclassofCoxprocessmodelsinwhichthelogarithmoftheintensitysurfaceisaGaussianrandomfieldGiventherandomfieldMoreformally
where Vs is aGaussian random fieldGiven the random field theobservedlocationss = (s1hellipsn)areindependentandformaPoissonprocess
Inthecaseofpreferentiallysampledspeciestheobservedabun‐danceY = (y1hellipyn)isalsolinkedtotheintensityoftheunderlyingspatial field Inpracticeonlyvery fewsamplesmightbeavailableinsomeareasif it isassumedthatabundanceisparticularlylowinthese areas The LGCPmodel fitted to the sampling locations re‐flectsareaswithlowspeciesabundancethathaveresultedinareaswith fewer sampling locationsTo incorporate such information inthe SDM abundance model we apply joint modeling techniqueswhichallowfittingsharedmodelcomponentsinmodelswithtwoormorelinearpredictorsHereweconsidertwodependentpredictorswithtworesponsesthatistheobservedspeciesabundances(themarks)andtheintensityofthepointprocessreflectingsamplingin‐tensitythroughspace
This results in apreferential samplingmodel that consistsoftwo levelswhere information is sharedbetween the two levelsthemarkmodelandthepatternmodel Inparticular themarkY isassumedtofollowanexponentialfamilydistributionsuchasaGaussian lognormal or gamma distribution for continuous vari‐ablesoraPoissondistribution forcountdata Inall thesecasesthemeanμs isrelatedtothespatialtermthroughanappropriatelinkfunctionη
where prime0 istheinterceptofthemodelthecoefficientsprime
nquantify
theeffectofsomecovariatesXnontheresponseandWsisthespa‐tialeffectofthemodelthatistheGaussianrandomfieldTheco‐variatesintheadditivepredictorareenvironmentalfeatureslinkedtohabitatpreferencesofaspecies
InthesecondpartofthemodelanLGCPmodelwithintensityfunctionΛsreflectingthesamplinglocations
where β0istheinterceptoftheLGCPthecoefficientsβnquantifytheeffectofsomecovariatesXnontheintensityfunctionandWsisthespatialtermsharedwiththeLGCPbutscaledbyαtoallowforthedif‐ferencesinscalebetweenthemarkvaluesandtheLGCPintensities
Bayesianinferenceturnsouttobeagoodoptiontofitspatialhierarchicalmodelsbecauseitallowsboththeobserveddataandmodel parameters to be random variables ([Banerjee Carlin ampGelfand2004)resultinginamorerealisticandaccurateestima‐tionofuncertaintyAnimportantissueoftheBayesianapproachisthatpriordistributionsmustbeassignedtotheparameters(inourcaseβ0and
prime
0)andhyperparameters(inourcasethoseofthe
(1)log(Λs)=Vs
(2)(s)=0+
nXn+Ws
(3)Λs=exp
0+nXn+Ws
656emsp |emsp emspensp PENNINO Et al
spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded
22emsp|emspFitting models with INLA
Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)
23emsp|emspPreferential and non‐preferential models
Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+Ws
wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields
wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)
Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel
Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA
Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused
24emsp|emspSimulated example
In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)
Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment
Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing
(4)
YssimGa (s)
log (s)=0+Ws
WsimN (0Q ())
(2 log log )simMN(ww)
(5)
YssimGa(s)
log(s)=0+Zs
ZsimN(0Q())
(2 log log )simMN(zz)
F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated
08
08
06
06
04
04
02
02
000000
000002
000004
000006
000008
000010
000012
000014
000016
emspensp emsp | emsp657PENNINO Et al
three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield
25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea
In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies
Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)
Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+df(d)+wWs
wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry
where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)
WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel
Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected
Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities
Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements
ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes
(6)
YssimGa(s)
log(s)=0+ f(d)+Ws
WsimN(0Q())
(2 log log )simMN(ww)
Δdj=djminusdj+1simN(0d) j=1hellip m
dsimLogGamma(400001)
F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations
50
100
200
200
200
400 400
700
700
700
700
1000
1500
1500
2000
2000
200
0
200
0
658emsp |emsp emspensp PENNINO Et al
fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)
3emsp |emspRESULTS
31emsp|emspSimulated example
Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas
Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels
In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas
able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse
Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas
FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels
32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea
All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the
F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa
minus40
minus20
0
Preferential data Random data
Type of data
Impr
ovem
ent i
n ab
unda
nce
mod
elDIC
minus03
minus02
minus01
00
01
Preferential data Random data
Type of data
LCPO
minus25
minus20
minus15
minus10
minus05
00
Preferential data Random data
Type of data
MAE
emspensp emsp | emsp659PENNINO Et al
point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect
Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe
F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas
0 1 2 3 4 5
02
46
810
Simulated field
Non
minuspr
efer
entia
l pre
dict
ion
0 1 2 3 4 50
24
68
10
Simulated field
Pre
fere
ntia
l pre
dict
ion
F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
60
(a) (b)
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
656emsp |emsp emspensp PENNINO Et al
spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded
22emsp|emspFitting models with INLA
Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)
23emsp|emspPreferential and non‐preferential models
Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+Ws
wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields
wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)
Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel
Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA
Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused
24emsp|emspSimulated example
In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)
Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment
Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing
(4)
YssimGa (s)
log (s)=0+Ws
WsimN (0Q ())
(2 log log )simMN(ww)
(5)
YssimGa(s)
log(s)=0+Zs
ZsimN(0Q())
(2 log log )simMN(zz)
F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated
08
08
06
06
04
04
02
02
000000
000002
000004
000006
000008
000010
000012
000014
000016
emspensp emsp | emsp657PENNINO Et al
three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield
25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea
In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies
Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)
Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+df(d)+wWs
wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry
where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)
WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel
Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected
Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities
Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements
ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes
(6)
YssimGa(s)
log(s)=0+ f(d)+Ws
WsimN(0Q())
(2 log log )simMN(ww)
Δdj=djminusdj+1simN(0d) j=1hellip m
dsimLogGamma(400001)
F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations
50
100
200
200
200
400 400
700
700
700
700
1000
1500
1500
2000
2000
200
0
200
0
658emsp |emsp emspensp PENNINO Et al
fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)
3emsp |emspRESULTS
31emsp|emspSimulated example
Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas
Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels
In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas
able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse
Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas
FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels
32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea
All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the
F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa
minus40
minus20
0
Preferential data Random data
Type of data
Impr
ovem
ent i
n ab
unda
nce
mod
elDIC
minus03
minus02
minus01
00
01
Preferential data Random data
Type of data
LCPO
minus25
minus20
minus15
minus10
minus05
00
Preferential data Random data
Type of data
MAE
emspensp emsp | emsp659PENNINO Et al
point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect
Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe
F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas
0 1 2 3 4 5
02
46
810
Simulated field
Non
minuspr
efer
entia
l pre
dict
ion
0 1 2 3 4 50
24
68
10
Simulated field
Pre
fere
ntia
l pre
dict
ion
F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
60
(a) (b)
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
emspensp emsp | emsp657PENNINO Et al
three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield
25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea
In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies
Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)
Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp
0+df(d)+wWs
wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry
where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)
WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel
Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected
Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities
Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements
ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes
(6)
YssimGa(s)
log(s)=0+ f(d)+Ws
WsimN(0Q())
(2 log log )simMN(ww)
Δdj=djminusdj+1simN(0d) j=1hellip m
dsimLogGamma(400001)
F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations
50
100
200
200
200
400 400
700
700
700
700
1000
1500
1500
2000
2000
200
0
200
0
658emsp |emsp emspensp PENNINO Et al
fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)
3emsp |emspRESULTS
31emsp|emspSimulated example
Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas
Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels
In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas
able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse
Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas
FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels
32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea
All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the
F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa
minus40
minus20
0
Preferential data Random data
Type of data
Impr
ovem
ent i
n ab
unda
nce
mod
elDIC
minus03
minus02
minus01
00
01
Preferential data Random data
Type of data
LCPO
minus25
minus20
minus15
minus10
minus05
00
Preferential data Random data
Type of data
MAE
emspensp emsp | emsp659PENNINO Et al
point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect
Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe
F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas
0 1 2 3 4 5
02
46
810
Simulated field
Non
minuspr
efer
entia
l pre
dict
ion
0 1 2 3 4 50
24
68
10
Simulated field
Pre
fere
ntia
l pre
dict
ion
F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
60
(a) (b)
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
658emsp |emsp emspensp PENNINO Et al
fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)
3emsp |emspRESULTS
31emsp|emspSimulated example
Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas
Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels
In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas
able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse
Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas
FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels
32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea
All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the
F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa
minus40
minus20
0
Preferential data Random data
Type of data
Impr
ovem
ent i
n ab
unda
nce
mod
elDIC
minus03
minus02
minus01
00
01
Preferential data Random data
Type of data
LCPO
minus25
minus20
minus15
minus10
minus05
00
Preferential data Random data
Type of data
MAE
emspensp emsp | emsp659PENNINO Et al
point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect
Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe
F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas
0 1 2 3 4 5
02
46
810
Simulated field
Non
minuspr
efer
entia
l pre
dict
ion
0 1 2 3 4 50
24
68
10
Simulated field
Pre
fere
ntia
l pre
dict
ion
F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
60
(a) (b)
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
emspensp emsp | emsp659PENNINO Et al
point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect
Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe
F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas
0 1 2 3 4 5
02
46
810
Simulated field
Non
minuspr
efer
entia
l pre
dict
ion
0 1 2 3 4 50
24
68
10
Simulated field
Pre
fere
ntia
l pre
dict
ion
F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
60
(a) (b)
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
660emsp |emsp emspensp PENNINO Et al
preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)
Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution
Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe
correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies
4emsp |emspDISCUSSION
Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible
Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended
TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes
Model DIC LCPO Times (s)
1 Intc+Depth +19 +005 3
2 Intc+Spatial +9 minus 24
3 Intc+Depth+Spatial +11 +001 57
4 Intc+Depth +52 +024 21
5 Intc+Spatial minus minus 171
6 Intc+Spatial+Depth +5 +002 212
7 Intc+Depth+Spatial +10 +001 2275
8 Intc+Depth+Spatial +3 +003 3470
NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents
F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01
05 10 15
00
05
10
15
20
25
Matern range
x
y
PosteriorPriorReal
PosteriorPriorReal
15 10 5
000
005
010
015
020
025
030
Matern variance
x
y
(a) (b)
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
emspensp emsp | emsp661PENNINO Et al
for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies
Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)
Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults
Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies
AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)
This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing
F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations
minus04
minus03
minus02
minus01
00
01
02
03
04
05
minus3
minus2
minus1
0
1
2
3
4
5
6
(a) (b)
F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations
25
30
35
40
45
minus4
minus3
minus2
minus1
0
1
2
3
4
5
(a) (b)
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
662emsp |emsp emspensp PENNINO Et al
parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence
Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges
ACKNOWLEDG MENTS
D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare
CONFLIC T OF INTERE S T
NoneDeclared
AUTHOR CONTRIBUTIONS
AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata
DATA ACCE SSIBILIT Y
Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit
R E FE R E N C E S
BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC
Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x
CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453
ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803
DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A
Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x
Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress
DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x
Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229
FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6
FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8
GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799
GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003
Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184
Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017
IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons
KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789
emspensp emsp | emsp663PENNINO Et al
observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x
LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609
Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x
LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64
Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281
PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062
ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods
RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x
RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609
Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x
RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223
SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25
Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064
SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004
SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353
Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2
VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska
WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079
How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789