11
Ecology and Evolution. 2019;9:653–663. | 653 www.ecolevol.org Received: 25 April 2018 | Revised: 25 September 2018 | Accepted: 2 October 2018 DOI: 10.1002/ece3.4789 ORIGINAL RESEARCH Accounting for preferential sampling in species distribution models Maria Grazia Pennino 1 | Iosu Paradinas 2,3 | Janine B. Illian 4 | Facundo Muñoz 2 | José María Bellido 5 | Antonio López‐Quílez 2 | David Conesa 2 This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2018 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. 1 Instituto Español de Oceanografía, Centro Oceanográfico de Vigo, Vigo, Spain 2 Departament ďEstadística i Investigació Operativa, Universitat de València, Valencia, Spain 3 Ipar Perspective Asociación, Sopela, Spain 4 School of Mathematics and Statistics, Centre for Research into Ecological and Environmental Modelling (CREEM), University of St Andrews, St Andrews, UK 5 Instituto Español de Oceanografía, Centro Oceanográfico de Murcia, Murcia, Spain Correspondence Maria Grazia Pennino, Instituto Español de Oceanografía, Centro Oceanográfico de Vigo, Vigo, Spain. Email: [email protected] Funding information European Regional Development Fund, Grant/Award Number: MTM2013‐42323‐P, MTM2016‐77501‐P and ACOMP/2015/202 Abstract Species distribution models (SDMs) are now being widely used in ecology for man‐ agement and conservation purposes across terrestrial, freshwater, and marine realms. The increasing interest in SDMs has drawn the attention of ecologists to spatial mod‐ els and, in particular, to geostatistical models, which are used to associate observa‐ tions of species occurrence or abundance with environmental covariates in a finite number of locations in order to predict where (and how much of) a species is likely to be present in unsampled locations. Standard geostatistical methodology assumes that the choice of sampling locations is independent of the values of the variable of interest. However, in natural environments, due to practical limitations related to time and financial constraints, this theoretical assumption is often violated. In fact, data commonly derive from opportunistic sampling (e.g., whale or bird watching), in which observers tend to look for a specific species in areas where they expect to find it. These are examples of what is referred to as preferential sampling, which can lead to biased predictions of the distribution of the species. The aim of this study is to discuss a SDM that addresses this problem and that it is more computationally effi‐ cient than existing MCMC methods. From a statistical point of view, we interpret the data as a marked point pattern, where the sampling locations form a point pattern and the measurements taken in those locations (i.e., species abundance or occur‐ rence) are the associated marks. Inference and prediction of species distribution is performed using a Bayesian approach, and integrated nested Laplace approximation (INLA) methodology and software are used for model fitting to minimize the compu‐ tational burden. We show that abundance is highly overestimated at low abundance locations when preferential sampling effects not accounted for, in both a simulated example and a practical application using fishery data. This highlights that ecologists should be aware of the potential bias resulting from preferential sampling and ac‐ count for it in a model when a survey is based on non‐randomized and/or non‐sys‐ tematic sampling.

Accounting for preferential sampling in species ... · Ecology and Evolution. 2019;9:653–663. | 653 Received: 25 April 2018 | Revised: 25 September 2018 | Accepted: 2 October 2018

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Ecology and Evolution 20199653ndash663 emsp|emsp653wwwecolevolorg

Received25April2018emsp |emsp Revised25September2018emsp |emsp Accepted2October2018DOI101002ece34789

O R I G I N A L R E S E A R C H

Accounting for preferential sampling in species distribution models

Maria Grazia Pennino1emsp|emspIosu Paradinas23emsp|emspJanine B Illian4emsp|emspFacundo Muntildeoz2emsp|emsp Joseacute Mariacutea Bellido5emsp|emspAntonio Loacutepez‐Quiacutelez2emsp|emspDavid Conesa2

ThisisanopenaccessarticleunderthetermsoftheCreativeCommonsAttributionLicensewhichpermitsusedistributionandreproductioninanymediumprovidedtheoriginalworkisproperlycitedcopy2018TheAuthorsEcology and EvolutionpublishedbyJohnWileyampSonsLtd

1InstitutoEspantildeoldeOceanografiacuteaCentroOceanograacuteficodeVigoVigoSpain2DepartamentďEstadiacutesticaiInvestigacioacuteOperativaUniversitatdeValegravenciaValenciaSpain3IparPerspectiveAsociacioacutenSopelaSpain4SchoolofMathematicsandStatisticsCentreforResearchintoEcologicalandEnvironmentalModelling(CREEM)UniversityofStAndrewsStAndrewsUK5InstitutoEspantildeoldeOceanografiacuteaCentroOceanograacuteficodeMurciaMurciaSpain

CorrespondenceMariaGraziaPenninoInstitutoEspantildeoldeOceanografiacuteaCentroOceanograacuteficodeVigoVigoSpainEmailgraziapenninoieoes

Funding informationEuropeanRegionalDevelopmentFundGrantAwardNumberMTM2013‐42323‐PMTM2016‐77501‐PandACOMP2015202

AbstractSpeciesdistributionmodels(SDMs)arenowbeingwidelyusedinecologyforman‐agementandconservationpurposesacrossterrestrialfreshwaterandmarinerealmsTheincreasinginterestinSDMshasdrawntheattentionofecologiststospatialmod‐elsandinparticulartogeostatisticalmodelswhichareusedtoassociateobserva‐tionsofspeciesoccurrenceorabundancewithenvironmentalcovariatesinafinitenumberoflocationsinordertopredictwhere(andhowmuchof)aspeciesislikelytobe present in unsampled locations Standard geostatisticalmethodology assumesthatthechoiceofsamplinglocationsisindependentofthevaluesofthevariableofinterestHowever in natural environments due to practical limitations related totimeandfinancialconstraintsthistheoreticalassumptionisoftenviolatedInfactdatacommonlyderivefromopportunisticsampling(egwhaleorbirdwatching)inwhichobserverstendtolookforaspecificspeciesinareaswheretheyexpecttofinditTheseareexamplesofwhatisreferredtoaspreferential samplingwhichcanleadtobiasedpredictionsofthedistributionofthespeciesTheaimofthisstudy istodiscussaSDMthataddressesthisproblemandthatitismorecomputationallyeffi‐cientthanexistingMCMCmethodsFromastatisticalpointofviewweinterpretthedataasamarkedpointpatternwherethesamplinglocationsformapointpatternand themeasurements taken in those locations (ie species abundanceoroccur‐rence)aretheassociatedmarksInferenceandpredictionofspeciesdistributionisperformedusingaBayesianapproachandintegratednestedLaplaceapproximation(INLA)methodologyandsoftwareareusedformodelfittingtominimizethecompu‐tationalburdenWeshowthatabundanceishighlyoverestimatedatlowabundancelocationswhenpreferentialsamplingeffectsnotaccountedforinbothasimulatedexampleandapracticalapplicationusingfisherydataThishighlightsthatecologistsshouldbeawareof thepotentialbias resulting frompreferential samplingandac‐countforitinamodelwhenasurveyisbasedonnon‐randomizedandornon‐sys‐tematicsampling

654emsp |emsp emspensp PENNINO Et al

1emsp |emspINTRODUC TION

An increasing interest in Species distribution models (SDMs) formanagementandconservationpurposeshasdrawntheattentionofecologiststospatialmodels(Dormannetal2007)SDMsarerele‐vantintheoreticalandpracticalcontextswherethereisaninterestinforexampleassessingtherelationshipbetweenspeciesandtheirenvironmentidentifyingandmanagingprotectedareasandpredict‐ingaspeciesrsquoresponsetoecologicalchanges(LatimerWuGelfandampSilander2006)Inallthesecontextsthemainissueistolinkinfor‐mationontheabundancepresenceabsenceorpresenceonlyofaspeciestoenvironmentalvariablestopredictwhere(andhowmuchof)aspeciesislikelytobepresentinunsampledlocationselsewhereinspace

Instudiesofspeciesdistributioncollectingdataonthespeciesofinterestisnotatrivialproblem([Keryetal2010)Withtheexcep‐tionofafewstudies(ThogmartinKnutsonampSauer2006)SDMsfrequentlyrelyonopportunisticdatacollectionduetothehighcostandtimeconsumingnatureofsamplingdatainthefieldespeciallyona largespatialscale(Keryetal2010) Indeed it isofteninfea‐sibletocollectdatabasedonawell‐designedrandomizedandorsystematicsamplingschemetoestimatethedistributionofaspecificspeciesover theentireareaof interest (BrotonsHerrandoampPla2007)HencevarioustypesofopportunisticsamplingschemesarecommonlyusedAsanexamplestudiesonseamammalscommonlyresorttotheaffordablepracticeofsamplingfromrecreationalboats(so‐called platforms of opportunity) whose bearings are neitherrandomnorsystematic(RodriacuteguezBrotonsBustamanteampSeoane2007)SimilarlybirddataareoftenderivedfromonlinedatabasessuchaseBirdwhichmakeavailablelocationsofbirdssightedbybird‐watcherswhotendtovisithabitatssuitableforinterestingspecies(httpsebirdorg)Alsointhecontextoffisheryecologyfishery‐dependent survey data are often derived from commercial fleetstendtobereadilyavailableforanalysisHoweverthefishingboatsnaturallytendtofishinlocationswheretheyexpectahighconcen‐trationoftheirtargetspecies(VasconcellosampCochrane2005)

AllthesetypesofopportunisticallycollecteddatatendtosufferfromaspecificcomplicationThesamplingschemethatdeterminessamplinglocationsisnotrandomandhencenotindependentoftheresponsevariableofinterestforexamplespeciesabundance(ConnThorsonampJohnson2017DiggleMenezesampSu2010)HoweverSDMstypicallyassumeifonlyimplicitlythatsamplinglocationsarenot informativeand that theyhavebeenchosen independentlyofwhatvaluesareexpectedtobeobservedinaspecificlocationThisassumption is typically violated for opportunistic data resulting inpreferentiallysampleddatacollectedinlocationsthatweredeliber‐atelychoseninareaswheretheabundanceofthespeciesofinterest

isthoughttobeparticularlyhighor lowThisviolationleadstobi‐asedestimatesandpredictions(Diggleetal2010)

Consequentlybiasedestimationandpredictionsofspeciesdis‐tributionleadtobadlyinformeddecisionmakingandtoinefficientorinappropriatemanagementofnaturalresources(Connetal2017Diggle etal 2010DinsdaleampSalibian‐Barrera 2018) This paperseeks to address preferential sampling in the context of fisheriesecologywherethis issue isparticularlyrelevantsincethe identifi‐cationandmanagementof sensitivehabitats (eg throughmarineprotected areas nurseries high‐discard locations) is a commonconservationtoolusedtosustainthelong‐termviabilityofspeciespopulations

Diggleetal (2010)suggestamodelingapproachthataccountsfor preferential sampling using likelihood‐based inference withMonteCarlomethodsHowevertheresultingapproachcanbecom‐putationallyintensiveastheauthorsrecognizeintheirreplytotheissuesraisedonthediscussionoftheirpaperwhichimpliesthatitisquitedifficult touse inpractical situations especiallywhen theobjectiveof theanalysis is topredict into inextendedareasThisisofsignificantconcernaspredictionisoftenthemainobjectiveofSDMsandpreferentialsamplingissuesarebytheirverydefinitionapracticalproblemthatneedstobeaddressedinaformthatmakesthemaccessibletousers

AsRueMartinoMondalandChopin(2010)indicateinthedis‐cussiononDiggleetalrsquospaperpreferentialsamplingmaybeseenasamarkedspatialpointprocessmodelinparticularamarkedlog‐GaussianCoxprocessThesemodelscanbefittedinacomputation‐ally efficient way using integrated nested Laplace approximation(INLA)andassociatedsoftwareRueMartinoandChopin(2009)inafastcomputationalway

In this studywe thoroughly explain themethodology for per‐formingpreferentialsamplingmodelsusingtheapproachproposedbyRueetal(2010)withinthecontextofSDMswiththefinalaimtoprovideguidanceontheappropriateuseandinterpretationofthefittedmodelsApracticalapplicationassessingthespatialdistribu‐tion of blue and red shrimp (Aristeus antennatus Risso 1816) fromfishingdataintheGulfofAlicante(Spain)isprovidedasatutorialwhilesimulateddataareusedtodemonstrateperformanceissuesofstandardmethodswhichdonotaccountforpreferentialsampling

2emsp |emspMATERIAL AND METHODS

Preferentiallycollecteddataconsistoftwopiecesofinformation(a)samplinglocationsand(b)measuredabundance(oroccurrence)oftargetspeciesintheselocationswheretheintensityofsamplinglo‐cationsispositivelyornegativelycorrelatedwithabundancethatis

K E Y W O R D S

BayesianmodellingintegratednestedLaplaceapproximationpointprocessesspeciesdistributionmodelsstochasticpartialdifferentialequation

emspensp emsp | emsp655PENNINO Et al

(a)and(b)arenotindependentPredictingthedistributionoftargetspeciesusingthistypeofdataimpliesthatthesamplingdistributionisnotuniformlyrandomasittendstohavemoreobservationswhereabundance ishigher and thusbasic statisticalmodelassumptionsareviolated

In order to overcome this problem Diggle etal (2010) pro‐posedto interpretthepreferentialsamplingprocessasamarked point processes modelThisapproachusestheinformationonsam‐plinglocationsandmodelsthemalongwiththemeasuredvaluesofthevariableofinterestinajointmodelInparticularsamplinglocationsareinterpretedasaspatial point patternaccountingfora higher point intensitywheremeasured values are higher Themeasurementstakenineachofthepoints(themarkinpointpro‐cessterminology)aremodeledalongwithandassumedtobepo‐tentiallydependentonthepoint‐patternintensityinajointmodelThisapproachaccountsforpreferentialsamplingwhilestillmak‐ingpredictionsforthevariableof interest inspace Inparticularinthefisheryexamplediscussedherefishinglocationsareinter‐pretedasapointpatternwhilethespeciescatchateachlocationisinterpretedasamark

21emsp|emspStatistical model

Formallyaspatialpointpatternconsistsofthespatial locationsofeventsorobjectsinadefinedstudyregionIllianPenttinenStoyanandStoyan (2008)Examples include locationsofspecies inapar‐ticularareaorparasitesinamicrobiologycultureSpatialpointpro‐cessesaremathematicalmodels(randomvariables)usedtodescribeandanalyze these spatialpatternsA simple theoreticalmodel foraspatialpointpattern is thePoissonprocessusuallydescribed intermsofitsintensityfunctionΛxThisintensityfunctionrepresentsthedistributionoflocations(ldquopointsrdquo)inspaceInaPoissonprocessthenumberofpointsfollowsaPoissondistributionandthelocationsofthesepointsindependentofanyoftheotherpointsThehomoge‐neousPoissonprocessrepresentscompletespatialrandomnessandservesasareferenceornullmodelinmanyapplications

Theintensityofapointprocessthatisthenumberofpointsperunitareamayeitherbeconstantoverspaceresultinginahomoge‐neousorstationarypatternorvaryinspacewithaspatialtrendre‐sultinginanon‐homogeneouspatternHowevertheassumptionofstationarityisgenerallyunrealisticinmostSDMapplicationsastheintensityfunctionvarieswiththeenvironmentmakingnon‐homo‐geneousPoissonprocessespotentiallyabetterchoice todescribespeciesdistributionbasedonatrendfunctionthatmaydependoncovariatesNonethelessinapplicationscovariatesmaynotexplaintheentirespatialstructureinaspatialpatternIncontrasttheclassofCoxprocessesprovidestheflexibilitytomodelaggregatedpointpatterns relative to observed and unobserved abiotic and bioticmechanismsHere spatial structures inanobservedpointpatternmayreflectdependenceonknownandmeasuredcovariatesaswellas on unknown or unmeasurable covariates or bioticmechanismsthat cannotbe readily representedby a covariate suchasdisper‐sallimitationIndeedthespatialstructureofbothabioticandbiotic

variablescan impactonecologicalprocessesandconsequentlybereflectedinthespeciesdistribution

Log‐GaussianCoxprocesses(LGCPs)areaspecificclassofCoxprocessmodelsinwhichthelogarithmoftheintensitysurfaceisaGaussianrandomfieldGiventherandomfieldMoreformally

where Vs is aGaussian random fieldGiven the random field theobservedlocationss = (s1hellipsn)areindependentandformaPoissonprocess

Inthecaseofpreferentiallysampledspeciestheobservedabun‐danceY = (y1hellipyn)isalsolinkedtotheintensityoftheunderlyingspatial field Inpracticeonlyvery fewsamplesmightbeavailableinsomeareasif it isassumedthatabundanceisparticularlylowinthese areas The LGCPmodel fitted to the sampling locations re‐flectsareaswithlowspeciesabundancethathaveresultedinareaswith fewer sampling locationsTo incorporate such information inthe SDM abundance model we apply joint modeling techniqueswhichallowfittingsharedmodelcomponentsinmodelswithtwoormorelinearpredictorsHereweconsidertwodependentpredictorswithtworesponsesthatistheobservedspeciesabundances(themarks)andtheintensityofthepointprocessreflectingsamplingin‐tensitythroughspace

This results in apreferential samplingmodel that consistsoftwo levelswhere information is sharedbetween the two levelsthemarkmodelandthepatternmodel Inparticular themarkY isassumedtofollowanexponentialfamilydistributionsuchasaGaussian lognormal or gamma distribution for continuous vari‐ablesoraPoissondistribution forcountdata Inall thesecasesthemeanμs isrelatedtothespatialtermthroughanappropriatelinkfunctionη

where prime0 istheinterceptofthemodelthecoefficientsprime

nquantify

theeffectofsomecovariatesXnontheresponseandWsisthespa‐tialeffectofthemodelthatistheGaussianrandomfieldTheco‐variatesintheadditivepredictorareenvironmentalfeatureslinkedtohabitatpreferencesofaspecies

InthesecondpartofthemodelanLGCPmodelwithintensityfunctionΛsreflectingthesamplinglocations

where β0istheinterceptoftheLGCPthecoefficientsβnquantifytheeffectofsomecovariatesXnontheintensityfunctionandWsisthespatialtermsharedwiththeLGCPbutscaledbyαtoallowforthedif‐ferencesinscalebetweenthemarkvaluesandtheLGCPintensities

Bayesianinferenceturnsouttobeagoodoptiontofitspatialhierarchicalmodelsbecauseitallowsboththeobserveddataandmodel parameters to be random variables ([Banerjee Carlin ampGelfand2004)resultinginamorerealisticandaccurateestima‐tionofuncertaintyAnimportantissueoftheBayesianapproachisthatpriordistributionsmustbeassignedtotheparameters(inourcaseβ0and

prime

0)andhyperparameters(inourcasethoseofthe

(1)log(Λs)=Vs

(2)(s)=0+

nXn+Ws

(3)Λs=exp

0+nXn+Ws

656emsp |emsp emspensp PENNINO Et al

spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded

22emsp|emspFitting models with INLA

Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)

23emsp|emspPreferential and non‐preferential models

Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+Ws

wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields

wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)

Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel

Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA

Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused

24emsp|emspSimulated example

In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)

Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment

Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing

(4)

YssimGa (s)

log (s)=0+Ws

WsimN (0Q ())

(2 log log )simMN(ww)

(5)

YssimGa(s)

log(s)=0+Zs

ZsimN(0Q())

(2 log log )simMN(zz)

F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated

08

08

06

06

04

04

02

02

000000

000002

000004

000006

000008

000010

000012

000014

000016

emspensp emsp | emsp657PENNINO Et al

three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield

25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea

In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies

Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)

Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+df(d)+wWs

wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry

where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)

WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel

Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected

Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities

Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements

ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes

(6)

YssimGa(s)

log(s)=0+ f(d)+Ws

WsimN(0Q())

(2 log log )simMN(ww)

Δdj=djminusdj+1simN(0d) j=1hellip m

dsimLogGamma(400001)

F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations

50

100

200

200

200

400 400

700

700

700

700

1000

1500

1500

2000

2000

200

0

200

0

658emsp |emsp emspensp PENNINO Et al

fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)

3emsp |emspRESULTS

31emsp|emspSimulated example

Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas

Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels

In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas

able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse

Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas

FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels

32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea

All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the

F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa

minus40

minus20

0

Preferential data Random data

Type of data

Impr

ovem

ent i

n ab

unda

nce

mod

elDIC

minus03

minus02

minus01

00

01

Preferential data Random data

Type of data

LCPO

minus25

minus20

minus15

minus10

minus05

00

Preferential data Random data

Type of data

MAE

emspensp emsp | emsp659PENNINO Et al

point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect

Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe

F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas

0 1 2 3 4 5

02

46

810

Simulated field

Non

minuspr

efer

entia

l pre

dict

ion

0 1 2 3 4 50

24

68

10

Simulated field

Pre

fere

ntia

l pre

dict

ion

F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection

20

25

30

35

40

45

50

55

20

25

30

35

40

45

50

55

60

(a) (b)

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

654emsp |emsp emspensp PENNINO Et al

1emsp |emspINTRODUC TION

An increasing interest in Species distribution models (SDMs) formanagementandconservationpurposeshasdrawntheattentionofecologiststospatialmodels(Dormannetal2007)SDMsarerele‐vantintheoreticalandpracticalcontextswherethereisaninterestinforexampleassessingtherelationshipbetweenspeciesandtheirenvironmentidentifyingandmanagingprotectedareasandpredict‐ingaspeciesrsquoresponsetoecologicalchanges(LatimerWuGelfandampSilander2006)Inallthesecontextsthemainissueistolinkinfor‐mationontheabundancepresenceabsenceorpresenceonlyofaspeciestoenvironmentalvariablestopredictwhere(andhowmuchof)aspeciesislikelytobepresentinunsampledlocationselsewhereinspace

Instudiesofspeciesdistributioncollectingdataonthespeciesofinterestisnotatrivialproblem([Keryetal2010)Withtheexcep‐tionofafewstudies(ThogmartinKnutsonampSauer2006)SDMsfrequentlyrelyonopportunisticdatacollectionduetothehighcostandtimeconsumingnatureofsamplingdatainthefieldespeciallyona largespatialscale(Keryetal2010) Indeed it isofteninfea‐sibletocollectdatabasedonawell‐designedrandomizedandorsystematicsamplingschemetoestimatethedistributionofaspecificspeciesover theentireareaof interest (BrotonsHerrandoampPla2007)HencevarioustypesofopportunisticsamplingschemesarecommonlyusedAsanexamplestudiesonseamammalscommonlyresorttotheaffordablepracticeofsamplingfromrecreationalboats(so‐called platforms of opportunity) whose bearings are neitherrandomnorsystematic(RodriacuteguezBrotonsBustamanteampSeoane2007)SimilarlybirddataareoftenderivedfromonlinedatabasessuchaseBirdwhichmakeavailablelocationsofbirdssightedbybird‐watcherswhotendtovisithabitatssuitableforinterestingspecies(httpsebirdorg)Alsointhecontextoffisheryecologyfishery‐dependent survey data are often derived from commercial fleetstendtobereadilyavailableforanalysisHoweverthefishingboatsnaturallytendtofishinlocationswheretheyexpectahighconcen‐trationoftheirtargetspecies(VasconcellosampCochrane2005)

AllthesetypesofopportunisticallycollecteddatatendtosufferfromaspecificcomplicationThesamplingschemethatdeterminessamplinglocationsisnotrandomandhencenotindependentoftheresponsevariableofinterestforexamplespeciesabundance(ConnThorsonampJohnson2017DiggleMenezesampSu2010)HoweverSDMstypicallyassumeifonlyimplicitlythatsamplinglocationsarenot informativeand that theyhavebeenchosen independentlyofwhatvaluesareexpectedtobeobservedinaspecificlocationThisassumption is typically violated for opportunistic data resulting inpreferentiallysampleddatacollectedinlocationsthatweredeliber‐atelychoseninareaswheretheabundanceofthespeciesofinterest

isthoughttobeparticularlyhighor lowThisviolationleadstobi‐asedestimatesandpredictions(Diggleetal2010)

Consequentlybiasedestimationandpredictionsofspeciesdis‐tributionleadtobadlyinformeddecisionmakingandtoinefficientorinappropriatemanagementofnaturalresources(Connetal2017Diggle etal 2010DinsdaleampSalibian‐Barrera 2018) This paperseeks to address preferential sampling in the context of fisheriesecologywherethis issue isparticularlyrelevantsincethe identifi‐cationandmanagementof sensitivehabitats (eg throughmarineprotected areas nurseries high‐discard locations) is a commonconservationtoolusedtosustainthelong‐termviabilityofspeciespopulations

Diggleetal (2010)suggestamodelingapproachthataccountsfor preferential sampling using likelihood‐based inference withMonteCarlomethodsHowevertheresultingapproachcanbecom‐putationallyintensiveastheauthorsrecognizeintheirreplytotheissuesraisedonthediscussionoftheirpaperwhichimpliesthatitisquitedifficult touse inpractical situations especiallywhen theobjectiveof theanalysis is topredict into inextendedareasThisisofsignificantconcernaspredictionisoftenthemainobjectiveofSDMsandpreferentialsamplingissuesarebytheirverydefinitionapracticalproblemthatneedstobeaddressedinaformthatmakesthemaccessibletousers

AsRueMartinoMondalandChopin(2010)indicateinthedis‐cussiononDiggleetalrsquospaperpreferentialsamplingmaybeseenasamarkedspatialpointprocessmodelinparticularamarkedlog‐GaussianCoxprocessThesemodelscanbefittedinacomputation‐ally efficient way using integrated nested Laplace approximation(INLA)andassociatedsoftwareRueMartinoandChopin(2009)inafastcomputationalway

In this studywe thoroughly explain themethodology for per‐formingpreferentialsamplingmodelsusingtheapproachproposedbyRueetal(2010)withinthecontextofSDMswiththefinalaimtoprovideguidanceontheappropriateuseandinterpretationofthefittedmodelsApracticalapplicationassessingthespatialdistribu‐tion of blue and red shrimp (Aristeus antennatus Risso 1816) fromfishingdataintheGulfofAlicante(Spain)isprovidedasatutorialwhilesimulateddataareusedtodemonstrateperformanceissuesofstandardmethodswhichdonotaccountforpreferentialsampling

2emsp |emspMATERIAL AND METHODS

Preferentiallycollecteddataconsistoftwopiecesofinformation(a)samplinglocationsand(b)measuredabundance(oroccurrence)oftargetspeciesintheselocationswheretheintensityofsamplinglo‐cationsispositivelyornegativelycorrelatedwithabundancethatis

K E Y W O R D S

BayesianmodellingintegratednestedLaplaceapproximationpointprocessesspeciesdistributionmodelsstochasticpartialdifferentialequation

emspensp emsp | emsp655PENNINO Et al

(a)and(b)arenotindependentPredictingthedistributionoftargetspeciesusingthistypeofdataimpliesthatthesamplingdistributionisnotuniformlyrandomasittendstohavemoreobservationswhereabundance ishigher and thusbasic statisticalmodelassumptionsareviolated

In order to overcome this problem Diggle etal (2010) pro‐posedto interpretthepreferentialsamplingprocessasamarked point processes modelThisapproachusestheinformationonsam‐plinglocationsandmodelsthemalongwiththemeasuredvaluesofthevariableofinterestinajointmodelInparticularsamplinglocationsareinterpretedasaspatial point patternaccountingfora higher point intensitywheremeasured values are higher Themeasurementstakenineachofthepoints(themarkinpointpro‐cessterminology)aremodeledalongwithandassumedtobepo‐tentiallydependentonthepoint‐patternintensityinajointmodelThisapproachaccountsforpreferentialsamplingwhilestillmak‐ingpredictionsforthevariableof interest inspace Inparticularinthefisheryexamplediscussedherefishinglocationsareinter‐pretedasapointpatternwhilethespeciescatchateachlocationisinterpretedasamark

21emsp|emspStatistical model

Formallyaspatialpointpatternconsistsofthespatial locationsofeventsorobjectsinadefinedstudyregionIllianPenttinenStoyanandStoyan (2008)Examples include locationsofspecies inapar‐ticularareaorparasitesinamicrobiologycultureSpatialpointpro‐cessesaremathematicalmodels(randomvariables)usedtodescribeandanalyze these spatialpatternsA simple theoreticalmodel foraspatialpointpattern is thePoissonprocessusuallydescribed intermsofitsintensityfunctionΛxThisintensityfunctionrepresentsthedistributionoflocations(ldquopointsrdquo)inspaceInaPoissonprocessthenumberofpointsfollowsaPoissondistributionandthelocationsofthesepointsindependentofanyoftheotherpointsThehomoge‐neousPoissonprocessrepresentscompletespatialrandomnessandservesasareferenceornullmodelinmanyapplications

Theintensityofapointprocessthatisthenumberofpointsperunitareamayeitherbeconstantoverspaceresultinginahomoge‐neousorstationarypatternorvaryinspacewithaspatialtrendre‐sultinginanon‐homogeneouspatternHowevertheassumptionofstationarityisgenerallyunrealisticinmostSDMapplicationsastheintensityfunctionvarieswiththeenvironmentmakingnon‐homo‐geneousPoissonprocessespotentiallyabetterchoice todescribespeciesdistributionbasedonatrendfunctionthatmaydependoncovariatesNonethelessinapplicationscovariatesmaynotexplaintheentirespatialstructureinaspatialpatternIncontrasttheclassofCoxprocessesprovidestheflexibilitytomodelaggregatedpointpatterns relative to observed and unobserved abiotic and bioticmechanismsHere spatial structures inanobservedpointpatternmayreflectdependenceonknownandmeasuredcovariatesaswellas on unknown or unmeasurable covariates or bioticmechanismsthat cannotbe readily representedby a covariate suchasdisper‐sallimitationIndeedthespatialstructureofbothabioticandbiotic

variablescan impactonecologicalprocessesandconsequentlybereflectedinthespeciesdistribution

Log‐GaussianCoxprocesses(LGCPs)areaspecificclassofCoxprocessmodelsinwhichthelogarithmoftheintensitysurfaceisaGaussianrandomfieldGiventherandomfieldMoreformally

where Vs is aGaussian random fieldGiven the random field theobservedlocationss = (s1hellipsn)areindependentandformaPoissonprocess

Inthecaseofpreferentiallysampledspeciestheobservedabun‐danceY = (y1hellipyn)isalsolinkedtotheintensityoftheunderlyingspatial field Inpracticeonlyvery fewsamplesmightbeavailableinsomeareasif it isassumedthatabundanceisparticularlylowinthese areas The LGCPmodel fitted to the sampling locations re‐flectsareaswithlowspeciesabundancethathaveresultedinareaswith fewer sampling locationsTo incorporate such information inthe SDM abundance model we apply joint modeling techniqueswhichallowfittingsharedmodelcomponentsinmodelswithtwoormorelinearpredictorsHereweconsidertwodependentpredictorswithtworesponsesthatistheobservedspeciesabundances(themarks)andtheintensityofthepointprocessreflectingsamplingin‐tensitythroughspace

This results in apreferential samplingmodel that consistsoftwo levelswhere information is sharedbetween the two levelsthemarkmodelandthepatternmodel Inparticular themarkY isassumedtofollowanexponentialfamilydistributionsuchasaGaussian lognormal or gamma distribution for continuous vari‐ablesoraPoissondistribution forcountdata Inall thesecasesthemeanμs isrelatedtothespatialtermthroughanappropriatelinkfunctionη

where prime0 istheinterceptofthemodelthecoefficientsprime

nquantify

theeffectofsomecovariatesXnontheresponseandWsisthespa‐tialeffectofthemodelthatistheGaussianrandomfieldTheco‐variatesintheadditivepredictorareenvironmentalfeatureslinkedtohabitatpreferencesofaspecies

InthesecondpartofthemodelanLGCPmodelwithintensityfunctionΛsreflectingthesamplinglocations

where β0istheinterceptoftheLGCPthecoefficientsβnquantifytheeffectofsomecovariatesXnontheintensityfunctionandWsisthespatialtermsharedwiththeLGCPbutscaledbyαtoallowforthedif‐ferencesinscalebetweenthemarkvaluesandtheLGCPintensities

Bayesianinferenceturnsouttobeagoodoptiontofitspatialhierarchicalmodelsbecauseitallowsboththeobserveddataandmodel parameters to be random variables ([Banerjee Carlin ampGelfand2004)resultinginamorerealisticandaccurateestima‐tionofuncertaintyAnimportantissueoftheBayesianapproachisthatpriordistributionsmustbeassignedtotheparameters(inourcaseβ0and

prime

0)andhyperparameters(inourcasethoseofthe

(1)log(Λs)=Vs

(2)(s)=0+

nXn+Ws

(3)Λs=exp

0+nXn+Ws

656emsp |emsp emspensp PENNINO Et al

spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded

22emsp|emspFitting models with INLA

Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)

23emsp|emspPreferential and non‐preferential models

Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+Ws

wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields

wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)

Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel

Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA

Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused

24emsp|emspSimulated example

In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)

Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment

Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing

(4)

YssimGa (s)

log (s)=0+Ws

WsimN (0Q ())

(2 log log )simMN(ww)

(5)

YssimGa(s)

log(s)=0+Zs

ZsimN(0Q())

(2 log log )simMN(zz)

F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated

08

08

06

06

04

04

02

02

000000

000002

000004

000006

000008

000010

000012

000014

000016

emspensp emsp | emsp657PENNINO Et al

three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield

25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea

In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies

Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)

Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+df(d)+wWs

wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry

where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)

WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel

Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected

Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities

Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements

ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes

(6)

YssimGa(s)

log(s)=0+ f(d)+Ws

WsimN(0Q())

(2 log log )simMN(ww)

Δdj=djminusdj+1simN(0d) j=1hellip m

dsimLogGamma(400001)

F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations

50

100

200

200

200

400 400

700

700

700

700

1000

1500

1500

2000

2000

200

0

200

0

658emsp |emsp emspensp PENNINO Et al

fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)

3emsp |emspRESULTS

31emsp|emspSimulated example

Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas

Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels

In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas

able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse

Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas

FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels

32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea

All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the

F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa

minus40

minus20

0

Preferential data Random data

Type of data

Impr

ovem

ent i

n ab

unda

nce

mod

elDIC

minus03

minus02

minus01

00

01

Preferential data Random data

Type of data

LCPO

minus25

minus20

minus15

minus10

minus05

00

Preferential data Random data

Type of data

MAE

emspensp emsp | emsp659PENNINO Et al

point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect

Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe

F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas

0 1 2 3 4 5

02

46

810

Simulated field

Non

minuspr

efer

entia

l pre

dict

ion

0 1 2 3 4 50

24

68

10

Simulated field

Pre

fere

ntia

l pre

dict

ion

F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection

20

25

30

35

40

45

50

55

20

25

30

35

40

45

50

55

60

(a) (b)

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

emspensp emsp | emsp655PENNINO Et al

(a)and(b)arenotindependentPredictingthedistributionoftargetspeciesusingthistypeofdataimpliesthatthesamplingdistributionisnotuniformlyrandomasittendstohavemoreobservationswhereabundance ishigher and thusbasic statisticalmodelassumptionsareviolated

In order to overcome this problem Diggle etal (2010) pro‐posedto interpretthepreferentialsamplingprocessasamarked point processes modelThisapproachusestheinformationonsam‐plinglocationsandmodelsthemalongwiththemeasuredvaluesofthevariableofinterestinajointmodelInparticularsamplinglocationsareinterpretedasaspatial point patternaccountingfora higher point intensitywheremeasured values are higher Themeasurementstakenineachofthepoints(themarkinpointpro‐cessterminology)aremodeledalongwithandassumedtobepo‐tentiallydependentonthepoint‐patternintensityinajointmodelThisapproachaccountsforpreferentialsamplingwhilestillmak‐ingpredictionsforthevariableof interest inspace Inparticularinthefisheryexamplediscussedherefishinglocationsareinter‐pretedasapointpatternwhilethespeciescatchateachlocationisinterpretedasamark

21emsp|emspStatistical model

Formallyaspatialpointpatternconsistsofthespatial locationsofeventsorobjectsinadefinedstudyregionIllianPenttinenStoyanandStoyan (2008)Examples include locationsofspecies inapar‐ticularareaorparasitesinamicrobiologycultureSpatialpointpro‐cessesaremathematicalmodels(randomvariables)usedtodescribeandanalyze these spatialpatternsA simple theoreticalmodel foraspatialpointpattern is thePoissonprocessusuallydescribed intermsofitsintensityfunctionΛxThisintensityfunctionrepresentsthedistributionoflocations(ldquopointsrdquo)inspaceInaPoissonprocessthenumberofpointsfollowsaPoissondistributionandthelocationsofthesepointsindependentofanyoftheotherpointsThehomoge‐neousPoissonprocessrepresentscompletespatialrandomnessandservesasareferenceornullmodelinmanyapplications

Theintensityofapointprocessthatisthenumberofpointsperunitareamayeitherbeconstantoverspaceresultinginahomoge‐neousorstationarypatternorvaryinspacewithaspatialtrendre‐sultinginanon‐homogeneouspatternHowevertheassumptionofstationarityisgenerallyunrealisticinmostSDMapplicationsastheintensityfunctionvarieswiththeenvironmentmakingnon‐homo‐geneousPoissonprocessespotentiallyabetterchoice todescribespeciesdistributionbasedonatrendfunctionthatmaydependoncovariatesNonethelessinapplicationscovariatesmaynotexplaintheentirespatialstructureinaspatialpatternIncontrasttheclassofCoxprocessesprovidestheflexibilitytomodelaggregatedpointpatterns relative to observed and unobserved abiotic and bioticmechanismsHere spatial structures inanobservedpointpatternmayreflectdependenceonknownandmeasuredcovariatesaswellas on unknown or unmeasurable covariates or bioticmechanismsthat cannotbe readily representedby a covariate suchasdisper‐sallimitationIndeedthespatialstructureofbothabioticandbiotic

variablescan impactonecologicalprocessesandconsequentlybereflectedinthespeciesdistribution

Log‐GaussianCoxprocesses(LGCPs)areaspecificclassofCoxprocessmodelsinwhichthelogarithmoftheintensitysurfaceisaGaussianrandomfieldGiventherandomfieldMoreformally

where Vs is aGaussian random fieldGiven the random field theobservedlocationss = (s1hellipsn)areindependentandformaPoissonprocess

Inthecaseofpreferentiallysampledspeciestheobservedabun‐danceY = (y1hellipyn)isalsolinkedtotheintensityoftheunderlyingspatial field Inpracticeonlyvery fewsamplesmightbeavailableinsomeareasif it isassumedthatabundanceisparticularlylowinthese areas The LGCPmodel fitted to the sampling locations re‐flectsareaswithlowspeciesabundancethathaveresultedinareaswith fewer sampling locationsTo incorporate such information inthe SDM abundance model we apply joint modeling techniqueswhichallowfittingsharedmodelcomponentsinmodelswithtwoormorelinearpredictorsHereweconsidertwodependentpredictorswithtworesponsesthatistheobservedspeciesabundances(themarks)andtheintensityofthepointprocessreflectingsamplingin‐tensitythroughspace

This results in apreferential samplingmodel that consistsoftwo levelswhere information is sharedbetween the two levelsthemarkmodelandthepatternmodel Inparticular themarkY isassumedtofollowanexponentialfamilydistributionsuchasaGaussian lognormal or gamma distribution for continuous vari‐ablesoraPoissondistribution forcountdata Inall thesecasesthemeanμs isrelatedtothespatialtermthroughanappropriatelinkfunctionη

where prime0 istheinterceptofthemodelthecoefficientsprime

nquantify

theeffectofsomecovariatesXnontheresponseandWsisthespa‐tialeffectofthemodelthatistheGaussianrandomfieldTheco‐variatesintheadditivepredictorareenvironmentalfeatureslinkedtohabitatpreferencesofaspecies

InthesecondpartofthemodelanLGCPmodelwithintensityfunctionΛsreflectingthesamplinglocations

where β0istheinterceptoftheLGCPthecoefficientsβnquantifytheeffectofsomecovariatesXnontheintensityfunctionandWsisthespatialtermsharedwiththeLGCPbutscaledbyαtoallowforthedif‐ferencesinscalebetweenthemarkvaluesandtheLGCPintensities

Bayesianinferenceturnsouttobeagoodoptiontofitspatialhierarchicalmodelsbecauseitallowsboththeobserveddataandmodel parameters to be random variables ([Banerjee Carlin ampGelfand2004)resultinginamorerealisticandaccurateestima‐tionofuncertaintyAnimportantissueoftheBayesianapproachisthatpriordistributionsmustbeassignedtotheparameters(inourcaseβ0and

prime

0)andhyperparameters(inourcasethoseofthe

(1)log(Λs)=Vs

(2)(s)=0+

nXn+Ws

(3)Λs=exp

0+nXn+Ws

656emsp |emsp emspensp PENNINO Et al

spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded

22emsp|emspFitting models with INLA

Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)

23emsp|emspPreferential and non‐preferential models

Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+Ws

wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields

wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)

Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel

Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA

Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused

24emsp|emspSimulated example

In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)

Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment

Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing

(4)

YssimGa (s)

log (s)=0+Ws

WsimN (0Q ())

(2 log log )simMN(ww)

(5)

YssimGa(s)

log(s)=0+Zs

ZsimN(0Q())

(2 log log )simMN(zz)

F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated

08

08

06

06

04

04

02

02

000000

000002

000004

000006

000008

000010

000012

000014

000016

emspensp emsp | emsp657PENNINO Et al

three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield

25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea

In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies

Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)

Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+df(d)+wWs

wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry

where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)

WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel

Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected

Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities

Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements

ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes

(6)

YssimGa(s)

log(s)=0+ f(d)+Ws

WsimN(0Q())

(2 log log )simMN(ww)

Δdj=djminusdj+1simN(0d) j=1hellip m

dsimLogGamma(400001)

F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations

50

100

200

200

200

400 400

700

700

700

700

1000

1500

1500

2000

2000

200

0

200

0

658emsp |emsp emspensp PENNINO Et al

fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)

3emsp |emspRESULTS

31emsp|emspSimulated example

Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas

Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels

In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas

able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse

Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas

FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels

32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea

All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the

F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa

minus40

minus20

0

Preferential data Random data

Type of data

Impr

ovem

ent i

n ab

unda

nce

mod

elDIC

minus03

minus02

minus01

00

01

Preferential data Random data

Type of data

LCPO

minus25

minus20

minus15

minus10

minus05

00

Preferential data Random data

Type of data

MAE

emspensp emsp | emsp659PENNINO Et al

point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect

Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe

F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas

0 1 2 3 4 5

02

46

810

Simulated field

Non

minuspr

efer

entia

l pre

dict

ion

0 1 2 3 4 50

24

68

10

Simulated field

Pre

fere

ntia

l pre

dict

ion

F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection

20

25

30

35

40

45

50

55

20

25

30

35

40

45

50

55

60

(a) (b)

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

656emsp |emsp emspensp PENNINO Et al

spatialeffectW )involvedinthemodelsNeverthelessasusualinthiskindofmodels theresultingposteriordistributionsarenotanalyticallyknownandsonumericalapproachesareneeded

22emsp|emspFitting models with INLA

Model‐fitting methods based on Markov chain Monte Carlo(MCMC)canbevery time‐consuming for spatialmodels inpar‐ticularLGCPsNeverthelessLGCPsareaspecialcaseofthemoregeneralclassof latentGaussianmodelswhichcanbedescribedas a subclass of structured additive regression (STAR) models(Fahrmeir amp Tutz 2001) In these models the mean of the re‐sponsevariableis linkedtoastructuredpredictorwhichcanbeexpressedintermsoflinearandnon‐lineareffectsofcovariatesInaBayesianframeworkbyassigningGaussianpriorstoallran‐domtermsinthepredictorweobtainalatentGaussianmodelAsaresultwecandirectlycomputeLGCPmodelsusingIntegratednested Laplace approximation (INLA) INLAprovides a fast yetaccurate approach to fitting latentGaussianmodels andmakesthe inclusion of covariates andmarked point processesmathe‐maticallytractablewithcomputationallyefficientinference(Illianetal2013SimpsonIllianLindgrenSoslashrbyeampRue2016)

23emsp|emspPreferential and non‐preferential models

Asmentionedabovetheresultingpreferentialmodelcanbeex‐pressedasatwo‐partmodelasfollowsAssumingthattheob‐servedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+Ws

wehavetoassignadistributionfortheabundanceBasedonthefactthattheabundanceisusuallyapositiveoutcomewehave considereda gammadistributionalthough clearly other options could be possible (exponentiallognormalorPoissonforcounts)Thisyields

wheretheGaussianrandomfieldWslinkstheLGCPandtheabundanceprocessscaledbyα inthePoissonprocesspredictortoallowfordif‐ferencesinscaleThematrixQ() isestimatedimplicitlythroughanapproximation of the Gaussian random field through the stochasticpartialdifferentialequation(SPDE)approachasinLindgrenRueandLindstroumlm(2011)andSimpsonetal(2016)

Itisworthnotingthatnottakingaccountofpreferentialsamplingleads to biased results This can be easily seen by comparing thepreferentialsamplingapproachwiththefollowingsimplermodel

Notethatthemodel inEquation5assumesthat for therandomfieldswehaveZneWwhereasthepreferentialsamplingmodelassumesasinglesharedrandomfieldforboththepointpatternandthemarkForbothmodelspriordistributionshavetobechosenfortoallthepa‐rametersandhyperparametersWehaveassignedvaguepriorsthatisusedthedefaultinR‐INLAduetoagenerallackofpriorinformationAnotherapproachthatcouldbeusedhereispenalizedcomplexitypri‐ors (hereafterpcpriors) asdescribed inFuglstadSimpson LindgrenandRue(2017)andreadilyavailableinR‐INLA

Asmentionedabovebothmodelsin4and5mayincludecovari‐atesthesignificanceofwhichmaybetestedthroughmodelselectionproceduresThereisverylittleliteratureavailableonmodelselectionforpointprocessmodelshowevercriterialikethedevianceinforma‐tioncriterion(DIC)(SpiegelhalterBestCarlinmampVanDerLinde2002)aresometimesused

24emsp|emspSimulated example

In order to illustrate the effectiveness of the preferential samplingmethodandtoemphasizethemisleadingresultswewouldobtainifwedonottakeitintoconsiderationweinitiallyconsiderasimulationstudy(Figure1)OnehundredrealizationsofaGaussianspatialrandomfieldwithMateacuterncovariance functionweregeneratedovera100‐by‐100grid using theRandomFields package (SchlatherMalinowskiMenckOestingampStrokorb2015)

Foreachofthe100simulatedGaussianspatialrandomfieldsthatrepresentthedistributionofaspecies in thestudyarea twosetsof100sampleswerereproducedonedistributedpreferentiallyandonedistributed randomly In particular preferentially locations were se‐lectedusingrelativeprobabilitiesproportionaltotheintensityfunctionΛsAbundanceestimateswereextractedattheselectedlocationsasasimulationoftheobservationsforthisexperiment

Both preferential and non‐preferential models were fitted toeachof thesamplesetsandcompared in theirperformanceusing

(4)

YssimGa (s)

log (s)=0+Ws

WsimN (0Q ())

(2 log log )simMN(ww)

(5)

YssimGa(s)

log(s)=0+Zs

ZsimN(0Q())

(2 log log )simMN(zz)

F I G U R E 1 emspRepresentationofoneoftheonehundredGaussianfieldsimulatedandtherespectivepreferentiallysamplinglocationsgenerated

08

08

06

06

04

04

02

02

000000

000002

000004

000006

000008

000010

000012

000014

000016

emspensp emsp | emsp657PENNINO Et al

three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield

25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea

In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies

Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)

Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+df(d)+wWs

wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry

where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)

WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel

Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected

Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities

Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements

ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes

(6)

YssimGa(s)

log(s)=0+ f(d)+Ws

WsimN(0Q())

(2 log log )simMN(ww)

Δdj=djminusdj+1simN(0d) j=1hellip m

dsimLogGamma(400001)

F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations

50

100

200

200

200

400 400

700

700

700

700

1000

1500

1500

2000

2000

200

0

200

0

658emsp |emsp emspensp PENNINO Et al

fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)

3emsp |emspRESULTS

31emsp|emspSimulated example

Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas

Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels

In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas

able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse

Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas

FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels

32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea

All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the

F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa

minus40

minus20

0

Preferential data Random data

Type of data

Impr

ovem

ent i

n ab

unda

nce

mod

elDIC

minus03

minus02

minus01

00

01

Preferential data Random data

Type of data

LCPO

minus25

minus20

minus15

minus10

minus05

00

Preferential data Random data

Type of data

MAE

emspensp emsp | emsp659PENNINO Et al

point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect

Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe

F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas

0 1 2 3 4 5

02

46

810

Simulated field

Non

minuspr

efer

entia

l pre

dict

ion

0 1 2 3 4 50

24

68

10

Simulated field

Pre

fere

ntia

l pre

dict

ion

F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection

20

25

30

35

40

45

50

55

20

25

30

35

40

45

50

55

60

(a) (b)

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

emspensp emsp | emsp657PENNINO Et al

three different criteria the deviance information criterion (DIC)(Spiegelhalteretal2002)the log‐conditionalpredictiveordinates(LCPO)(RoosampHeld2011)andthepredictivemeanabsoluteerror(MAE)(WillmottampMatsuura2005)SpecificallytheDICmeasuresthecompromisebetween the fit and theparsimonyof themodeltheLCPOisaldquoleaveoneoutrdquocross‐validationindextoassessthepre‐dictivepowerof themodelandtheMAE indicates thepredictionerror Lower valuesofDIC LCPOandMAE suggestbettermodelperformance Finally a sensitivity analysiswithdifferent pcpriorswas carriedout to assess their influenceon the final inferenceoftherangeandvarianceofasimulatedGaussianspatialrandomfield

25emsp|emspSpatial distribution of blue and red shrimp in the Western Mediterranean Sea

In this section we illustrate how preferential sampling can beaccounted for in theconcretedataexample fromthecontextoffisherydataFishery‐dependentdataderivedfromopportunisticsamplingonboats fromthecommercial fleetpresentastandardexample of preferential sampling since clearly fishers preferen‐tiallyfishinareaswheretheyexpecttofindlargeamountsoftheirtargetspecies

Weconsiderdataonblueand redshrimp (Aristeus antennatus Risso 1816)(Carbonelletal2017DevalampKapiris2016Lleonart2005)oneofthemosteconomicallyimportantdeep‐seatrawlfish‐eryintheWesternMediterraneanSeaThedatawerecollectedbyobserversonboardanumberoffishingboatsintheGulfofAlicante(Spain)from2009to2012Thedatasetincludes77haulsfromninedifferenttrawlingvessels(Figure2)andwasprovidedbytheInstituto Espantildeol de Oceanografiacutea(IEOSpanishOceanographicInstitute)

Asmentionedthefittedeffectsfortheabundanceofblueandredshrimp(Y)arecorrectedbyjointlyfittingamodelfortheabun‐dancesandthesampling(fishing)locationsreflectingfishersrsquo(poten‐tially)impreciseknowledgeofthedistributionofblueandredshrimpWealsoassumethattheobservedlocationss = (s1hellipsn)comefromaPoissonprocesswithintensityΛs=exp

0+df(d)+wWs

wheref(d) represents thenotnecessarily linear relationshipwithoneco‐variate inthiscasebathymetryThereasonunderneaththisselec‐tionwasthatexploratoryanalysisrevealednon‐linearrelationshipsbetweendepthandblueandredshrimpabundanceTheremainingsecondpartofthemodelthatistheoneexplainingtheabundancealsocontainstherelationshipwiththebathymetry

where s indexes the locationofeachhauland j indexesdifferentdepths(djrepresentingthedifferentvaluesofbathymetryobservedinthestudyareafromd1=90mtpdm =40=920m)

WeuseaBayesiansmoothingsplineFahrmeirandLang(2001)tomodelnon‐lineareffectsofdepthusingasecond‐orderrandomwalk(RW2)latentmodel

Asnopriorinformationontheparametersofthemodelwasavail‐ablewe used a vague zero‐meanGaussian prior distributionwith avarianceof100forthefixedeffectsRegardingthespatialeffectpri‐orsonμκandμτwereselectedsothatthemedianpriorrangeofthiscomponentwashalfof thestudyareaand itspriormedianstandarddeviationwas1Itwasonlyinthecaseofthebathymetrythatavisualpre‐selectionofpriorswasmadetoavoidoverfittingbychangingtheprioroftheprecisionparameterwhilethemodelswerescaledtohaveageneralizedvarianceequalto1(SoslashrbyeampRue2014)Inanycaseallresultingposteriordistributionsconcentratedwellwithinthesupportofthepriorsselected

Notethateachpredictorhasitsownintercept(0prime0)butbathy‐metricf(d)andspatialeffectsWsaresharedinbothpredictorsAlsoboththebathymetricandthespatialeffectsarescaledbyαdandαwrespectivelytoallowforthedifferencesinscalebetweenblueandredshrimpabundancesandtheLGCPintensities

Itisalsoworthnotingthatρisaparameteroftheentiremodelthat reflects the global variability of the response variable As al‐readymentionedbeforeinapreferentialsamplingsituationobser‐vationsaremorefrequentwhereabundancevaluesarelikelytobehigherandthisfactcouldaffectthe inferenceabouttheρparam‐eter In linewith this apossibleextension toour globalmodelingapproachcouldbeamodelthatcouldtakeintoaccountthistypeofvariabilitybylinkingρtotheabundancemeasurements

ModelcomparisonwasperformedusingtheDICandLCPOcri‐teriaFinallyinordertotestthepredictionperformanceofthefinalpreferentialandnon‐preferentialmodelswecalculatedthePearsoncorrelation (r) index between the predicted abundance estimatesand an external database of observed abundance values in thesametimeperiod (2009ndash2012)This independentdataset includes

(6)

YssimGa(s)

log(s)=0+ f(d)+Ws

WsimN(0Q())

(2 log log )simMN(ww)

Δdj=djminusdj+1simN(0d) j=1hellip m

dsimLogGamma(400001)

F I G U R E 2 emspStudyareaandsamplinglocations(hauls)ofblueandredshrimp(Aristeus antennatus)Thesizeofthedotsrepresentstheamountcaughtineachofthelocations

50

100

200

200

200

400 400

700

700

700

700

1000

1500

1500

2000

2000

200

0

200

0

658emsp |emsp emspensp PENNINO Et al

fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)

3emsp |emspRESULTS

31emsp|emspSimulated example

Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas

Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels

In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas

able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse

Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas

FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels

32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea

All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the

F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa

minus40

minus20

0

Preferential data Random data

Type of data

Impr

ovem

ent i

n ab

unda

nce

mod

elDIC

minus03

minus02

minus01

00

01

Preferential data Random data

Type of data

LCPO

minus25

minus20

minus15

minus10

minus05

00

Preferential data Random data

Type of data

MAE

emspensp emsp | emsp659PENNINO Et al

point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect

Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe

F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas

0 1 2 3 4 5

02

46

810

Simulated field

Non

minuspr

efer

entia

l pre

dict

ion

0 1 2 3 4 50

24

68

10

Simulated field

Pre

fere

ntia

l pre

dict

ion

F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection

20

25

30

35

40

45

50

55

20

25

30

35

40

45

50

55

60

(a) (b)

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

658emsp |emsp emspensp PENNINO Et al

fishery‐independentdatacollectedduringtheMEDITS(EU‐fundedMEDIterranean Trawl Survey) The MEDITS is carried out fromspringtoearlysummer(ApriltoJune)everyyearintheareaanditusesarandomsampling(Penninoetal2016)

3emsp |emspRESULTS

31emsp|emspSimulated example

Resultsobtainedinthissimulationstudyshowthatnottakingintoconsiderationpreferentialsamplingleadstomisleadingresultsspe‐ciallyatlow‐abundanceareas

Figure3showsthedifferenceinDICLCPOandMAEscoresbetweenpreferentialandnon‐preferentialmodels forbothsam‐ples the ones distributed preferentially and the one distributedrandomlyInparticularmorethan75oftheDICLCPOandMAEvaluesobtainedinthe100simulatedGaussianfieldsusingprefer‐entialmodelswere lowerthantheonesobtainedwithnon‐pref‐erentialmodels

In addition as it can be appreciated in the example shown inFigure4whichpresentstheresultsofoneoftheonehundredsim‐ulationsforexplanatorypurposesevenifnoneofthemodelswas

able tomake optimal predictions at low abundance locations thenon‐preferentialmodelperformedsignificantlyworse

Similarly Figure5which shows the posterior predictivemeanofoneofthesimulatedabundanceprocesseswithoutandwiththepreferentialsamplingcorrection((a)and(b)respectively))illustratesthatalthoughbothmodelshavesimilarpredictivespatialpatternsthepreferentiallycorrectedmodelpredictsbetteratmoderate‐to‐lowabundanceareas

FinallyFigure6showshowalldifferentpriorsofthehyperpa‐rameters of the spatial field end up fitting posterior distributionsthatareallconcentratedwithintherealvaluesResultsclearlyindi‐catethatslightchangesinthepriorsarenotanissuewhendealingwiththepreferentialsamplingmodels

32emsp|emspDistribution of blue and red shrimp in the Western Mediterranean Sea

All possiblemodels derived from 6were run Among them themost relevant results are presented in Table1While analyzingthedataweobservedthatboththebathymetricandthespatialtermsof the LGCPaccounted for approximately the same infor‐mation As a consequence full models did not converge in the

F I G U R E 3 emsp Improvementofthepreferentialmodelagainstaconventionalmodelinmodelfitscores(DICLCPOandMAE)Comparisonisbasedon100preferentiallyandrandomlysampleddatasetsNotethatpositivevaluesrepresentanimprovementonmodelfitandviceversa

minus40

minus20

0

Preferential data Random data

Type of data

Impr

ovem

ent i

n ab

unda

nce

mod

elDIC

minus03

minus02

minus01

00

01

Preferential data Random data

Type of data

LCPO

minus25

minus20

minus15

minus10

minus05

00

Preferential data Random data

Type of data

MAE

emspensp emsp | emsp659PENNINO Et al

point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect

Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe

F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas

0 1 2 3 4 5

02

46

810

Simulated field

Non

minuspr

efer

entia

l pre

dict

ion

0 1 2 3 4 50

24

68

10

Simulated field

Pre

fere

ntia

l pre

dict

ion

F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection

20

25

30

35

40

45

50

55

20

25

30

35

40

45

50

55

60

(a) (b)

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

emspensp emsp | emsp659PENNINO Et al

point‐patternprocesswhichrestrictedthemodelcomparison inTable1tocorrectingonlyoneoftheeffectseitherthebathymet‐ricorthespatialeffect

Thebestmodel (basedon theDICandLCPO)was theprefer‐ential one with an shared spatial effect (ieModel 5 in Table1)ThesecondmostrelevantmodelintermisDICandLCPOwasthe

F I G U R E 4 emspSimulatedabundanceagainstpredictedabundanceinthenon‐preferentialmodel(left)andinthemodelwiththepreferentialcorrection(right)foroneoftheonehundredsimulationsperformedThenon‐preferentialmodelpredictsworsethanthepreferentialmodelatlow‐abundanceareas

0 1 2 3 4 5

02

46

810

Simulated field

Non

minuspr

efer

entia

l pre

dict

ion

0 1 2 3 4 50

24

68

10

Simulated field

Pre

fere

ntia

l pre

dict

ion

F I G U R E 5 emspPosteriorpredictivemeanmapsofoneoftheonehundredsimulatedabundanceprocesseswithout(left)andwith(right)thepreferentialsamplingcorrection

20

25

30

35

40

45

50

55

20

25

30

35

40

45

50

55

60

(a) (b)

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

660emsp |emsp emspensp PENNINO Et al

preferential one that included in addition to the spatial effect asharedbathymetriceffect(ieModel8inTable1)

Figure7showsthemeanoftheposteriordistributionofthespa‐tialeffectinthemodelwithoutandwithpreferentialsamplingwhileFigure8illustratestheposteriorpredictivemeanoftheblueandredshrimp distribution without and with the preferential correctionBothfiguresshowasimilarpatternIndeeditisclearthatthespatialoutputsobtainedwiththepreferentialmodelbetterabsorbthevari‐abilityofthespecieshabitatprovidingamorenaturalpatternoftheblueandredshrimpdistribution

Finally formodel validation the final preferentialmodel ob‐tained a reasonably high value for Pearsons r (r=047) in thecross‐validationwiththeMEDITSdatasetwithrespecttothenon‐preferential one (r=022) It isworthmentioning thatMEDITSsareperformedonlyinspringsummerandfewsamplingsarecar‐riedoutinthestudyareaThesefindingsfurtherhighlighthowthe

correctionofthepreferentialmodelisimportanttoreflecttherealdistributionofaspecies

4emsp |emspDISCUSSION

Inthispaperwepresentedamodelingapproachthatcouldbeveryusefulformodelingthedistributionofspeciesusingopportunisticdataandacquiringin‐depthknowledgethatcouldbeessentialforthecorrectmanagementofnaturalresourcesSpatialecologyhasadirectappliedrelevancetonaturalresourcemanagementbutitalsohasabroadecologicalsignificanceAlthoughitmaybecompli‐catedtodefinetheboundariesofspecieshabitatscombinedwithanefficientmanagementthatrecognizestheimportanceofsuchareasthisrepresentsthefirststeptowardfacilitatinganeffectivespatialmanagement However as shown by our results using anon‐accurate approach could culminate inmisidentification of aspecieshabitatanduncertainpredictionsandsoininappropriatemanagementmeasureswhichcansometimesbeirreversible

Theresultsofthepracticalapplicationonblueandredshrimpincludedhereasarealworldscenarioshowthatpredictivemapssignificantly improve the prediction of the target specieswhenthemodelaccountsforpreferentialsamplingIndeeditisknownthatthesuitablebathymetricrangeoftheblueandredshrimpintheMediterraneanseaisbetween200and200meters(GuijarroMassutiacuteMorantaampDiacuteaz2008)thatisreflectedbythepreferen‐tialmodelpredictionmapIncontrastthenon‐preferentialmodelpredictionwasnotabletocapturetherealdistributionofthespe‐ciesInadditionifamanagementmeasuresuchascreationofamarineconservationareashouldbeappliedonthebasisof thenon‐preferentialmodel it isclearthat itwouldnotbeappropri‐ateThiscouldresultinextremelylargeareabeingrecommended

TA B L E 1 emspModelcomparisonfortheabundanceoftheblueandredshrimp(Aristeus antennatus)basedonDICLCPOandcomputationaltimes

Model DIC LCPO Times (s)

1 Intc+Depth +19 +005 3

2 Intc+Spatial +9 minus 24

3 Intc+Depth+Spatial +11 +001 57

4 Intc+Depth +52 +024 21

5 Intc+Spatial minus minus 171

6 Intc+Spatial+Depth +5 +002 212

7 Intc+Depth+Spatial +10 +001 2275

8 Intc+Depth+Spatial +3 +003 3470

NotesDICandLCPOscoresarepresentedasdeviationsfromthebestmodelIntcInterceptBoldtermssharedcomponents

F I G U R E 6 emspSensitivityanalysisofthepcpriordistributionsfortherangeandvarianceofasimulatedspatialfieldDashedlinesrepresentpriordistributionssolidlinesposteriordistributionsandverticallinestherealvaluesofeachofthehyperparametersofthespatialfieldrangeintheleftpanelandvarianceintherightpanelRangepriorsweresetsothattheprobabilityofhavingarangesmallerthan10203040and50ofthemaximumdistanceofthestudyareawas025Similarlypriorsoverthevarianceofthespatialfieldweresetsothattheprobabilityofhavingavariancehigherthan2345and6was01

05 10 15

00

05

10

15

20

25

Matern range

x

y

PosteriorPriorReal

PosteriorPriorReal

15 10 5

000

005

010

015

020

025

030

Matern variance

x

y

(a) (b)

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

emspensp emsp | emsp661PENNINO Et al

for protection which is usually difficult to implement in mostcontexts especially given the social and economic relevanceoffishing (ReidAlmeidaampZetlin 1999)Moreover the results ofthecross‐validationwithanexternalindependentdatasetfurtherhighlighted how the correction of the preferentialmodel is im‐portanttoreflecttherealdistributionofaspecies

Neverthelessevenifthepreferentialmodelimprovesthees‐timationof thebathymetric effect newobservations atdeeperwaterscouldfurtherimprovethisrelationshipandbetterunder‐stand the blue and red shrimp distribution in this area (GorelliSardagraveampCompany2016)

Similarly thesimulatedexampleshowed thatnot taking intoac‐countthepreferentialsamplingmodelcouldleadtomisleadingresults

Consequently we conclude that this approach could supposeamajorstepforwardintheunderstandingoftargetfishedspeciesmesoscaleecologygiventhatmostoftheavailabledatatodayarefishery‐dependentdataInadditionusinganon‐preferentialmodelwithopportunisticdata inaSDMcontext isnotcorrectas spatialSDMsassumethattheselectionofthesamplinglocationsdoesnotdependonthevaluesoftheobservedspecies

AnotheradvantageisundoubtedlytheuseofINLAinthiscon‐textwhichmightbeakeygeostatisticaltoolduetoitsnotableflex‐ibility in fitting complex models and its computational efficiency(Paradinasetal2015)

This modeling could be expanded to the spatiotemporal do‐mainbyincorporatinganextratermforthetemporaleffectusing

F I G U R E 7 emspMapsofthemeanoftheposteriordistributionofthespatialeffectinthemodelwithout(left)andwith(right)preferentialsamplingBlackdotsrepresentsamplinglocations

minus04

minus03

minus02

minus01

00

01

02

03

04

05

minus3

minus2

minus1

0

1

2

3

4

5

6

(a) (b)

F I G U R E 8 emspPosteriorpredictivemeanmapsoftheblueandredshrimp(Aristeus antennatus)specieswithoutandwiththepreferentialsamplingcorrectionBlackdotsrepresentsamplinglocations

25

30

35

40

45

minus4

minus3

minus2

minus1

0

1

2

3

4

5

(a) (b)

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

662emsp |emsp emspensp PENNINO Et al

parametric or semiparametric constructions to reflect linearnon‐linear autoregressive ormore complex behaviors that couldbeveryimportanttodescribethedistributionofaparticularspe‐cies Moreover although we presented a case study related toabundancedatathisapproachcouldbealsoextendedforspeciespresencendashabsence data that are more common when SDMs areperformed In particular using occurrences themodeling frame‐workwillbethesameas theonedescribed inourstudybut theGaussian fieldwill be an approximation of the probability of thespeciespresence

Finally it is worth to be noting that as shown by HowardStephens Pearce‐Higgins Gregory and Willis (2014) even usingcoarse‐scale abundancedata large improvements in theability topredictspeciesdistributionscanbeachievedovertheirpresencendashabsencemodelcounterpartsConsequentlywhereavailableitwillbebettertouseabundancedataratherthanpresencendashabsencedatainordertomoreaccuratelypredicttheecologicalconsequencesofenvironmentalchanges

ACKNOWLEDG MENTS

D C A L Q and F M would like to thank the Ministerio deEducacioacuten y Ciencia (Spain) for financial support (jointly financedby the European Regional Development Fund) via ResearchGrants MTM2013‐42323‐P and MTM2016‐77501‐P andACOMP2015202fromGeneralitatValenciana(Spain)Theauthorsare very grateful to all the observers shipowners and crewswhohavecontributedtothedatacollectionoffisherydataandMEDITSsFishery data collection was cofunded by the EU through theEuropeanMaritimeandFisheriesFund(EMFF)withintheNationalProgrammeofcollectionmanagementanduseofdatainthefisher‐iessectorandsupportforscientificadviceregardingtheCommonFisheriesPolicyTheauthorshavenoconflictofintereststodeclare

CONFLIC T OF INTERE S T

NoneDeclared

AUTHOR CONTRIBUTIONS

AllauthorsconceivedanddesignedtheworkdraftedtheworkandgavefinalapprovaloftheversiontobepublishedMGPIPFMandJIanalyzedthedata

DATA ACCE SSIBILIT Y

Dataunderlyingthisarticleareavailableonrequestpleasecontactgraziapenninoyahooit

R E FE R E N C E S

BanerjeeSCarlinBPampGelfandAE(2004)Hierarchical modeling and analysis for spatial dataNewYorkChapmanandHallCRC

Brotons LHerrando SampPlaM (2007)Updating bird species dis‐tributionatlargespatialscalesApplicationsofhabitatmodellingtodatafromlong‐termmonitoringprogramsDiversity and Distributions13(3)276ndash288httpsdoiorg101111j1472‐4642200700339x

CarbonellA Llompart P JGazaMMirAAparicio‐GonzaacutelezAAacutelvarez‐BarasteguiDhellipCartesJE(2017)Long‐termclimaticin‐fluencesonthephysiologicalconditionoftheredshrimpAristeus an-tennatusinthewesternMediterraneanSeaClimate Research72(2)111ndash127httpsdoiorg103354cr01453

ConnPBThorsonJTampJohnsonDS(2017)Confrontingpreferen‐tialsamplingwhenanalysingpopulationdistributionsDiagnosisandmodel‐based triageMethods in Ecology and Evolution8(11)1535ndash1546httpsdoiorg1011112041‐210X12803

DevalMCampKapirisK(2016)Areviewofbiologicalpatternsoftheblue‐red shrimp Aristeus antennatus in the Mediterranean Sea AcasestudyofthepopulationofAntalyaBayeasternMediterraneansea Scientia Marina 80(3) 339ndash348 httpsdoiorg103989scimar0441122A

Diggle P J Menezes R amp Su T‐I (2010) Geostatistical infer‐ence under preferential sampling Journal of the Royal Statistical Society Series C (Applied Statistics) 59(2) 191ndash232 httpsdoiorg101111j1467‐9876200900701x

Dinsdale D amp Salibian‐Barrera M (2018) Methods for preferentialsamplingingeostatisticsJournal of the Royal Statistical Society Series C (Applied Statistics)Inpress

DormannCFMcPhersonJMArauacutejoMBBivandRBolligerJCarlGhellipWilsonR(2007)Methodstoaccountforspatialautocorrela‐tionintheanalysisofspeciesdistributionaldataAreviewEcography30(5)609ndash628httpsdoiorg101111j20070906‐759005171x

Fahrmeir LampLang S (2001)Bayesian inference for generalizedad‐ditivemixedmodels basedonmarkov random field priors Journal of the Royal Statistical Society Series C (Applied Statistics)50(2)201ndash220httpsdoiorg1011111467‐987600229

FahrmeirLampTutzG (2001)Models formulticategorical responsesMultivariateextensionsofgeneralizedlinearmodelsInMultivariate statistical modelling based on generalized linear models (pp 69ndash137)NewYorkSpringerhttpsdoiorg101007978‐1‐4757‐3454‐6

FuglstadGASimpsonDLindgrenFampRueH(2017)ConstructingpriorsthatpenalizethecomplexityofgaussianrandomfieldsJournal of the American Statistical Association1ndash8

GorelliGSardagraveFampCompanyJB(2016)Fishingeffortincreaseandresourcestatusofthedeep‐searedshrimpAristeus antennatus(Risso1816)inthenorthwestMediterraneanSeasincethe1950sReviews in Fisheries Science amp Aquaculture24(2)192ndash202httpsdoiorg1010802330824920151119799

GuijarroBMassutiacuteEMoranta JampDiacuteazP (2008)Populationdy‐namicsoftheredshrimpAristeus antennatus intheBalearicislands(WesternMediterranean)ShortSpatio‐temporaldifferencesandin‐fluenceofenvironmentalfactorsJournal of Marine Systems71(3ndash4)385ndash402httpsdoiorg101016jjmarsys200704003

Howard C Stephens P A Pearce‐Higgins JW Gregory R D ampWillisSG(2014)ImprovingspeciesdistributionmodelsThevalueofdataonabundanceMethods in Ecology and Evolution5(6)506ndash513httpsdoiorg1011112041‐210X12184

Illian J B Martino S Soslashrbye S H Gallego‐Fernaaacutendez J BZunzunegui M Esquivias M P amp Travis J M J (2013) Fittingcomplexecologicalpointprocessmodelswithintegratednestedla‐placeapproximationMethods in Ecology and Evolution4(4)305ndash315httpsdoiorg1011112041‐210x12017

IllianJPenttinenAStoyanHampStoyanD(2008)Statistical analysis and modelling of spatial point patternsVolume70HobokenNJJohnWileyampSons

KeryM Royle J A SchmidH SchaubM Volet BHaefligerGamp Zbinden N (2010) Site‐occupancy distribution modeling tocorrect population‐trend estimates derived from opportunistic

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789

emspensp emsp | emsp663PENNINO Et al

observations Conservation Biology 24(5) 1388ndash1397 httpsdoiorg101111j1523‐1739201001479x

LatimerAMWuSGelfandAEampSilanderJAJr(2006)BuildingstatisticalmodelstoanalyzespeciesdistributionsEcological applica-tions16(1)33ndash50httpsdoiorg10189004‐0609

Lindgren F Rue H amp Lindstroumlm J (2011) An explicit link betweengaussian fields andgaussianmarkov random fieldsThe stochasticpartialdifferentialequationapproachJournal of the Royal Statistical Society Series B (Statistical Methodology)73(4)423ndash498httpsdoiorg101111j1467‐9868201100777x

LleonartJ (2005)ReviewofthestateoftheworldfisheryresourcesFAO Fish Technical Paper45749ndash64

Paradinas IConesaDPenninoMGMuntildeozFFernaacutendezAMLoacutepez‐Quiacutelez A amp Bellido J M (2015) Bayesian spatio‐tempo‐ral approach to identifying fishnurseriesby validatingpersistenceareas Marine Ecology Progress Series 528 245ndash255 httpsdoiorg103354meps11281

PenninoMGConesaDLoacutepez‐QuiacutelezAMunozFFernaacutendezAampBellidoJM(2016)Fishery‐dependentand‐independentdataleadtoconsistentestimationsofessentialhabitatsICES Journal of Marine Science73(9)2302ndash2310httpsdoiorg101093icesjmsfsw062

ReidRNAlmeidaFPampZetlinCA (1999) Essential fishhabitatsource document Fishery‐independent surveys data sources andmethods

RodriacuteguezJPBrotonsLBustamanteJampSeoaneJ(2007)Theap‐plicationofpredictivemodellingofspeciesdistributiontobiodiver‐sityconservationDiversity and Distributions13(3)243ndash251httpsdoiorg101111j1472‐4642200700356x

RoosMampHeldL(2011)SensitivityanalysisinbayesiangeneralizedlinearmixedmodelsforbinarydataBayesian Analysis6(2)259ndash278httpsdoiorg10121411‐BA609

Rue H Martino S amp Chopin N (2009) Approximate bayesian in‐ference for latent gaussian models by using integrated nestedlaplace approximations Journal of the Royal Statistical Society Series B (Statistical Methodology) 71(2) 319ndash392 httpsdoiorg101111j1467‐9868200800700x

RueHMartinoSMondalDampChopinN(2010)DiscussiononthepaperGeostatisticalinferenceunderpreferentialsamplingbyDiggleMenezesandSuJournal of the Royal Statistical Society Series C52221ndash223

SchlatherMMalinowskiAMenckPJOestingMampStrokorbK(2015) Analysis simulation and prediction ofmultivariate randomfields with package RandomFields Journal of Statistical Software63(8)1ndash25

Simpson D Illian J B Lindgren F Soslashrbye S H amp Rue H (2016)GoingoffgridComputationallyefficientinferenceforlog‐gaussiancox processes Biometrika 103(1) 49ndash70 httpsdoiorg101093biometasv064

SoslashrbyeSHampRueH(2014)ScalingintrinsicGaussianMarkovrandomfieldpriors inspatialmodellingSpatial Statistics839ndash51httpsdoiorg101016jspasta201306004

SpiegelhalterDJBestNGCarlinmBPampVanDerLindeA(2002)BayesianmeasuresofmodelcomplexityandfitJournal of the Royal Statistical Society Series B (Statistical Methodology)64(4)583ndash639httpsdoiorg1011111467‐986800353

Thogmartin W E Knutson M G amp Sauer J R (2006) Predictingregional abundance of rare grassland birds with a hierarchi‐cal spatial count model Condor 108(1) 25ndash46 httpsdoiorg1016500010‐5422(2006)108[0025PRAORG]20CO2

VasconcellosMampCochraneK(2005)Overviewofworldstatusofda‐ta‐limitedfisheriesInferencesfromlandingsstatisticsInGHKruseVFGallucciDEHayRIPerryRMPetermanTCShirleyPDSpencerBWilsonampDWoodby(Eds)Fisheries Assessment and Management in Data-Limited Situations (pp1ndash20)FairbanksAlaskaSeaGrantCollegeProgramUniversityofAlaska

WillmottC JampMatsuuraK (2005)Advantagesof themeanabso‐luteerror(MAE)overtherootmeansquareerror(RMSE)inassessingaveragemodelperformanceClimate Research30(1)79ndash82httpsdoiorg103354cr030079

How to cite this articlePenninoMGParadinasIIllianJBetalAccountingforpreferentialsamplinginspeciesdistributionmodelsEcol Evol 20199653ndash663 httpsdoiorg101002ece34789