Occupancy detection Function — occDetFunc • sparta

Run occupancy detection models using the output from formatOccData

Usage

occDetFunc(
  taxa_name,
  occDetdata,
  spp_vis,
  n_iterations = 5000,
  nyr = 2,
  burnin = 1500,
  thinning = 3,
  n_chains = 3,
  write_results = TRUE,
  output_dir = getwd(),
  modeltype = "sparta",
  max_year = NULL,
  seed = NULL,
  model.function = NULL,
  regional_codes = NULL,
  region_aggs = NULL,
  additional.parameters = NULL,
  additional.BUGS.elements = NULL,
  additional.init.values = NULL,
  return_data = FALSE,
  criterion = 1,
  provenance = NULL,
  saveMatrix = FALSE,
  rem_aggs_with_missing_regions = FALSE,
  allowSitesMultiRegions = FALSE
)

Arguments

taxa_name: A character giving the name of the species to be modelled.
occDetdata: The 2nd element of the object returned by formatOccData.
spp_vis: The 1st element of the object returned by formatOccData.
n_iterations: numeric, An MCMC parameter - The number of interations
nyr: numeric, the minimum number of periods on which a site must have records for it to be included in the models. Defaults to 2
burnin: numeric, An MCMC parameter - The length of the burn in
thinning: numeric, An MCMC parameter - The thinning factor
n_chains: numeric, an MCMC parameter - The number of chains to be run
write_results: logical, should results be saved to output_dir. This is recommended since these models can take a long time to run. If TRUE (default) the results from each species will be saved as an .rdata file once the species has run. This prevents loss of data should anything go wrong.
output_dir: character, the output directory were the output for each taxa will be saved as .rdata files. This will defualt to the working directory
modeltype: A character string or vector of strings that specifies the model to use. See details. If used then model.function is ignored.
max_year: numeric, final year to which analysis will be run, this can be set if it is beyond the limit of the dataset. Defaults to final year of the dataset.
seed: numeric, uses set.seed to set the randon number seed. Setting this number ensures repeatabl analyses
model.function: optionally a user defined BUGS model coded as a function (see ?jags, including the example there, for how this is done)
regional_codes: A data.frame object detailing which site is associated with which region. each row desginates a site and each column represents a region. The first column represents the site name (as in site). Subsequent columns are named for each regions with 1 representing the site is in that region and 0 that it is not. NOTE a site should only be in one region
region_aggs: A named list giving aggregations of regions that you want trend estimates for. For example region_aggs = list(GB = c('england', 'scotland', 'wales')) will produced a trend for GB (Great Britain) as well as its constituent nations. Note that 'england', scotland' and 'wales' must appear as names of columns in regional_codes. More than one aggregate can be given, eg region_aggs = list(GB = c('england', 'scotland', 'wales'), UK = c('england', 'scotland', 'wales', 'northern_ireland')).
additional.parameters: A character vector of additional parameters to monitor
additional.BUGS.elements: A named list giving additioanl bugs elements passed to R2jags::jags 'data' argument
additional.init.values: A named list giving user specified initial values to be added to the defaults.
return_data: Logical, if TRUE (default) the BUGS data object is returned with the data
criterion: Determines whether the model should be run. If an integer then this defines the threshold number of records (50 in Outhwaite et al 2019). Other options are `EqualWt` or `HighSpec`, which define the application of "rules of thumb" defined in Pocock et al 2019. Defaults to 1, in which case the model is applied for so long there is a single record of the focal species.
provenance: An optional text string allowing the user to identify the dataset.
saveMatrix: Logical, if FALSE (default) the sims.matrix element of the jags object is omitted, in order to reduce the filesize.
rem_aggs_with_missing_regions: An option which if TRUE will remove all aggregates which contain at least one region with no data. If `FALSE`, only aggregates where ALL regions in that aggregate contain no data, are dropped. Defaults to FALSE
allowSitesMultiRegions: An option that permits sites to be included in more than one region if `TRUE`. If `FALSE` then these sites are dropped. Defaults to `FALSE`

Value

A list including the model, JAGS model output, the path of the model file used and information on the number of iterations, first year, last year, etc. Key aspects of the model output include:

"out$model" - The model used as provided to JAGS. Also contained is a list of fully observed variables. These are those listed in the BUGS data.
"out$BUGSoutput$n.chains" - The number of Markov chains ran in the MCMC simulations.
"out$BUGSoutput$n.iter" - The total number of iterations per chain.
"out$BUGSoutput$n.burnin" - The number of interations discarded from the start as a burn-in period.
"out$BUGSoutput$n.thin" - The thinning rate used. For example a thinning rate of 3 retains only every third iteration. This is used to reduce autocorrelation.
"out$BUGSoutput$n.keep" - The number of iterations kept per chain. This is the total number of iterations minus the burn-in then divided by the thinning rate.
"out$BUGSoutput$n.sims" - The total number of iterations kept.
"out$BUGSoutput$summary" - A summary table of the monitored parameter. The posterior distribution for each parameter is summaried with the mean, standard deviation, various credible intervals, a formal convergence metric (Rhat), and a measure of effective sample size (n.eff).
"out$BUGSoutput$mean" - the mean values for all monitored parameters
"out$BUGSoutput$sd" - the standard deviation values for all monitored parameters
"out$BUGSoutput$median" - the median values for all monitored parameters
"out$parameters.to.save" - The names of all monitored parameters.
"out$BUGSoutput$model.file" - The user provided or temporary generated model file detailing the occupancy model.
"out$n.iter" - The total number of interations per chain.
"out$DIC" - Whether the Deviance Information Criterion (DIC) is calculated.
"out$BUGSoutput$sims.list" - A list of the posterior distribution for each monitored parameter. Use sims.array and sims.matrix if a different format of the posteriors is desired.
"out$SPP_NAME" - The name of the study species.
"out$min_year" - First year of data included in the occupancy model run.
"out$max_year" - Final year of data included in the occupancy model run or final year specified by the user.
"out$sites_included" - List of the sites taken forward to the model (after filtering)
"out$nsites" - The number of unique sites included in the occupancy model run.
"out$nvisits" - The number of unique visits included int he occupancy model run.
"out$species_observations" - The number of unique records for the species of interest.
"out$sparta_version" - The version of sparta used to run the model.
"out$regions" - The names of the regions included in the model run.
"out$region_aggs" - The names of the region aggregates included in the model run. See also the `metadata` attribute for more detailed information about the data in the model.

Details

This function requires the program JAGS. This is not installed by default when sparta is loaded and should be installed by the user. More details can be found in the vignette.

modeltype is used to choose the model as well as the associated initial values, and parameters to monitor. Elements to choose from can be separated into the following components:

A. Prior type: this has 3 options, each of which was tested in Outhwaite et al (in review): 1. sparta - This uses the same model as in Isaac et al (2014). 2. indran - This is the adaptive stationary model. 3. ranwalk - This is the random walk model.

B. Hyperprior type: This has 3 options, each of these are discussed in Outhwaite et al (in review): 1. halfuniform - the original formulation in Isaac et al (2014). 2. halfcauchy - preferred form, tested in Outhwaite et al (2018). 3. inversegamma - alternative form presented in the literature.

C. List length specification: This has 3 options: 1. catlistlength - list length as a categorical variable. 2. contlistlength - list length as a continuous variable. 3. nolistlength - no list length variable.

D. Julian date: this is an additional option for including Julian date within the detection model: 1. jul_date.

Not all combinations are available in sparta. You will get an error if you try and use a combination that is not supported. There is usually a good reason why that combination is not a good idea. Here are the model elements available:

"sparta" - This uses the same model as in Isaac et al (2014)
"indran" - Here the prior for the year effect of the state model is modelled as a random effect. This allows the model to adapt to interannual variability.
"ranwalk" - Here the prior for the year effect of the state model is modelled as a random walk. Each estimate for the year effect is dependent on that of the previous year.
"halfcauchy" - Includes half-Cauchy hyperpriors for all random effects within the model. The half-Cauchy is a special case of the Student’s t distribution with 1 degree of freedom.
"inversegamma" - Includes inverse-gamma hyperpriors for random effects within the model
"catlistlength" - This specifies that list length should be considered as a catagorical variable. There are 3 classes, lists of length 1, 2-3, and 4 and over. If none of the list length options are specifed 'contlistlength' is used
"contlistlength" - This specifies that list length should be considered as a continious variable. If none of the list length options are specifed 'contlistlength' is used
"nolistlength" - This specifies that no list length should be used. If none of the list length options are specifed 'contlistlength' is used
"jul_date" - This adds Julian date to the model as a normal distribution with its mean and standard deviation as monitered parameters.
"intercept" - No longer available. Includes an intercept term in the state and observation model. By including intercept terms, the occupancy and detection probabilities in each year are centred on an overall mean level.
"centering" - No longer available. Includes hierarchical centering of the model parameters. Centring does not change the model explicitly but writes it in a way that allows parameter estimates to be updated simultaneously.

These options are provided as a vector of characters, e.g. modeltype = c('indran', 'halfcauchy', 'catlistlength')

References

Isaac, N.J.B., van Strien, A.J., August, T.A., de Zeeuw, M.P. and Roy, D.B. (2014). Statistics for citizen science: extracting signals of change from noisy ecological data. Methods in Ecology and Evolution, 5: 1052-1060.

Outhwaite, C.L., Chandler, R.E., Powney, G.D., Collen, B., Gregory, R.D. & Isaac, N.J.B. (2018). Prior specification in Bayesian occupancy modelling improves analysis of species occurrence data. Ecological Indicators, 93: 333-343.

Pocock, Logie, Isaac, Outhwaite & August. Rapid assessment of the suitability of multi-species citizen science datasets for occupancy trend analysis. bioRxiv 813626 (2019) doi:10.1101/813626.

Examples

if (FALSE) {

set.seed(123)

# Create data
n <- 15000 # size of dataset
nyear <- 20 # number of years in data
nSamples <- 100 # set number of dates
nSites <- 50 # set number of sites

# Create somes dates
first <- as.Date(strptime("2010/01/01", format = "%Y/%m/%d"))
last <- as.Date(strptime(paste(2010 + (nyear - 1), "/12/31", sep = ""), format = "%Y/%m/%d"))
dt <- last - first
rDates <- first + (runif(nSamples) * dt)

# taxa are set as random letters
taxa <- sample(letters, size = n, TRUE)

# sites are visited randomly
site <- sample(paste("A", 1:nSites, sep = ""), size = n, TRUE)

# the date of visit is selected at random from those created earlier
survey <- sample(rDates, size = n, TRUE)

# Format the data
visitData <- formatOccData(taxa = taxa, site = site, survey = survey)

# run the model with these data for one species (very small number of iterations)
results <- occDetFunc(
  taxa_name = taxa[1],
  n_iterations = 50,
  burnin = 15,
  occDetdata = visitData$occDetdata,
  spp_vis = visitData$spp_vis,
  write_results = FALSE,
  provenance = "sparta test dataset"
)
}