
This function uses the method outlined in Roy et al. (2012) and Isaac et al. (2014) to select well-sampled sites from a dataset, using list length and the number of years sampled as selection criteria.

Usage

siteSelection(taxa, site, time_period, minL, minTP, LFirst = TRUE)

Arguments

taxa

A character vector of taxon names, as long as the number of observations.

site

A character vector of site names, as long as the number of observations.

time_period

A numeric vector of user-defined time periods, or a date vector, as long as the number of observations.

minL

Numeric. The minimum number of taxa recorded at a site in a given time period (the list length) for the visit to be considered well sampled.

minTP

Numeric. The minimum number of time periods (or, if time_period is a date, the minimum number of years) in which a site must be sampled for it to be considered well sampled.

LFirst

Logical. If TRUE, the data are filtered first by list length and then by number of time periods; if FALSE, first by number of time periods and then by list length.
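The order of the two filters matters because each one changes what the other sees. The base-R sketch below illustrates this under stated assumptions; `select_sites_sketch` is a hypothetical helper, not the package's own code. It assumes that a "visit" is a unique site and time_period combination, that list length is the number of distinct taxa per visit, and that dates are counted by calendar year when checking minTP. It drops duplicate records silently rather than warning about them.

```r
# Hypothetical sketch only: NOT the package's implementation of siteSelection().
# Assumptions: a visit is a unique site + time_period pair, and dates are
# collapsed to calendar years when counting time periods.
select_sites_sketch <- function(taxa, site, time_period, minL, minTP, LFirst = TRUE) {
  # Drop duplicate records (silently, unlike the real function's warning)
  d <- unique(data.frame(taxa = as.character(taxa), site = as.character(site),
                         time_period = time_period, stringsAsFactors = FALSE))

  # Keep only visits whose list length (distinct taxa per visit) is >= minL
  filter_list_length <- function(d) {
    if (nrow(d) == 0) return(d)
    ll <- ave(seq_len(nrow(d)), d$site, d$time_period, FUN = length)
    d[ll >= minL, , drop = FALSE]
  }

  # Keep only sites sampled in >= minTP distinct time periods (years for dates)
  filter_time_periods <- function(d) {
    if (nrow(d) == 0) return(d)
    period <- if (inherits(d$time_period, c("Date", "POSIXt"))) {
      format(d$time_period, "%Y")
    } else {
      d$time_period
    }
    n_tp <- tapply(period, d$site, function(p) length(unique(p)))
    d[d$site %in% names(n_tp)[n_tp >= minTP], , drop = FALSE]
  }

  if (LFirst) {
    filter_time_periods(filter_list_length(d))
  } else {
    filter_list_length(filter_time_periods(d))
  }
}

# Toy data: site s2 has one visit with 2 taxa and one visit with 1 taxon
taxa <- c("a", "b", "c", "a", "b", "c", "a", "d", "a")
site <- c("s1", "s1", "s1", "s1", "s1", "s1", "s2", "s2", "s2")
tp   <- c(1, 1, 1, 2, 2, 2, 1, 1, 2)

res_L_first  <- select_sites_sketch(taxa, site, tp, minL = 2, minTP = 2, LFirst = TRUE)
res_TP_first <- select_sites_sketch(taxa, site, tp, minL = 2, minTP = 2, LFirst = FALSE)
```

With list length first, site s2 is dropped: only one of its visits passes minL = 2, leaving it with a single time period. With time periods first, s2 passes the minTP = 2 check on all its records, and only afterwards loses its short visit, so it remains in the output despite being well sampled in just one period.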

Value

A data.frame containing the observations that fulfil the selection criteria.

References

needed

Examples

# Create data
n <- 150 # size of dataset
nyr <- 8 # number of years in data
nSamples <- 20 # number of distinct sampling dates

# Create some dates
first <- as.POSIXct(strptime("2003/01/01", "%Y/%m/%d"))
last <- as.POSIXct(strptime(paste0(2003 + (nyr - 1), "/12/31"), "%Y/%m/%d"))
dt <- last - first
rDates <- first + (runif(nSamples) * dt)

# taxa are set as random letters
taxa <- sample(letters, size = n, replace = TRUE)

# three sites are visited randomly
site <- sample(c('one', 'two', 'three'), size = n, replace = TRUE)

# the date of each visit is selected at random from those created earlier
time_period <- sample(rDates, size = n, replace = TRUE)

# combine these into a data frame
df <- data.frame(taxa, site, time_period)
head(df)
#>   taxa  site         time_period
#> 1    r   one 2007-01-22 20:18:55
#> 2    e   two 2008-05-07 19:30:53
#> 3    w three 2006-03-25 19:46:36
#> 4    k   one 2004-09-30 01:53:14
#> 5    a   two 2006-03-25 19:46:36
#> 6    d   two 2008-05-07 19:30:53

# Use the site selection function on this simulated data
dfSEL <- siteSelection(df$taxa, df$site, df$time_period, minL = 4, minTP = 3)
#> Warning: 10 out of 150 observations will be removed as duplicates