This function uses the method outlined in Roy et al. (2012) and Isaac et al. (2014) to select well-sampled sites from a dataset, using list length and number of years as selection criteria.
Arguments
- taxa
A character vector of taxon names, as long as the number of observations.
- site
A character vector of site names, as long as the number of observations.
- time_period
A numeric vector of user defined time periods, or a date vector, as long as the number of observations.
- minL
Numeric, the minimum number of taxa recorded at a site in a given time period (list-length) for the visit to be considered well sampled.
- minTP
Numeric, the minimum number of time periods (or, if time_period is a date, the minimum number of years) in which a site must be sampled for it to be considered well sampled.
- LFirst
Logical, if TRUE the data is first filtered by list-length and then by time periods; otherwise by time periods and then by list-length.
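The way minL, minTP and LFirst interact can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the helper name select_sites_sketch is hypothetical, and it assumes that a visit is a unique combination of site and time_period and that duplicate observations are dropped before filtering.

```r
# Hypothetical helper (not part of the package): a sketch of the two filters,
# assuming a "visit" is a unique site/time_period combination.
select_sites_sketch <- function(taxa, site, time_period, minL, minTP, LFirst = TRUE) {
  # Drop duplicate observations so that rows per visit equal distinct taxa per visit
  dat <- unique(data.frame(taxa, site, time_period, stringsAsFactors = FALSE))

  # Keep visits whose list length (number of taxa) is at least minL
  filter_list_length <- function(d) {
    visit <- paste(d$site, d$time_period, sep = "|")
    list_length <- table(visit)
    d[as.vector(list_length[visit]) >= minL, , drop = FALSE]
  }

  # Keep sites sampled in at least minTP distinct time periods (years for dates)
  filter_time_periods <- function(d) {
    tp <- d$time_period
    if (inherits(tp, c("Date", "POSIXt"))) tp <- format(tp, "%Y")
    n_tp <- tapply(tp, d$site, function(x) length(unique(x)))
    d[as.vector(n_tp[as.character(d$site)]) >= minTP, , drop = FALSE]
  }

  if (LFirst) {
    filter_time_periods(filter_list_length(dat))
  } else {
    filter_list_length(filter_time_periods(dat))
  }
}

# Toy data: site A is visited in three periods, site B in one
toy_taxa <- c("a", "b", "c", "a", "b", "a", "a", "b")
toy_site <- c("A", "A", "A", "A", "A", "A", "B", "B")
toy_tp   <- c(1, 1, 1, 2, 2, 3, 1, 1)

res_list_first <- select_sites_sketch(toy_taxa, toy_site, toy_tp, minL = 2, minTP = 3, LFirst = TRUE)
res_tp_first   <- select_sites_sketch(toy_taxa, toy_site, toy_tp, minL = 2, minTP = 3, LFirst = FALSE)
nrow(res_list_first)  # 0: dropping the short visit first leaves site A with only two periods
nrow(res_tp_first)    # 5: site A passes the period count before its short visit is removed
```

On this toy data the order of filtering changes the result, which is why LFirst is exposed as an argument.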
Examples
# Create data
n <- 150 # size of dataset
nyr <- 8 # number of years in data
nSamples <- 20 # set number of dates
# Create some dates
first <- as.POSIXct(strptime("2003/01/01", "%Y/%m/%d"))
last <- as.POSIXct(strptime(paste(2003+(nyr-1),"/12/31", sep=''), "%Y/%m/%d"))
dt <- last-first
rDates <- first + (runif(nSamples)*dt)
# taxa are set as random letters
taxa <- sample(letters, size = n, TRUE)
# three sites are visited randomly
site <- sample(c('one', 'two', 'three'), size = n, TRUE)
# the date of visit is selected at random from those created earlier
time_period <- sample(rDates, size = n, TRUE)
# combine this to a dataframe
df <- data.frame(taxa, site, time_period)
head(df)
#> taxa site time_period
#> 1 r one 2007-01-22 20:18:55
#> 2 e two 2008-05-07 19:30:53
#> 3 w three 2006-03-25 19:46:36
#> 4 k one 2004-09-30 01:53:14
#> 5 a two 2006-03-25 19:46:36
#> 6 d two 2008-05-07 19:30:53
# Use the site selection function on this simulated data
dfSEL <- siteSelection(df$taxa, df$site, df$time_period, minL = 4, minTP = 3)
#> Warning: 10 out of 150 observations will be removed as duplicates