This function takes a data.frame of records and calculates the list length metrics for one recorder. These metrics are built on the idea of a 'list', defined as the set of species recorded at a single location (often a 1km square) on a single day by an individual recorder.
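To make the definition of a 'list' concrete, the following base R sketch groups some toy records by recorder, date and location, then counts the distinct species in each group. The data, column names and grid references are invented for illustration, and this is not necessarily how listLength builds its lists internally.

```r
# Hypothetical toy records; values and column names are illustrative only
records <- data.frame(
  recorder = c("A", "A", "A", "A", "B"),
  date     = as.Date(c("2020-05-01", "2020-05-01", "2020-05-01",
                       "2020-05-02", "2020-05-01")),
  kmsq     = c("SP5106", "SP5106", "SP5207", "SP5106", "SP5106"),
  species  = c("Pieris rapae", "Aglais io", "Pieris rapae",
               "Pieris rapae", "Aglais io"),
  stringsAsFactors = FALSE
)

# One list = one recorder x date x location combination.
# The list length is the number of distinct species in that combination.
list_lengths <- aggregate(
  species ~ recorder + date + kmsq,
  data = records,
  FUN = function(x) length(unique(x))
)
names(list_lengths)[names(list_lengths) == "species"] <- "list_length"

print(list_lengths)
```

Recorder A visiting the same square on two different days produces two separate lists, as does visiting two squares on the same day.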

listLength(recorder_name, data, threshold = 10, plot = FALSE,
  sp_col = "preferred_taxon", date_col = "date_start",
  recorder_col = "recorders", location_col = "kmsq")

Arguments

recorder_name

the name of the recorder for whom you want to calculate the metrics

data

the data.frame of recording information

threshold

the minimum number of lists required before the metrics are calculated. If this threshold is not met, NA is reported for all metrics except n_lists

plot

logical; should a histogram of list lengths be plotted?

sp_col

the name of the column that contains the species names

date_col

the name of the column that contains the date. This must be formatted as a date

recorder_col

the name of the column that contains the recorder names

location_col

the name of the column that contains the location. This should be a character value, such as a grid reference, and should represent the scale at which recording is done over a single day; a 1km square is typically used.
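As a rough illustration of preparing the date and location columns before calling the function, the sketch below converts a text date column and checks the result. It assumes that "formatted as a date" means an object of class Date, and the data frame, its values and the day/month/year input format are all hypothetical.

```r
# Hypothetical raw data with dates stored as text (day/month/year)
raw <- data.frame(
  recorders       = c("A", "A"),
  date_start      = c("01/05/2020", "02/05/2020"),
  kmsq            = c("SP5106", "SP5106"),
  preferred_taxon = c("Pieris rapae", "Aglais io"),
  stringsAsFactors = FALSE
)

# Convert the text dates to class Date; adjust 'format' to match your data
raw$date_start <- as.Date(raw$date_start, format = "%d/%m/%Y")

# Make sure the location is stored as character rather than factor
raw$kmsq <- as.character(raw$kmsq)

# Basic checks: dates parsed without producing NA, location is character
stopifnot(inherits(raw$date_start, "Date"),
          !anyNA(raw$date_start),
          is.character(raw$kmsq))
```

If as.Date returns NA for some rows, the format string most likely does not match how those dates were written.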

Value

A data.frame with seven columns:

  • recorder - The name of the recorder, as given in the recorder_name argument

  • mean_LL - The mean number of species recorded across all lists

  • median_LL - The median number of species recorded across all lists

  • variance - The variance in the number of species recorded across all lists

  • p1 - The proportion of lists that had a single species recorded

  • p4 - The proportion of lists that had four or more species recorded

  • n_lists - The number of lists this recorder recorded
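The columns above can be illustrated with a small base R sketch that starts from a vector of list lengths. It makes two assumptions that the text does not confirm: that the threshold counts as met when the number of lists is greater than or equal to it, and that the variance is the sample variance returned by var(). The vector ll and the recorder name are made up.

```r
# Hypothetical list lengths for one recorder (12 lists)
ll <- c(1, 1, 2, 3, 4, 4, 5, 1, 2, 6, 7, 1)
threshold <- 10

n_lists <- length(ll)
# Assumption: the threshold is met when n_lists >= threshold
enough <- n_lists >= threshold

metrics <- data.frame(
  recorder  = "A",
  mean_LL   = if (enough) mean(ll) else NA,
  median_LL = if (enough) median(ll) else NA,
  variance  = if (enough) var(ll) else NA,       # sample variance (assumed)
  p1        = if (enough) mean(ll == 1) else NA, # share of single-species lists
  p4        = if (enough) mean(ll >= 4) else NA, # share of lists with 4+ species
  n_lists   = n_lists                            # always reported
)

print(metrics)
```

With fewer than ten values in ll, every column except recorder and n_lists would be NA under these assumptions.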

Examples

# NOT RUN {
# look at the first rows of the example data
head(cit_sci_data)

# Location might be a site name column in your data or a unique combination of lat and long.
# Our data has no location column, so we build one from lat and long.
# It may be more sensible to convert lat and long to a grid reference and
# use a 1 km square grid reference to represent a site.
cit_sci_data$location <- paste(round(cit_sci_data$lat, 4), round(cit_sci_data$long, 4))

# run for one recorder
LL <- listLength(data = cit_sci_data,
                 recorder_name = 3007,
                 threshold = 10,
                 plot = FALSE,
                 sp_col = 'species',
                 date_col = 'date',
                 recorder_col = 'recorder',
                 location_col = 'location')

# Run the metric for all recorders
LL_all <- lapply(unique(cit_sci_data$recorder),
                 FUN = listLength,
                 data = cit_sci_data,
                 threshold = 10,
                 plot = FALSE,
                 sp_col = 'species',
                 date_col = 'date',
                 recorder_col = 'recorder',
                 location_col = 'location')

# summarise as one table
LL_all_sum <- do.call(rbind, LL_all)

# histogram of the number of lists per recorder
hist(LL_all_sum$n_lists, breaks = 80)
# }