This function applies generalized additive models (GAM) to assess emerging status for a certain time window.
Usage
apply_gam(
df,
y_var,
eval_years,
year = "year",
taxonKey = "taxonKey",
type_indicator = "observations",
baseline_var = NULL,
p_max = 0.1,
taxon_key = NULL,
name = NULL,
df_title = NULL,
x_label = "year",
y_label = "Observations",
saveplot = FALSE,
dir_name = NULL,
verbose = FALSE
)
Arguments
- df
df. A dataframe containing temporal data.
- y_var
character. Name of column containing variable to model. It has to be passed as string, e.g.
"occurrences"
.- eval_years
numeric. Temporal value(s) when emerging status has to be evaluated.
- year
character. Name of column containing temporal values. It has to be passed as string, e.g.
"time"
. Default:"year"
.- taxonKey
character. Name of column containing taxon IDs. It has to be passed as string, e.g.
"taxon"
. Default:"taxonKey"
.- type_indicator
character. One of
"observations"
,"occupancy"
. Used in title of the output plot. Default:"observations"
.- baseline_var
character. Name of the column containing values to use as additional covariate. Such covariate is introduced in the model to correct research effort bias. Default:
NULL
. IfNULL
internal variablemethod_em = "basic"
, otherwisemethod_em = "correct_baseline"
. Value ofmethod_em
will be part of title of output plot.- p_max
numeric. A value between 0 and 1. Default: 0.1.
- taxon_key
numeric, character. Taxon key the timeseries belongs to. Used exclusively in graph title and filename (if
saveplot = TRUE
). Default:NULL
.- name
character. Species name the timeseries belongs to. Used exclusively in graph title and filename (if
saveplot = TRUE
). Default:NULL
.- df_title
character. Any string you would like to add to graph titles and filenames (if
saveplot = TRUE
). The title is always composed of:"GAM"
+type_indicator
+method_em
+taxon_key
+name
+df_title
separated by underscore ("_"). Default:NULL
.- x_label
character. x-axis label of output plot. Default:
"year"
.- y_label
character. y-axis label of output plot. Default:
"number of observations"
.- saveplot
logical. If
TRUE
the plots are authomatically saved. Default:FALSE
.- dir_name
character. Path of directory where saving plots. If path doesn't exists, directory will be created. Example: "./output/graphs/". If
NULL
, plots are saved in current directory. Default:NULL
.- verbose
logical. If
TRUE
status of processing and possible issues are returned. Default:FALSE
.
Value
list with six slots:
em_summary
: df. A data.frame summarizing the emerging status outputs.em_summary
contains as many rows as the length of input variableeval_year
. So, if you evaluate GAM on three years,em_summary
will contain three rows. It contains the following columns:"taxonKey"
: column containing taxon ID. Column name equal to value of argumenttaxonKey
."year"
: column containing temporal values. Column name equal to value of argumentyear
. Column itself is equal to value of argumenteval_years
. So, if you evaluate GAM on years 2017, 2018 (eval_years = c(2017, 2018)
), you will get these two values in this column.em_status
: numeric. Emerging statuses, an integer between 0 and 3.growth
: numeric. Lower limit of GAM confidence interval for the first derivative, if positive. It represents the lower guaranteed growth.method
: character. GAM method, One of:"correct_baseline"
and"basic"
. See details above in description of argumentuse_baseline
.
model
: gam object. The model as returned bygam()
function.NULL
if GAM cannot be applied.output
: df. Complete data.frame containing more details than the summaryem_summary
. It contains the following columns:all columns in
df
.method
: character. GAM method, One of:"correct_baseline"
and"basic"
. See details above in description of argumentuse_baseline
.fit
: numeric. Fit values.ucl
: numeric. The upper confidence level values.lcl
: numeric. The lower confidence level values.em1
: numeric. The emergency value for the 1st derivative. -1, 0 or +1.em2
: numeric. The emergency value for the 2nd derivative: -1, 0 or +1.em
: numeric. The emergency value: from -4 to +4, based onem1
andem2
. See Details.em_status
: numeric. Emerging statuses, an integer between 0 and 3. See Details.growth
: numeric. Lower limit of GAM confidence interval for the first derivative, if positive. It represents the lower guaranteed growth.
first_derivative
: df. Data.frame with details of first derivatives. It contains the following columns:smooth
: smoooth identifier. Example:s(year)
.derivative
: numeric. Value of first derivative.se
: numeric. Standard error ofderivative
.crit
: numeric. Critical value required such thatderivative + (se * crit)
andderivative - (se * crit)
form the upper and lower bounds of the confidence interval on the first derivative of the estimated smooth at the specific confidence level. In our case the confidence level is hard-coded: 0.8. Thencrit <- qnorm(p = (1-0.8)/2, mean = 0, sd = 1, lower.tail = FALSE)
.lower_ci
: numeric. Lower bound of the confidence interval of the estimated smooth.upper_ci
: numeric. Upper bound of the confidence interval of the estimated smooth.value of argument
year
: column with temporal values.value of argument
baseline_var
: column with the fitted values for the baseline. Ifbaseline_var
isNULL
, this column is not present.
second_derivative
: df. Data.frame with details of second derivatives. Same columns asfirst_derivatives
.plot
: a ggplot2 object. Plot of observations with GAM output and emerging status. If emerging status cannot be assessed only observations are plotted.
Details
The GAM modelling is performed using the mgcvb::gam()
. To use this
function, we pass:
a formula
a family object specifying the distribution
a smoothing parameter estimation method
For more information about all other arguments, see
[mgcv::gam()]
.If no covariate is used (
baseline_var
= NULL), the GAM formula is:n ~ s(year, k = maxk, m = 3, bs = "tp")
. Otherwise the GAM formula has a second term,s(n_covariate)
and so the GAM formula isn ~ s(year, k = maxk, m = 3, bs = "tp") + s(n_covariate)
.Description of the parameters present in the formula above:
k
: dimension of the basis used to represent the smooth term, i.e. the number of knots used for calculating the smoother. We #' setk
tomaxk
, which is the number of decades in the time series. If less than 5 decades are present in the data,maxk
is #' set to 5.bs
indicates the basis to use for the smoothing: we uses the default penalized thin plate regression splines.m
specifies the order of the derivatives in the thin plate spline penalty. We usem = 3
, the default value.We use
[mgcv::nb()]
, a negative binomial family to perform the GAM.The smoothing parameter estimation method is set to REML (Restricted maximum likelihood approach). If the P-value of the GAM smoother(s) is/are above threshold value
p_max
, GAM is not performed and the next warning is returned: "GAM output cannot be used: p-values of all GAM smoothers are above {p_max}" wherep_max
is the P-value used as threshold as defined by argumentp_max
.If the
mgcv::gam()
returns an error or a warning, the following message is returned to the user: "GAM ({method_em}) cannot be performed or cannot converge.", wheremethod_em
is one of"basic"
or"correct_baseline"
. See argumentbaseline_var
.The first and second derivatives of the smoother is calculated using function
gratia::derivatives()
with the following hard coded arguments:type
: the type of finite difference used. Set to"central"
.order
: 1 for the first derivative, 2 for the second derivativelevel
: the confidence level. Set to 0.8eps
: the finite difference. Set to 1e-4.For more details, please check derivatives.
The sign of the lower and upper confidence levels of the first and second derivatives are used to define a detailed emergency status (
em
) which is internally used to return the emergency status,em_status
, which is a column of the returned data.frameem_summary
.| ucl-1 | lcl-1 | ucl-2 | lcl-2 | em | em_status | | --- | --- | --- | --- | --- | --- | | + | + | + | + | 4 | 3 (emerging) | | + | + | + | - | 3 | 3 (emerging) | | + | + | - | - | 2 | 2 (potentially emerging) | | - | + | + |
| 1 | 2 (potentially emerging) | | + | - | + | - | 0 | 1 (unclear) | | + | - | - | - | -1 | 0 (not emerging) | | - | - | + | + | -2 | 0 (not emerging) | |
| - | + | - | -3 | 0 (not emerging) | | - | - | - | - | -4 | 0 (not emerging) |