% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/model_gam.R
\name{model_gam}
\alias{model_gam}
\title{Modeling of indicator responses to single pressures with GAMs}
\usage{
model_gam(init_tbl, k = 5, family = stats::gaussian(), excl_outlier = NULL)
}
\arguments{
\item{init_tbl}{The output tibble of the \code{\link{ind_init}} function.}

\item{k}{Choice of knots (for the smoothing function \code{\link[mgcv]{s}}); the default is 5.}

\item{family}{A description of the error distribution and link to be used in the GAM.
This needs to be defined as a family function (see also \code{\link{family}}). All
standard family functions can be used, as well as some of the distribution families in
the mgcv package (see \code{\link[mgcv]{family.mgcv}}; e.g. \code{\link[mgcv]{negbin}}
or \code{\link[mgcv]{nb}}).}

\item{excl_outlier}{A list of values identified as outliers in specific
IND~pressure GAMs, which should be excluded in this modeling step.
The output tibble of this function includes the variable
\code{pres_outlier}, a list-column containing the indices of all values
with a Cook's distance > 1 (see below). The function can then be re-run,
excluding the outliers provided in \code{$pres_outlier} from the first run
(see example).}
}
\value{
The function returns a \code{\link[tibble]{tibble}}, which is a trimmed-down version of
a \code{data.frame}, including the following elements:
\describe{
  \item{\code{id}}{Numerical IDs for the IND~press combinations.}
  \item{\code{ind}}{Indicator names.}
  \item{\code{press}}{Pressure names.}
  \item{\code{model_type}}{Specification of the model type; at this stage containing only
             "gam" (Generalized Additive Model).}
  \item{\code{corrstruc}}{Specification of the correlation structure; at this stage
             containing only "none".}
  \item{\code{aic}}{AIC of the fitted models.}
  \item{\code{edf}}{Estimated degrees of freedom for the model terms.}
  \item{\code{p_val}}{The p-values for the smoothing term (the pressure).}
  \item{\code{signif_code}}{The significance codes for the p-values.}
  \item{\code{r_sq}}{The adjusted r-squared for the models. Defined as the proportion
              of variance explained, where original variance and residual variance are
              both estimated using unbiased estimators. This quantity can be negative
              if your model is worse than a one parameter constant model, and can be
              higher for the smaller of two nested models.}
  \item{\code{expl_dev}}{The proportion of the null deviance explained by the models.}
  \item{\code{nrmse}}{Absolute values of the root mean square error normalized by the
              standard deviation (NRMSE).}
  \item{\code{ks_test}}{The p-values from a Kolmogorov-Smirnov test applied to the model
              residuals to test for normal distribution. P-values > 0.05 indicate
              normally distributed residuals.}
  \item{\code{tac}}{Logical; indicates whether temporal autocorrelation (TAC) was detected
              in the residuals (TRUE if the model residuals show TAC). NAs in the time series
              due to real missing values, test data extraction or exclusion of outliers
              are explicitly considered. The test is based on the following condition:
              if any of the acf \strong{and} pacf values at lags 1 to 5 are greater than 0.4
              or lower than -0.4, TRUE is returned.}
  \item{\code{pres_outlier}}{A list-column with all indices of values identified as outliers
              in each model (i.e. Cook's distance > 1). The indices represent the position in
              the training data, including NAs.}
  \item{\code{excl_outlier}}{A list-column listing all outliers per model that have been
              excluded from the GAM fitting.}
  \item{\code{model}}{A list-column of IND~press-specific gam objects that additionally
             contain a logical vector indicating missing values (\code{$train_na}).}
}
}
\description{
\code{model_gam} applies Generalized Additive Models (GAMs) to each IND~pressure
combination created in \code{\link{ind_init}} and returns a tibble with
IND~pressure-specific GAM outputs.
}
\details{
To evaluate the IND's sensitivity and robustness, time series of the IND are
modeled as a smoothing function of one single pressure variable (using a subset
of the data as training dataset, e.g. excluding the last years of the annual time series).
The GAMs are built using the default settings in the \code{gam} function and
the smooth term function \code{\link[mgcv]{s}}. However, the user can adjust
the distribution and link by modifying the family argument, as well as the
maximum level of non-linearity by setting the number of knots:

\code{gam(ind ~ s(press, k = k), family = family, data = training_data)}

In the presence of significant temporal auto-correlation, GAMs should be extended to
Generalized Additive Mixed Models (GAMMs) by including auto-regressive error structures
to correct for the auto-correlation (Pinheiro and Bates, 2000). This is implemented in
the function \code{\link{model_gamm}}.

The returned tibble contains various model outputs needed for scoring the sensitivity
and robustness subcriteria:
\itemize{
  \item \code{p_val} to identify whether an IND responds to a specific pressure
  \item \code{r_sq} for the strength of the IND response
  \item \code{edf} for the non-linearity of the IND response
  \item \code{nrmse} for the robustness of the established IND~pressure relationship
}

The robustness of the modeled pressure relationship based on the training data
is evaluated by measuring how well the model prediction matches the test dataset,
e.g. the last years. This is quantified by computing the absolute value of the
normalized root mean square error (NRMSE) on the test dataset. The normalization
to the standard deviation of the observed test data allows for comparisons and a
general scoring of the model robustness across INDs with different scales or units.
}
\examples{
# Using the Baltic Sea demo data in this package
dat_init <- ind_init(
  ind_tbl = ind_ex[, c("Sprat", "Cod")],
  press_tbl = press_ex[, c("Tsum", "Swin", "Fcod", "Fher")],
  time = ind_ex[ ,1])
gam_tbl <- model_gam(dat_init)
# Any outlier?
gam_tbl$pres_outlier
# Exclude outliers by passing this list as input:
gam_tbl_out <- model_gam(dat_init, excl_outlier = gam_tbl$pres_outlier)
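# A possible follow-up (sketch only): inspect the outputs that the Details
# section lists as relevant for scoring (p_val, r_sq, edf, nrmse), together
# with the TAC flag. Column names are those documented under Value.
gam_tbl[, c("id", "ind", "press", "p_val", "r_sq", "edf", "nrmse", "tac")]
# IDs of models whose smoothing term is significant; the 0.05 threshold is
# an assumption chosen for illustration, not a package default
gam_tbl$id[which(gam_tbl$p_val <= 0.05)]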

\donttest{
 # Using another error distribution
 ind_sub <- round(exp(ind_ex[ ,c(2,8,9)]),0) # to unlog data and convert to integers
 ind_tbl2 <- ind_init(ind_sub, press_ex, time = ind_ex$Year)
 model_gam(ind_tbl2, family = poisson(link="log"))
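
 # Sketch of the TAC condition described for the 'tac' output, written in
 # base R on simulated residuals. This is one reading of the documented rule
 # (any acf value AND any pacf value at lags 1 to 5 beyond +/- 0.4), not the
 # package's internal code; it also ignores the NA handling mentioned above.
 # `res`, `acf_val`, `pacf_val` and `exceeds` are hypothetical names.
 set.seed(1)
 res <- as.numeric(stats::arima.sim(list(ar = 0.8), n = 50))
 acf_val <- stats::acf(res, lag.max = 5, plot = FALSE)$acf[-1] # drop lag 0
 pacf_val <- stats::pacf(res, lag.max = 5, plot = FALSE)$acf   # lags 1 to 5
 exceeds <- function(x) any(abs(x) > 0.4)
 exceeds(acf_val) && exceeds(pacf_val)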
}
}
\references{
Pinheiro, J.C., Bates, D.M. (2000) Mixed-Effects Models in S and S-Plus.
Springer, New York, 548pp.
}
\seealso{
\code{\link[tibble]{tibble}} and the \code{vignette("tibble")} for more
 information on tibbles,
 \code{\link[mgcv]{gam}} for more information on GAMs, and
 \code{\link{plot_diagnostics}} for assessing the model diagnostics.

Other IND~pressure modeling functions: 
\code{\link{find_id}()},
\code{\link{ind_init}()},
\code{\link{model_gamm}()},
\code{\link{plot_diagnostics}()},
\code{\link{plot_model}()},
\code{\link{scoring}()},
\code{\link{select_model}()},
\code{\link{test_interaction}()}
}
\concept{IND~pressure modeling functions}
