% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sensitivity_analysis_qual.R
\name{sensitivity_analysis_qual}
\alias{sensitivity_analysis_qual}
\title{Perform sensitivity analysis on ecometric models (qualitative environmental variables)}
\usage{
sensitivity_analysis_qual(
  points_df,
  category_col,
  sample_sizes,
  iterations = 20,
  test_split = 0.2,
  grid_bins_1 = NULL,
  grid_bins_2 = NULL,
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
)
}
\arguments{
\item{points_df}{First element of the list returned by \code{summarize_traits_by_point()}. A data frame with columns \code{summ_trait_1}, \code{summ_trait_2}, \code{count_trait}, and the environmental variable specified in \code{category_col}.}

\item{category_col}{Name of the column in \code{points_df} containing the categorical environmental variable.}

\item{sample_sizes}{Numeric vector specifying the number of communities (sampling points)
to evaluate in the sensitivity analysis. For each value, a random subset of the data of that
size is drawn without replacement and then split into training and testing sets using the
proportion defined by \code{test_split} (default is 80\% training, 20\% testing).
All values in \code{sample_sizes} must be less than or equal to the number of rows in \code{points_df},
and large enough to allow splitting based on \code{test_split} (i.e., both the training and testing
sets must contain at least 30 communities).}

\item{iterations}{Number of bootstrap iterations per sample size (default = 20).}

\item{test_split}{Proportion of data to use for testing (default = 0.2).}

\item{grid_bins_1}{Number of bins for the first trait axis. If \code{NULL} (default),
the number is calculated automatically using Scott's rule via \code{optimal_bins()}.}

\item{grid_bins_2}{Number of bins for the second trait axis. If \code{NULL} (default),
the number is calculated automatically using Scott's rule via \code{optimal_bins()}.}

\item{parallel}{Logical; whether to run iterations in parallel (default = \code{TRUE}).}

\item{n_cores}{Number of cores to use for parallelization (default = \code{parallel::detectCores() - 1}).}
}
\value{
A list containing:
\item{combined_results}{Raw iteration results as a data frame. Each row corresponds to one bootstrap iteration.}
\item{summary_results}{Mean metrics across bootstrap iterations for each sample size.}
}
\description{
Evaluates how varying sample sizes affect the performance of ecometric models,
focusing on two aspects:
\itemize{
\item \strong{Sensitivity (internal consistency)}: How accurately the model predicts environmental conditions
on the same data on which it was trained.
\item \strong{Transferability (external applicability)}: How well the model performs on unseen data.
}
It tests different sample sizes by resampling the data multiple times (bootstrap iterations),
training an ecometric model on each subset, and evaluating prediction error and correlation.
}
\details{
Two plots are generated:
\enumerate{
\item \strong{Training Accuracy vs. Sample size:} Reflects internal model consistency.
\item \strong{Testing Accuracy vs. Sample size:} Reflects external model performance.
}

Parallel processing is supported to speed up the analysis: set \code{parallel = TRUE} and choose the number of cores with \code{n_cores}.
}
\examples{
\donttest{
# Load internal data
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")

# Summarize trait values at sampling points
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)

# Run sensitivity analysis for dominant land cover class
sensitivityQual <- sensitivity_analysis_qual(
  points_df = traitsByPoint$points,
  category_col = "vegetation",
  sample_sizes = seq(40, 90, 10),
  iterations = 5,
  parallel = FALSE
)

# View results
head(sensitivityQual$summary_results)
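
# Inspect the raw per-iteration results as well. This is an illustrative
# addition: it relies only on the documented return value, in which
# combined_results is a data frame with one row per iteration.
head(sensitivityQual$combined_results)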
}
}
