% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/AutoScore.R
\name{AutoScore_parsimony}
\alias{AutoScore_parsimony}
\title{AutoScore STEP(ii): Select the best model with parsimony plot (AutoScore Modules 2+3+4)}
\usage{
AutoScore_parsimony(
  train_set,
  validation_set,
  rank,
  max_score = 100,
  n_min = 1,
  n_max = 20,
  cross_validation = FALSE,
  fold = 10,
  categorize = "quantile",
  quantiles = c(0, 0.05, 0.2, 0.8, 0.95, 1),
  max_cluster = 5,
  do_trace = FALSE,
  auc_lim_min = 0.5,
  auc_lim_max = "adaptive"
)
}
\arguments{
\item{train_set}{A processed \code{data.frame} that contains data to be analyzed, for training.}

\item{validation_set}{A processed \code{data.frame} that contains data for validation.}

\item{rank}{The ranking result generated by AutoScore STEP(i) \code{\link{AutoScore_rank}}.}

\item{max_score}{Maximum total score (Default: 100).}

\item{n_min}{Minimum number of selected variables (Default: 1).}

\item{n_max}{Maximum number of selected variables (Default: 20).}

\item{cross_validation}{If set to \code{TRUE}, cross-validation is used to generate the parsimony plot, which is
suitable for small datasets (Default: \code{FALSE}).}

\item{fold}{The number of folds used in cross-validation (Default: 10). Available if \code{cross_validation = TRUE}.}

\item{categorize}{Method for categorizing continuous variables. Options are "quantile" and "kmeans" (Default: "quantile").}

\item{quantiles}{Predefined quantiles used to convert continuous variables to categorical ones (Default: c(0, 0.05, 0.2, 0.8, 0.95, 1)). Available if \code{categorize = "quantile"}.}

\item{max_cluster}{The maximum number of clusters (Default: 5). Available if \code{categorize = "kmeans"}.}

\item{do_trace}{If set to \code{TRUE}, the results for each fold of cross-validation are printed and plotted (Default: \code{FALSE}). Available if \code{cross_validation = TRUE}.}

\item{auc_lim_min}{Minimum y-axis limit in the parsimony plot (Default: 0.5).}

\item{auc_lim_max}{Maximum y-axis limit in the parsimony plot (Default: "adaptive").}
}
\value{
List of AUC values for different numbers of variables.
}
\description{
AutoScore STEP(ii): Select the best model with parsimony plot (AutoScore Modules 2+3+4)
}
\details{
This is the second step of the general AutoScore workflow. It generates the parsimony plot to help select a parsimonious model.
 In this step, AutoScore Modules 2, 3 and 4 are run multiple times to evaluate performance under different variable lists.
 The resulting parsimony plot gives researchers an intuitive figure for choosing the best model.
 If the data size is small (i.e., <5000), an independent validation set may not be a wise choice. In that case, we suggest using cross-validation
 to maximize the utility of the data by setting \code{cross_validation = TRUE}. Run \code{vignette("Guide_book", package = "AutoScore")} to see the guidebook or vignette.
}
\examples{
\donttest{
# see AutoScore Guidebook for the whole 5-step workflow
data("sample_data")
names(sample_data)[names(sample_data) == "Mortality_inpatient"] <- "label"
out_split <- split_data(data = sample_data, ratio = c(0.7, 0.1, 0.2))
train_set <- out_split$train_set
validation_set <- out_split$validation_set
ranking <- AutoScore_rank(train_set, ntree = 100)
AUC <- AutoScore_parsimony(
  train_set,
  validation_set,
  rank = ranking,
  max_score = 100,
  n_min = 1,
  n_max = 20,
  categorize = "quantile",
  quantiles = c(0, 0.05, 0.2, 0.8, 0.95, 1)
)}
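\donttest{
# A minimal sketch, not part of the original example: for small datasets,
# the parsimony plot can instead be generated with cross-validation, as
# suggested in Details. It reuses train_set, validation_set and ranking
# from the example above. AUC_cv is an illustrative name only, and fold
# and do_trace are set to their documented defaults.
AUC_cv <- AutoScore_parsimony(
  train_set,
  validation_set,
  rank = ranking,
  cross_validation = TRUE,
  fold = 10,
  do_trace = FALSE
)
}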
}
\references{
\itemize{
 \item{Xie F, Chakraborty B, Ong MEH, Goldstein BA, Liu N, AutoScore: A Machine Learning-Based Automatic Clinical
  Score Generator and Its Application to Mortality Prediction Using Electronic Health Records,
  JMIR Med Inform 2020;8(10):e21798, doi: 10.2196/21798}
}
}
\seealso{
\code{\link{AutoScore_rank}}, \code{\link{AutoScore_weighting}}, \code{\link{AutoScore_fine_tuning}}, \code{\link{AutoScore_testing}}. Run \code{vignette("Guide_book", package = "AutoScore")} to see the guidebook or vignette.
}
