% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/gp.R
\name{gp}
\alias{gp}
\title{Gaussian process emulator construction}
\usage{
gp(
  X,
  Y,
  name = "sexp",
  lengthscale = rep(0.1, ncol(X)),
  bounds = NULL,
  prior = "ref",
  nugget_est = FALSE,
  nugget = ifelse(nugget_est, 0.01, 1e-08),
  scale_est = TRUE,
  scale = 1,
  training = TRUE,
  verb = TRUE,
  check_rep = TRUE,
  vecchia = FALSE,
  M = 25,
  ord = NULL,
  id = NULL
)
}
\arguments{
\item{X}{a matrix where each row is an input data point and each column is an input dimension.}

\item{Y}{a matrix with a single column where each row is an output data point.}

\item{name}{kernel function to be used. Either \code{"sexp"} for squared exponential kernel or
\code{"matern2.5"} for Matérn-2.5 kernel. Defaults to \code{"sexp"}.}

\item{lengthscale}{initial values of lengthscales in the kernel function. It can be a single numeric value or a vector of length \code{ncol(X)}:
\itemize{
\item if it is a single numeric value, it is assumed that kernel functions across input dimensions share the same lengthscale;
\item if it is a vector, it is assumed that kernel functions across input dimensions have different lengthscales.
}

Defaults to \code{rep(0.1, ncol(X))}, i.e., \code{0.1} for every input dimension.}

\item{bounds}{the lower and upper bounds of lengthscales in the kernel function. It is a vector of length two where the first element is
the lower bound and the second element is the upper bound. The bounds will be applied to all lengthscales in the kernel function. Defaults
to \code{NULL} where no bounds are specified for the lengthscales.}

\item{prior}{the prior used for maximum a posteriori (MAP) estimation of the lengthscales and nugget of the GP: gamma prior (\code{"ga"}), inverse gamma prior (\code{"inv_ga"}),
or jointly robust prior (\code{"ref"}). Defaults to \code{"ref"}. See the reference below for the jointly
robust prior.}

\item{nugget_est}{a bool indicating if the nugget term is to be estimated:
\enumerate{
\item \code{FALSE}: the nugget term is fixed to \code{nugget}.
\item \code{TRUE}: the nugget term will be estimated.
}

Defaults to \code{FALSE}.}

\item{nugget}{the initial nugget value. If \code{nugget_est = FALSE}, the assigned value is fixed during the training.
Set \code{nugget} to a small value (e.g., \code{1e-8}) and \code{nugget_est} to \code{FALSE} for deterministic computer models where the emulator
should interpolate the training data points. Set \code{nugget} to a larger value and \code{nugget_est} to \code{TRUE} for stochastic
emulation where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to \code{1e-8} if \code{nugget_est = FALSE} and
\code{0.01} if \code{nugget_est = TRUE}.}

\item{scale_est}{a bool indicating if the variance is to be estimated:
\enumerate{
\item \code{FALSE}: the variance is fixed to \code{scale}.
\item \code{TRUE}: the variance term will be estimated.
}

Defaults to \code{TRUE}.}

\item{scale}{the initial variance value. If \code{scale_est = FALSE}, the assigned value is fixed during the training.
Defaults to \code{1}.}

\item{training}{a bool indicating if the initialized GP emulator will be trained.
When set to \code{FALSE}, \code{\link[=gp]{gp()}} returns an untrained GP emulator, to which one can apply \code{\link[=summary]{summary()}} to inspect its specification or apply \code{\link[=predict]{predict()}} to check its emulation performance before the training. Defaults to \code{TRUE}.}

\item{verb}{a bool indicating if the trace information on GP emulator construction and training will be printed during function execution.
Defaults to \code{TRUE}.}

\item{check_rep}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a bool indicating whether to check for repetitions in the dataset, i.e., whether any input
position has multiple outputs. Defaults to \code{TRUE}.}

\item{vecchia}{a bool indicating whether to use Vecchia approximation for large-scale GP emulator construction and prediction. Defaults to \code{FALSE}.
The Vecchia approximation implemented for the GP emulation largely follows Katzfuss et al. (2022). See the reference below.}

\item{M}{the size of the conditioning set for the Vecchia approximation in the GP emulator training. Defaults to \code{25}.}

\item{ord}{an R function that returns the ordering of the input to the GP emulator for the Vecchia approximation. The function must satisfy the following basic rules:
\itemize{
\item the first argument represents the input scaled by the lengthscales.
\item the output of the function is a vector of indices that gives the ordering of the input to the GP emulator.
}

If \code{ord = NULL}, the default random ordering is used. Defaults to \code{NULL}.}

\item{id}{an ID to be assigned to the GP emulator. If an ID is not provided (i.e., \code{id = NULL}), a UUID (Universally Unique Identifier) will be automatically generated
and assigned to the emulator. Defaults to \code{NULL}.}
}
\value{
An S3 object of class \code{gp} that contains six slots:
\itemize{
\item \code{id}: a number or character string assigned through the \code{id} argument, or the automatically generated UUID if \code{id = NULL}.
\item \code{data}: a list that contains two elements, \code{X} and \code{Y}, which are the training input and output data, respectively.
\item \code{specs}: a list that contains the following elements:
\enumerate{
\item \code{kernel}: the type of the kernel function used. Either \code{"sexp"} for squared exponential kernel or \code{"matern2.5"} for Matérn-2.5 kernel.
\item \code{lengthscales}: a vector of lengthscales in the kernel function.
\item \code{scale}: the variance value in the kernel function.
\item \code{nugget}: the nugget value in the kernel function.
\item \code{vecchia}: whether the Vecchia approximation is used for the GP emulator training.
\item \code{M}: the size of the conditioning set for the Vecchia approximation in the GP emulator training.
}
\item \code{constructor_obj}: a 'python' object that stores the information of the constructed GP emulator.
\item \code{container_obj}: a 'python' object that stores the information for the linked emulation.
\item \code{emulator_obj}: a 'python' object that stores the information for the predictions from the GP emulator.
}

The returned \code{gp} object can be used by
\itemize{
\item \code{\link[=predict]{predict()}} for GP predictions.
\item \code{\link[=validate]{validate()}} for leave-one-out (LOO) and out-of-sample (OOS) validations.
\item \code{\link[=plot]{plot()}} for validation plots.
\item \code{\link[=lgp]{lgp()}} for linked (D)GP emulator constructions.
\item \code{\link[=summary]{summary()}} to summarize the trained GP emulator.
\item \code{\link[=write]{write()}} to save the GP emulator to a \code{.pkl} file.
\item \code{\link[=design]{design()}} for sequential designs.
\item \code{\link[=update]{update()}} to update the GP emulator with new inputs and outputs.
\item \code{\link[=alm]{alm()}}, \code{\link[=mice]{mice()}}, and \code{\link[=vigf]{vigf()}} to locate next design points.
}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#updated}{\figure{lifecycle-updated.svg}{options: alt='[Updated]'}}}{\strong{[Updated]}}

This function builds and trains a GP emulator.
}
\details{
See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
}
\note{
Any R vector detected in \code{X} and \code{Y} will be treated as a column vector and automatically converted into a single-column
R matrix. Thus, if \code{X} is a single data point with multiple dimensions, it must be given as a matrix.
}
\examples{
\dontrun{
# load the package and the Python env
library(dgpsi)

# define a step function
f <- function(x) {
  if (x < 0.5) -1 else 1
}

# generate training data
X <- seq(0, 1, length.out = 10)
Y <- sapply(X, f)

# training
m <- gp(X, Y)
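
# a sketch of an alternative specification that uses only the documented
# arguments: the Matern-2.5 kernel with all lengthscales bounded in [0.01, 1]
# (the bound values are illustrative assumptions, not recommended defaults)
m_mat <- gp(X, Y, name = "matern2.5", bounds = c(0.01, 1))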

# summarizing
summary(m)
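
# inspect an emulator before training: with training = FALSE, gp() returns
# an untrained emulator whose specification summary() can display
m_init <- gp(X, Y, training = FALSE)
summary(m_init)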

# LOO cross validation
m <- validate(m)
plot(m)

# prediction
test_x <- seq(0, 1, length.out = 200)
m <- predict(m, x = test_x)

# OOS validation
validate_x <- sample(test_x, 10)
validate_y <- sapply(validate_x, f)
plot(m, validate_x, validate_y)
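
# stochastic emulation sketch: the noisy outputs below are simulated for
# illustration only (Gaussian noise with sd = 0.1 is an assumption); with
# nugget_est = TRUE the nugget starts at 0.01 and is estimated during training
set.seed(1)
Y_noisy <- Y + rnorm(length(Y), sd = 0.1)
m_noisy <- gp(X, Y_noisy, nugget_est = TRUE)
summary(m_noisy)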

# write and read the constructed emulator
write(m, 'step_gp')
m <- read('step_gp')
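
# Vecchia approximation sketch on a larger illustrative design, passing a
# hypothetical custom ordering function through ord. This assumes the
# lengthscale-scaled input arrives as a matrix and that 1-based indices (as
# returned by order()) are expected; points are ordered by the first dimension
X_big <- seq(0, 1, length.out = 500)
Y_big <- sapply(X_big, f)
ord_fun <- function(x_scaled) order(as.matrix(x_scaled)[, 1])
m_vec <- gp(X_big, Y_big, vecchia = TRUE, M = 25, ord = ord_fun)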
}

}
\references{
\itemize{
\item Gu, M. (2019). Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. \emph{Bayesian Analysis}, \strong{14(3)}, 857-885.
\item Katzfuss, M., Guinness, J., & Lawrence, E. (2022). Scaled Vecchia approximation for fast computer-model emulation. \emph{SIAM/ASA Journal on Uncertainty Quantification}, \strong{10(2)}, 537-554.
}
}
