% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bestNormalize.R
\name{bestNormalize}
\alias{bestNormalize}
\alias{predict.bestNormalize}
\alias{print.bestNormalize}
\title{Calculate and perform best normalizing transformation}
\usage{
bestNormalize(x, standardize = TRUE, allow_orderNorm = TRUE,
  allow_lambert = FALSE, out_of_sample = TRUE, cluster = NULL, k = 10,
  warn = TRUE, r = 5)

\method{predict}{bestNormalize}(object, newdata = NULL, inverse = FALSE,
  ...)

\method{print}{bestNormalize}(x, ...)
}
\arguments{
\item{x}{A vector to normalize}

\item{standardize}{If TRUE, the transformed values are also centered and
scaled, so that the transformed data target a standard normal distribution.
This will not change the normality statistic.}

\item{allow_orderNorm}{set to FALSE if orderNorm should not be applied}

\item{allow_lambert}{Set to TRUE if lambertW should be applied (see details)}

\item{out_of_sample}{if FALSE, performance is estimated quickly in-sample
instead of by repeated cross-validation}

\item{cluster}{name of cluster set using \code{makeCluster}}

\item{k}{number of folds}

\item{warn}{Should bestNormalize warn when a method doesn't work?}

\item{r}{number of repeats}

\item{object}{an object of class 'bestNormalize'}

\item{newdata}{a vector of data to be (reverse) transformed}

\item{inverse}{if TRUE, performs reverse transformation}

\item{...}{additional arguments}
}
\value{
A list of class \code{bestNormalize} with elements

  \item{x.t}{transformed original data}
  \item{x}{original data}
  \item{norm_stats}{Pearson's P / degrees of freedom}
  \item{method}{out-of-sample or in-sample, number of folds + repeats}
  \item{chosen_transform}{the chosen transformation (of appropriate class)}
  \item{other_transforms}{the other transformations (of appropriate class)}

  The \code{predict} function returns the numeric value of the transformation
  performed on new data, and allows for the inverse transformation as well.
}
\description{
Performs a suite of normalizing transformations, and selects the
  best one on the basis of the Pearson P test statistic for normality. The
  transformation that has the lowest P (calculated on the transformed data)
  is selected. See details for more information.
}
\details{
\code{bestNormalize} estimates the optimal normalizing
  transformation. This transformation can be performed on new data, and
  inverted, via the \code{predict} function.

This function currently estimates the Yeo-Johnson transformation,
  the Box-Cox transformation (if the data are positive), the log_10(x+a)
  transformation, the square-root (x+a) transformation, and the arcsinh
  transformation. The shift a is set to max(0, -min(x) + eps) by default. If
  allow_orderNorm == TRUE and out_of_sample == FALSE, the ordered
  quantile normalization technique will likely be chosen, since it essentially
  forces the data to follow a normal distribution. More information on the
  orderNorm technique can be found in the package vignette, or using
  \code{?orderNorm}.
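
  As a minimal sketch of the shift described above (an illustration in base
  R, not the package's internal code; the value of eps used here is an
  assumption, since this page does not state it):

\preformatted{
y <- c(-2, 0, 3)            # toy data containing non-positive values
eps <- 0.001                # assumed small constant (hypothetical value)
a <- max(0, -min(y) + eps)  # shift so that y + a is strictly positive
log10(y + a)                # the log transformation is now defined
sqrt(y + a)                 # as is the square-root transformation
}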

  Repeated cross-validation is used to estimate the out-of-sample performance
  of each transformation if out_of_sample = TRUE. While this can take some
  time, users can speed it up by creating a cluster via the \code{parallel}
  package's \code{makeCluster} function, and passing the name of this cluster
  to \code{bestNormalize} via the \code{cluster} argument. For best
  performance, we recommend setting the number of workers in the cluster to
  the number of repeats r. Care should be taken to account for the number of
  observations per fold; with too small a number, the estimated normality
  statistic could be inaccurate, or at least suffer from high variability.

  NOTE: Only the Lambert technique of type = "s" (skew) ensures that the
  transformation is consistently 1-1, so it is the only Lambert type
  considered by \code{bestNormalize()}. Using type = "h" or type = "hh" risks
  an estimated transformation that is not 1-1. These alternative types are
  effective when the data have exceptionally heavy tails, e.g. the Cauchy
  distribution. Additionally, as of v. 1.2.0, Lambert of type "s" is not used
  by default in \code{bestNormalize()}, since it uses multiple threads on some
  Linux systems, which is not allowed in CRAN checks. Set
  allow_lambert = TRUE to test this transformation as well.
}
\examples{

x <- rgamma(100, 1, 1)

\dontrun{
# With Repeated CV
BN_obj <- bestNormalize(x)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)
}
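
\dontrun{
# Sketch of the parallel option described in Details. Assumption: 5 local
# workers are available, matching the default number of repeats (r = 5).
cl <- parallel::makeCluster(5)
BN_par <- bestNormalize(x, cluster = cl, r = 5)
parallel::stopCluster(cl)  # always release the workers when done
BN_par
}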

# Without Repeated CV
BN_obj <- bestNormalize(x, allow_orderNorm = FALSE, out_of_sample = FALSE)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)
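
# Applying the fitted transformation to new data: a sketch that assumes the
# new observations come from the same distribution as x
x_new <- rgamma(10, 1, 1)
p_new <- predict(BN_obj, newdata = x_new)
x_back <- predict(BN_obj, newdata = p_new, inverse = TRUE)
all.equal(x_back, x_new)

# Inspecting elements documented in the Value section
BN_obj$norm_stats
BN_obj$chosen_transform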


}
\seealso{
\code{\link[bestNormalize]{boxcox}}, \code{\link{orderNorm}},
  \code{\link{yeojohnson}}
}
