% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/missRanger.R
\name{missRanger}
\alias{missRanger}
\title{Fast Imputation of Missing Values by Chained Tree Ensembles}
\usage{
missRanger(data, maxiter = 10L, pmm.k = 0L, seed = NULL, verbose = 1,
  ...)
}
\arguments{
\item{data}{A \code{data.frame} with missing values to impute.}

\item{maxiter}{Maximum number of chaining iterations.}

\item{pmm.k}{Number of candidate non-missing values to sample from in the predictive mean matching step. Set to 0 to skip this step.}

\item{seed}{Integer seed to initialize the random number generator.}

\item{verbose}{Controls how much information is printed to the screen: 0 prints nothing, 1 (default) prints a "." per iteration and
variable, and 2 prints the out-of-bag (OOB) prediction error per iteration and variable (1 minus R-squared for regression).}

\item{...}{Arguments passed to \code{ranger}. If the data set is large, consider using fewer trees
(e.g. \code{num.trees = 100}) and/or a low value of \code{sample.fraction}.
The following arguments are incompatible: \code{formula}, \code{data}, \code{write.forest},
\code{probability}, \code{split.select.weights}, \code{dependent.variable.name}, and \code{classification}.}
}
\value{
An imputed \code{data.frame}.
}
\description{
Uses the "ranger" package [1] to do fast missing value imputation by chained tree ensembles, see [2] and [3].
Between the iterative model fitting steps, it offers the option of predictive mean matching. Firstly, this avoids imputation
with values not present in the original data (like a value 0.3334 in a 0-1 coded variable). Secondly, predictive mean
matching tries to raise the variance in the resulting conditional distributions to a realistic level and, as such,
allows multiple imputation by repeating the call to \code{missRanger()}. The iterative chaining stops as soon as \code{maxiter}
is reached or the average out-of-bag estimate of performance stops improving. In the latter case, except for the first iteration,
the second-to-last (i.e. best) imputed data set is returned.
}
\examples{
irisWithNA <- generateNA(iris)
irisImputed <- missRanger(irisWithNA, pmm.k = 3, num.trees = 100)
head(irisImputed)
head(irisWithNA)

# With extra trees algorithm
irisImputed_et <- missRanger(irisWithNA, pmm.k = 3, num.trees = 100, splitrule = "extratrees")
head(irisImputed_et)
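
# Multiple imputation, a minimal sketch: with pmm.k > 0, repeating the call
# with different seeds is expected to give differing imputed data sets.
# The seeds 1:3 and the choice of three repetitions are illustrative
# assumptions, not recommendations. Reuses irisWithNA from above.
imputations <- lapply(1:3, function(s)
  missRanger(irisWithNA, pmm.k = 3, num.trees = 100, seed = s, verbose = 0))
# Compare the mean of one imputed variable across the repetitions
sapply(imputations, function(d) mean(d$Sepal.Length))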
}
\references{
[1] Wright, M. N. & Ziegler, A. (2016). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, in press. http://arxiv.org/abs/1508.04409.

[2] Stekhoven, D. J. & Buehlmann, P. (2012). MissForest - nonparametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: 10.1093/bioinformatics/btr597.

[3] Van Buuren, S. & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. http://www.jstatsoft.org/v45/i03/
}
