\name{rlars}
\alias{print.rlars}
\alias{rlars}
\alias{rlars.default}
\alias{rlars.formula}
\title{Robust least angle regression}
\usage{
  rlars(x, ...)

  \method{rlars}{formula} (formula, data, ...)

  \method{rlars}{default} (x, y, sMax = NA,
    centerFun = median, scaleFun = mad, const = 2,
    prob = 0.95, fit = TRUE, regFun = lmrob,
    regArgs = list(), crit = c("BIC", "PE"),
    splits = foldControl(), cost = rtmspe,
    costArgs = list(), selectBest = c("hastie", "min"),
    seFactor = 1, ncores = 1, cl = NULL, seed = NULL,
    model = TRUE, tol = .Machine$double.eps^0.5, ...)
}
\arguments{
  \item{formula}{a formula describing the full model.}

  \item{data}{an optional data frame, list or environment
  (or object coercible to a data frame by
  \code{\link{as.data.frame}}) containing the variables in
  the model.  If not found in \code{data}, the variables
  are taken from \code{environment(formula)}, typically the
  environment from which \code{rlars} is called.}

  \item{x}{a matrix or data frame containing the candidate
  predictors.}

  \item{y}{a numeric vector containing the response.}

  \item{sMax}{an integer vector of length two.  If a single
  integer is supplied, it is recycled.  The first element
  gives the number of predictors to be sequenced.  If it is
  \code{NA} (the default), predictors are sequenced as long
  as there are no singularity issues.  The second element
  gives the maximum number of predictors to be included in
  the final model.  If it is \code{NA} (the default),
  predictors may be added to the model as long as there are
  twice as many observations as predictors.}

  \item{centerFun}{a function to compute a robust estimate
  for the center (defaults to
  \code{\link[stats]{median}}).}

  \item{scaleFun}{a function to compute a robust estimate
  for the scale (defaults to \code{\link[stats]{mad}}).}

  \item{const}{numeric; tuning constant to be used in the
  initial correlation estimates based on adjusted
  univariate winsorization (defaults to 2).}

  \item{prob}{numeric; probability for the quantile of the
  \eqn{\chi^{2}}{chi-squared} distribution to be used in
  bivariate winsorization (defaults to 0.95).}

  \item{fit}{a logical indicating whether to fit submodels
  along the sequence (\code{TRUE}, the default) or to
  simply return the sequence (\code{FALSE}).}

  \item{regFun}{a function to compute robust linear
  regressions along the sequence (defaults to
  \code{\link[robustbase]{lmrob}}).}

  \item{regArgs}{a list of arguments to be passed to
  \code{regFun}.}

  \item{crit}{a character string specifying the optimality
  criterion to be used for selecting the final model.
  Possible values are \code{"BIC"} for the Bayes
  information criterion and \code{"PE"} for
  resampling-based prediction error estimation.}

  \item{splits}{an object giving data splits to be used for
  prediction error estimation (see
  \code{\link[perry]{perry}}).}

  \item{cost}{a cost function measuring prediction loss
  (see \code{\link[perry]{perry}} for some requirements).
  The default is to use the root trimmed mean squared
  prediction error (see \code{\link[perry]{cost}}).}

  \item{costArgs}{a list of additional arguments to be
  passed to the prediction loss function \code{cost}.}

  \item{selectBest,seFactor}{arguments specifying a
  criterion for selecting the best model (see
  \code{\link[perry]{perrySelect}}).  The default is to use
  a one-standard-error rule.}

  \item{ncores}{a positive integer giving the number of
  processor cores to be used for parallel computing (the
  default is 1 for no parallelization).  If this is set to
  \code{NA}, all available processor cores are used.  For
  fitting models along the sequence and for prediction
  error estimation, parallel computing is implemented on
  the \R level using package \pkg{parallel}.  Otherwise,
  parallel computing for some of the more computationally
  intensive parts of the sequencing step is implemented on
  the C++ level via OpenMP (\url{http://openmp.org/}).}

  \item{cl}{a \pkg{parallel} cluster for parallel computing
  as generated by \code{\link[parallel]{makeCluster}}.
  This is preferred over \code{ncores} for tasks that are
  parallelized on the \R level, in which case \code{ncores}
  is only used for tasks that are parallelized on the C++
  level.}

  \item{seed}{optional initial seed for the random number
  generator (see \code{\link{.Random.seed}}).  This is
  useful because many robust regression functions
  (including \code{\link[robustbase]{lmrob}}) involve
  randomness, as does resampling-based prediction error
  estimation.  On parallel \R worker processes for
  prediction error estimation, random number streams are
  used and the seed is set via
  \code{\link[parallel]{clusterSetRNGStream}}.}

  \item{model}{a logical indicating whether the model data
  should be included in the returned object.}

  \item{tol}{a small positive numeric value.  This is used
  in bivariate winsorization to determine whether the
  initial correlation estimate from adjusted univariate
  winsorization is close to 1 in absolute value.  In that
  case, bivariate winsorization would fail because the
  points lie almost on a straight line, so the initial
  estimate is returned instead.}

  \item{\dots}{additional arguments to be passed down.  For
  the default method, additional arguments to be passed
  down to \code{\link[=standardize]{robStandardize}}.}
}
\value{
  If \code{fit} is \code{FALSE}, an integer vector
  containing the indices of the sequenced predictors.

  Otherwise an object of class \code{"rlars"} (inheriting
  from class \code{"seqModel"} if \code{crit="BIC"} or
  \code{"optSeqModel"} if \code{crit="PE"}) with the
  following components:

  \item{active}{an integer vector containing the indices of
  the sequenced predictors.}

  \item{df}{an integer vector containing the degrees of
  freedom of the submodels along the sequence (i.e., the
  number of estimated coefficients).}

  \item{coefficients}{a numeric matrix in which each column
  contains the regression coefficients of the corresponding
  submodel along the sequence (\code{"seqModel"}); or a
  numeric vector containing the coefficients of the optimal
  submodel (\code{"optSeqModel"}).}

  \item{fitted.values}{a numeric matrix in which each
  column contains the fitted values of the corresponding
  submodel along the sequence (\code{"seqModel"}); or a
  numeric vector containing the fitted values of the
  optimal submodel (\code{"optSeqModel"}).}

  \item{residuals}{a numeric matrix in which each column
  contains the residuals of the corresponding submodel
  along the sequence (\code{"seqModel"}); or a numeric
  vector containing the residuals of the optimal submodel
  (\code{"optSeqModel"}).}

  \item{crit}{a character string specifying the optimality
  criterion used for selecting the final model.}

  \item{critValues}{a numeric vector containing the values
  of the optimality criterion from the submodels along the
  sequence (\code{"seqModel"}); or an object of class
  \code{"perrySeqModel"} (inheriting from
  \code{"\link[perry]{perrySelect}"}) that contains the
  estimated prediction errors of the submodels
  (\code{"optSeqModel"}).}

  \item{sOpt}{an integer giving the optimal submodel (only
  \code{"seqModel"}).}

  \item{muY}{numeric; the center estimate of the response.}

  \item{sigmaY}{numeric; the scale estimate of the
  response.}

  \item{muX}{a numeric vector containing the center
  estimates of the predictors.}

  \item{sigmaX}{a numeric vector containing the scale
  estimates of the predictors.}

  \item{x}{the matrix of candidate predictors (if
  \code{model} is \code{TRUE}).}

  \item{y}{the response (if \code{model} is \code{TRUE}).}

  \item{call}{the matched function call.}
}
\description{
  Robustly sequence candidate predictors according to their
  predictive content and find the optimal model along the
  sequence.
}
\examples{
## generate data
# example is not high-dimensional to keep computation time low
set.seed(1234)  # for reproducibility
n <- 100  # number of observations
p <- 25   # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5      # controls signal-to-noise ratio
epsilon <- 0.1    # contamination level
x <- replicate(p, rnorm(n))     # predictor matrix
e <- rnorm(n)                   # error terms
i <- 1:ceiling(epsilon*n)       # observations to be contaminated
e[i] <- e[i] + 5                # vertical outliers
y <- c(x \%*\% beta + sigma * e)  # response
x[i,] <- x[i,] + 5              # bad leverage points

## fit robust LARS model
rlars(x, y)
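
## sketch: return only the sequence of predictors
# the choice sMax = 10 is an assumption made for illustration;
# with fit = FALSE no submodels are fitted and, as described in
# the Value section, an integer vector with the indices of the
# sequenced predictors is returned
active <- rlars(x, y, sMax = 10, fit = FALSE)
active

## sketch: store the fitted object and extract coefficients
# relies on the coef() method listed in the See Also section
fit <- rlars(x, y)
coef(fit)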
}
\author{
  Andreas Alfons, based on code by Jafar A. Khan, Stefan
  Van Aelst and Ruben H. Zamar
}
\references{
  Khan, J.A., Van Aelst, S. and Zamar, R.H. (2007) Robust
  linear model selection based on least angle regression.
  \emph{Journal of the American Statistical Association},
  \bold{102}(480), 1289--1299.
}
\seealso{
  \code{\link{coef.seqModel}},
  \code{\link{fitted.seqModel}},
  \code{\link{residuals.seqModel}},
  \code{\link{predict.seqModel}},
  \code{\link{plot.seqModel}}
}
\keyword{regression}
\keyword{robust}

