% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ORBoostFilter.R
\name{ORBoostFilter}
\alias{ORBoostFilter}
\alias{ORBoostFilter.default}
\alias{ORBoostFilter.formula}
\title{Outlier Removal Boosting Filter}
\usage{
\method{ORBoostFilter}{formula}(formula, data, ...)

\method{ORBoostFilter}{default}(x, N = 20, d = 11, Naux = max(20, N),
  useDecisionStump = FALSE, classColumn = ncol(x), ...)
}
\arguments{
\item{formula}{A formula describing the classification variable and the attributes to be used.}

\item{data, x}{Data frame containing the training dataset to be filtered.}

\item{...}{Optional parameters to be passed to other methods.}

\item{N}{Number of boosting iterations.}

\item{d}{Threshold for removing noisy instances. The authors recommend a value between 3 and 20. If it is set to \code{NULL},
the optimal threshold is chosen according to the procedure described in Karmaker & Kwek. However, this can be
very time-consuming, and in most cases it has little effect on the final result.}

\item{Naux}{Number of boosting iterations for AdaBoost when computing the optimal threshold \code{d}.}

\item{useDecisionStump}{If \code{TRUE}, a decision stump is used as the weak classifier.
Otherwise (default), naive Bayes is applied. Note that decision stumps are not appropriate for multi-class problems.}

\item{classColumn}{Positive integer indicating the column that contains the class labels (a factor).
By default, the last column is used.}
}
\value{
An object of class \code{filter}, which is a list with seven components:
\itemize{
   \item \code{cleanData} is a data frame containing the filtered dataset.
   \item \code{remIdx} is a vector of integers indicating the indexes of the
   removed instances (i.e. their row numbers with respect to the original data frame).
   \item \code{repIdx} is a vector of integers indicating the indexes of the
   repaired/relabelled instances (i.e. their row numbers with respect to the original data frame).
   \item \code{repLab} is a factor containing the new labels for repaired instances.
   \item \code{parameters} is a list containing the argument values.
   \item \code{call} contains the original call to the filter.
   \item \code{extraInf} is a character string with additional
   information not covered by the previous items.
}
}
\description{
Ensemble-based filter for removing label noise from a dataset as a
preprocessing step for classification. For more information, see the 'Details' and
'References' sections.
}
\details{
A full description of the \code{ORBoostFilter} method is given in Karmaker & Kwek (2005).
In general terms, a weak classifier is built in each iteration, and misclassified instances have their weight
increased for the next round. Instances are removed when their weight exceeds the
threshold \code{d}, i.e. when they have been misclassified in several consecutive rounds.
}
\note{
The number of noisy instances removed in each iteration
is reported in the console via a message.
}
\examples{
# The following example is not run, to save time
\dontrun{
data(iris)
out <- ORBoostFilter(Species~., data = iris, N = 10)
summary(out)
identical(out$cleanData, iris[setdiff(1:nrow(iris),out$remIdx),])
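
# The lines below are a hedged sketch of further usage, not part of the
# original example. Assumption: the default method takes the whole data
# frame, with classColumn giving the position of the class factor (5 for iris).
out2 <- ORBoostFilter(iris, N = 10, d = 11, classColumn = 5)
length(out2$remIdx)   # number of instances removed as noisy
nrow(out2$cleanData)  # number of instances kept

# Decision stumps are documented as unsuitable for multi-class problems,
# so this sketch first reduces iris to two classes. The names iris2 and out3
# are illustrative only. Assumption: useDecisionStump is forwarded through
# '...' by the formula method, as N is in the example above.
iris2 <- droplevels(iris[iris$Species != "setosa", ])
out3 <- ORBoostFilter(Species~., data = iris2, N = 10, useDecisionStump = TRUE)
length(out3$remIdx)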
}
}
\references{
Karmaker A., Kwek S. (2005, November): A boosting approach to remove class label noise.
In \emph{Fifth International Conference on Hybrid Intelligent Systems (HIS'05)} (6 pp.). IEEE.

Freund Y., Schapire R. E. (1997): A decision-theoretic generalization of on-line learning and
an application to boosting. \emph{Journal of Computer and System Sciences}, 55(1), 119-139.
}

