% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rowFiltering.R
\name{remove_sd_outlier}
\alias{remove_sd_outlier}
\title{Standard deviation outlier filtering}
\usage{
remove_sd_outlier(dataSet, cols = "auto", n_sigmas = 3, verbose = TRUE)
}
\arguments{
\item{dataSet}{Matrix, data.frame or data.table}

\item{cols}{Name(s) of the numeric column(s) of dataSet to filter on. To use all 
numeric columns, set it to "auto". (character, default to "auto")}

\item{n_sigmas}{Number of standard deviations from the mean within which values are kept (integer, default to 3)}

\item{verbose}{Should the algorithm talk? (logical, default to TRUE)}
}
\value{
Same dataset with fewer rows, edited by \strong{reference}. \cr
If you don't want to edit by reference, please set \code{dataSet = copy(dataSet)}.
}
\description{
Remove outliers based on standard deviation thresholds. \cr
Only values within \code{mean - sd * n_sigmas} and \code{mean + sd * n_sigmas} are kept.
}
\details{
Filtering is done column by column: extreme values from the first element
of \code{cols} are removed, then extreme values from the second element of \code{cols} are removed, 
and so on. \cr
So if filtering is performed on too many columns, there is a high risk that a lot of rows will be dropped.
}
\examples{
# Given
library(data.table)
col_vals <- runif(1000)
col_mean <- mean(col_vals)
col_sd <- sd(col_vals)
extrem_val <- col_mean + 6 * col_sd
dataSet <- data.table(num_col = c(col_vals, extrem_val))

# When
dataSet <- remove_sd_outlier(dataSet, cols = "auto", n_sigmas = 3, verbose = TRUE)

# Then the extreme value is no longer in the data set
extrem_val \%in\% dataSet[["num_col"]] # Is false
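
# Hand-rolled sketch of the rule from the description, for comparison only.
# Assumption: bounds come from the column's own mean and sd, and values exactly
# on a bound are kept; the package internals may differ.
# All names below (sketch_vals, lower, upper, kept) are illustrative.
set.seed(42)
sketch_vals <- c(runif(1000), 10) # 10 is a deliberately extreme value
n_sigmas <- 3
lower <- mean(sketch_vals) - n_sigmas * sd(sketch_vals)
upper <- mean(sketch_vals) + n_sigmas * sd(sketch_vals)
kept <- sketch_vals[sketch_vals >= lower & sketch_vals <= upper]
any(kept == 10) # Is false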
}
