% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ml_feature_r_formula.R
\name{ft_r_formula}
\alias{ft_r_formula}
\title{Feature Transformation -- RFormula (Estimator)}
\usage{
ft_r_formula(
  x,
  formula = NULL,
  features_col = "features",
  label_col = "label",
  force_index_label = FALSE,
  uid = random_string("r_formula_"),
  ...
)
}
\arguments{
\item{x}{A \code{spark_connection}, \code{ml_pipeline}, or a \code{tbl_spark}.}

\item{formula}{R formula as a character string or a formula. Formula objects are
converted to character strings directly and the environment is not captured.}

\item{features_col}{Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by \code{\link{ft_r_formula}}.}

\item{label_col}{Label column name. The column should be a numeric column. Usually this column is output by \code{\link{ft_r_formula}}.}

\item{force_index_label}{(Spark 2.1.0+) Whether to index the label regardless of
whether it is of numeric or string type. By default, the label is indexed only
when it is of string type. If the formula is used by a classification algorithm,
setting this parameter to \code{TRUE} forces the label to be indexed even when
it is numeric.
Default: \code{FALSE}.}

\item{uid}{A character string used to uniquely identify the feature transformer.}

\item{...}{Optional arguments; currently unused.}
}
\value{
The object returned depends on the class of \code{x}.

\itemize{
  \item \code{spark_connection}: When \code{x} is a \code{spark_connection}, the function returns an \code{ml_transformer},
  an \code{ml_estimator}, or one of their subclasses. The object contains a pointer to
  a Spark \code{Transformer} or \code{Estimator} object and can be used to compose
  \code{Pipeline} objects.

  \item \code{ml_pipeline}: When \code{x} is an \code{ml_pipeline}, the function returns an \code{ml_pipeline} with
  the transformer or estimator appended to the pipeline.

  \item \code{tbl_spark}: When \code{x} is a \code{tbl_spark}, a transformer is constructed and then
  immediately applied to the input \code{tbl_spark}, returning a \code{tbl_spark}.
}
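
The three cases above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: it assumes that a local Spark installation is available to \code{spark_connect()}, and the table name \code{mtcars_tbl} and the formula are chosen purely for illustration.

\preformatted{
library(sparklyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical example table copied from a built-in R data set.
mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

# spark_connection: returns an estimator that has not been fitted yet.
estimator <- ft_r_formula(sc, formula = mpg ~ wt + cyl)

# ml_pipeline: returns the pipeline with the estimator appended.
pipeline <- ft_r_formula(ml_pipeline(sc), formula = mpg ~ wt + cyl)

# tbl_spark: fits and applies in one step, returning a tbl_spark with
# the features and label columns added.
transformed <- ft_r_formula(mtcars_tbl, formula = mpg ~ wt + cyl)

spark_disconnect(sc)
}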
}
\description{
Implements the transforms required for fitting a dataset against an R model
  formula. Currently, only a limited subset of the R operators is supported,
  including \code{~}, \code{.}, \code{:}, \code{+}, and \code{-}. See also the R formula documentation:
  \url{http://stat.ethz.ch/R-manual/R-patched/library/stats/html/formula.html}
}
\details{
The basic operators in the formula are:

  \itemize{
    \item \code{~}: separates target and terms
    \item \code{+}: concatenates terms; \code{+ 0} removes the intercept
    \item \code{-}: removes a term; \code{- 1} removes the intercept
    \item \code{:}: interaction (multiplication for numeric values, or binarized categorical values)
    \item \code{.}: all columns except the target
  }

  Suppose \code{a} and \code{b} are double columns. The following simple examples illustrate the
  effect of RFormula:

  \itemize{
    \item \code{y ~ a + b} means model \code{y ~ w0 + w1 * a + w2 * b}
      where \code{w0} is the intercept and \code{w1, w2} are coefficients.
    \item \code{y ~ a + b + a:b - 1} means model \code{y ~ w1 * a + w2 * b + w3 * a * b}
      where \code{w1, w2, w3} are coefficients.
  }

 RFormula produces a vector column of features and a double or string column
 of labels. As when formulas are used in R for linear regression, string
 input columns are one-hot encoded and numeric columns are cast to
 doubles. If the label column is of type string, it is first transformed
 to double with StringIndexer. If the label column does not exist in the
 DataFrame, the output label column is created from the response variable
 specified in the formula.
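
 A minimal sketch of this behaviour, assuming a local Spark installation and using the
 built-in \code{iris} data (sparklyr replaces the dots in its column names with underscores
 when copying). The table name \code{iris_tbl} is illustrative only:

\preformatted{
library(sparklyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Species is a string column, so the label is produced by indexing it.
# Petal_Length and Petal_Width are numeric and go into the features vector.
# "- 1" removes the intercept and ":" adds the interaction term.
out <- ft_r_formula(
  iris_tbl,
  formula = "Species ~ Petal_Length + Petal_Width + Petal_Length:Petal_Width - 1"
)

# Inspect the generated columns (default names: "features" and "label").
dplyr::select(out, Species, features, label)

spark_disconnect(sc)
}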

In the case where \code{x} is a \code{tbl_spark}, the estimator fits against \code{x}
  to obtain a transformer, which is then immediately used to transform \code{x}, returning a \code{tbl_spark}.
}
\seealso{
See \url{https://spark.apache.org/docs/latest/ml-features.html} for
  more information on the set of transformations available for DataFrame
  columns in Spark.

Other feature transformers: 
\code{\link{ft_binarizer}()},
\code{\link{ft_bucketizer}()},
\code{\link{ft_chisq_selector}()},
\code{\link{ft_count_vectorizer}()},
\code{\link{ft_dct}()},
\code{\link{ft_elementwise_product}()},
\code{\link{ft_feature_hasher}()},
\code{\link{ft_hashing_tf}()},
\code{\link{ft_idf}()},
\code{\link{ft_imputer}()},
\code{\link{ft_index_to_string}()},
\code{\link{ft_interaction}()},
\code{\link{ft_lsh}},
\code{\link{ft_max_abs_scaler}()},
\code{\link{ft_min_max_scaler}()},
\code{\link{ft_ngram}()},
\code{\link{ft_normalizer}()},
\code{\link{ft_one_hot_encoder_estimator}()},
\code{\link{ft_one_hot_encoder}()},
\code{\link{ft_pca}()},
\code{\link{ft_polynomial_expansion}()},
\code{\link{ft_quantile_discretizer}()},
\code{\link{ft_regex_tokenizer}()},
\code{\link{ft_robust_scaler}()},
\code{\link{ft_sql_transformer}()},
\code{\link{ft_standard_scaler}()},
\code{\link{ft_stop_words_remover}()},
\code{\link{ft_string_indexer}()},
\code{\link{ft_tokenizer}()},
\code{\link{ft_vector_assembler}()},
\code{\link{ft_vector_indexer}()},
\code{\link{ft_vector_slicer}()},
\code{\link{ft_word2vec}()}
}
\concept{feature transformers}
