% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rdpClassifier.R
\name{rdpTrain}
\alias{rdpTrain}
\title{Training the RDP classifier}
\usage{
rdpTrain(sequence, taxon, K = 8, cnames = FALSE)
}
\arguments{
\item{sequence}{Character vector of 16S sequences.}

\item{taxon}{Character vector of taxon labels for each sequence.}

\item{K}{Word length (integer).}

\item{cnames}{Logical indicating if column names should be added to the trained model matrix.}
}
\value{
A list with two elements. The first element is \code{Method}, which is the text 
\code{"RDPclassifier"} in this case. The second element is \code{Fitted}, which is a 
matrix with one row for each unique \code{taxon} and one column for 
each possible word of length \code{K}. The value in row i and column j is the probability that
word j is present in taxon i.
}
\description{
Training the RDP presence/absence K-mer method on sequence data.
}
\details{
The training step of the RDP method searches all sequences for K-mers and computes,
for each unique taxon, the probability that each K-mer is present. This is an
attempt to re-implement the method described by Wang et al. (2007), but without the bootstrapping.
See that publication for all details.

The word length \code{K} is 8 by default, since this is the value used by Wang et al. Larger values
may lead to memory problems, since the trained model is a matrix with 4^K columns (65536 columns
for \code{K = 8}). Adding the K-mers as column names (\code{cnames = TRUE}) will slow down all computations.

The relative taxon sizes are also computed and returned as an attribute of the model matrix. They may
be used as empirical priors in the classification step.
}
\examples{
# See examples for rdpClassify.
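#
# A minimal base-R sketch of the idea behind the training step. It does not
# call rdpTrain, and the toy data and object names (toy.seq, toy.tax,
# has.word, fitted) are hypothetical. Plain proportions are used, without
# any smoothing, so the values need not match the Fitted matrix of rdpTrain.
toy.seq <- c("ACGTAC", "ACGGAC", "TTGCAA", "TTGCAT")
toy.tax <- c("A", "A", "B", "B")
K <- 2

# All 4^K possible words of length K
bases <- c("A", "C", "G", "T")
words <- apply(expand.grid(rep(list(bases), K), stringsAsFactors = FALSE),
               1, paste, collapse = "")

# Presence/absence of every word in every sequence (one row per sequence)
has.word <- t(sapply(toy.seq, function(s) {
  kmers <- substring(s, 1:(nchar(s) - K + 1), K:nchar(s))
  is.element(words, kmers)
}))

# Fraction of sequences in each taxon that contain each word
fitted <- rowsum(has.word * 1, group = toy.tax) / as.vector(table(toy.tax))
colnames(fitted) <- words
fitted[, c("AC", "GT", "TT")]

# Relative taxon sizes, which could serve as empirical priors
table(toy.tax) / length(toy.tax)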

}
\author{
Kristian Hovde Liland and Lars Snipen.
}
\references{
Wang, Q, Garrity, GM, Tiedje, JM, Cole, JR (2007). Naive Bayesian Classifier for 
Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental 
Microbiology, 73: 5261-5267.
}
\seealso{
\code{\link{rdpClassify}}.
}

