% Generated by roxygen2 (4.0.1): do not edit by hand
\name{reduce.space}
\alias{reduce.space}
\alias{reduce.space.explicit}
\title{Reduce the space of potential species by fitting the mixture model with all potential species as categories}
\usage{
reduce.space(step1, read.cutoff = 1, EMiter = 500, seed = 1)

reduce.space.explicit(pij.sparse.mat, ordered.species, read.weights, outDir,
  gen.prob.unknown, read.cutoff = 1, EMiter = 500, seed = 1)
}
\arguments{
\item{step1}{list. The output of generative.prob() (or generative.prob.nucl()), i.e. the first step of the pipeline. Alternatively, a character string containing the path to the ".RData" file where the step1 list was saved.}

\item{read.cutoff}{numeric. The minimum number of reads that must be assigned to a species for it to be retained for the subsequent MCMC exploration. The default value is 1, i.e. keep all species that have at least one read assigned to them. If the number of retained species is still in the low thousands rather than the low hundreds, the user may set this to a higher value, such as 10.}

\item{EMiter}{Number of iterations for the EM algorithm. Default value is 500.}

\item{seed}{Optional argument that sets the random seed (default is 1) to make results reproducible.}

\item{pij.sparse.mat}{sparse Matrix of generative probabilities computed by generative.prob() or generative.prob.nucl().}

\item{ordered.species}{data.frame of the potential species, ordered by the number of reads matching them, as computed by generative.prob().}

\item{read.weights}{data.frame mapping each read identifier to a weight. For contigs the weight is the number of reads that were used to assemble it. For unassembled reads the weight is equal to one.}

\item{outDir}{character vector holding the path to the output directory where the results are written.}

\item{gen.prob.unknown}{numeric. The generative probability for the unknown category. The default value is 1e-06 for BLASTx-based analyses and 1e-20 for BLASTn-based analyses.}
}
\value{
step2: a list with six elements. The first, ordered.species, is a data.frame containing all the non-empty species categories, as decided by the all-inclusive mixture model, ordered by the number of reads assigned to them. The second, pij.sparse.mat, is a sparse matrix with the generative probability between each read and each species. The elements read.weights, gen.prob.unknown and outDir are carried forward from the step1 object. Finally, outputEM records the species abundances throughout the EM iterations; it is not used in step 3 or step 4.
}
\description{
Reduce the space of potential species by fitting the mixture model with all potential species as categories

Having the generative probabilities from step 1 (generative.prob() or generative.prob.nucl()), we could proceed directly with the parallel tempering (PT) MCMC to explore the state space. However, the total number of potential species is typically large, so we first reduce the size of the state space by decreasing the number of species to the low hundreds. We achieve this by fitting a mixture model with one category for each potential species. After fitting, we retain only the species categories that are not empty, that is, categories that have at least one read (more generally, at least read.cutoff reads) assigned to them.

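reduce.space.explicit performs the same computation as reduce.space, but takes the components of the step1 list (pij.sparse.mat, ordered.species, read.weights, outDir and gen.prob.unknown) as separate arguments instead of a single step1 object.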
}
\examples{
## See vignette for more details.

\dontrun{
# Either load the object created by the previous step
data(step1)  ## example output of step1, i.e. generative.prob()
step2 <- reduce.space(step1=step1)

# or alternatively point to the location of the step1.RData object
step2 <- reduce.space(step1="/pathtoFile/step1.RData")
}
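
## A toy, self-contained sketch of the idea behind reduce.space(), in base R.
## This is NOT the package's implementation. It assumes a plain EM algorithm
## for the mixture weights with the generative probabilities held fixed,
## ignores the unknown category, and uses a hypothetical 4-read x 3-species
## matrix (p.toy), unit read weights (w.toy) and a hypothetical helper
## (toy.em). Species sp3 explains every read poorly, so its category ends up
## empty and it is dropped.
p.toy <- matrix(c(0.9, 0.05, 0.05,
                  0.8, 0.10, 0.05,
                  0.1, 0.90, 0.05,
                  0.2, 0.70, 0.05),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("read", 1:4), paste0("sp", 1:3)))
w.toy <- rep(1, nrow(p.toy))  # read weights: 1 for each unassembled read

toy.em <- function(p, w, n.iter = 500) {
  pi.hat <- rep(1 / ncol(p), ncol(p))  # start from equal mixture weights
  for (it in seq_len(n.iter)) {
    # E-step: posterior probability that read i comes from species j
    num <- sweep(p, 2, pi.hat, "*")
    z <- num / rowSums(num)
    # M-step: weighted share of the reads attributed to each species
    pi.hat <- colSums(z * w) / sum(w)
  }
  list(pi = pi.hat, z = z)
}
fit <- toy.em(p.toy, w.toy)

## Assign each read to its most probable species, then keep only the
## species with at least read.cutoff reads (the non-empty categories).
read.cutoff <- 1
n.reads <- tabulate(apply(fit$z, 1, which.max), nbins = ncol(p.toy))
names(n.reads) <- colnames(p.toy)
kept <- names(n.reads)[n.reads >= read.cutoff]
kept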
}
\keyword{reduce.space}
\keyword{reduce.space.explicit}

