% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/datasim.R
\name{simulate_poisson_gene_data}
\alias{simulate_poisson_gene_data}
\alias{simulate_multinom_gene_data}
\title{Simulate Gene Expression Data from Poisson NMF or Multinomial
  Topic Model}
\usage{
simulate_poisson_gene_data(n, m, k, s, p = 1, sparse = FALSE)

simulate_multinom_gene_data(n, m, k, sparse = FALSE)
}
\arguments{
\item{n}{Number of rows in the simulated count matrix. Should be at
least 2.}

\item{m}{Number of columns in the simulated count matrix. Should be
at least 2.}

\item{k}{Number of factors, or \dQuote{topics}, used to generate
the data. Should be 2 or more.}

\item{s}{Vector of \dQuote{size factors} (\code{simulate_poisson_gene_data}
only); row \code{i} of the loadings matrix \code{L} is multiplied by
\code{s[i]} before generating the counts. This should be a vector of
length \code{n} containing only positive values.}

\item{p}{Probability that \code{F[i,j]} is not equal to the mean rate
(\code{simulate_poisson_gene_data} only). Smaller values of \code{p}
result in more genes whose rates are the same across topics.}

\item{sparse}{If \code{sparse = TRUE}, convert the counts matrix to
a sparse matrix in compressed, column-oriented format; see
\code{\link[Matrix]{sparseMatrix}}.}
}
\value{
\code{simulate_poisson_gene_data} returns a list containing the
  counts matrix \code{X}, the size factors \code{s}, and the factors
  \code{F} and loadings \code{L} used to generate the counts.
  \code{simulate_multinom_gene_data} returns a list containing the
  counts matrix \code{X}, the mixture proportions \code{L}, and the
  factors \code{F} (gene probabilities, or relative gene expression
  levels) used to generate the counts.
}
\description{
Simulate count data from a Poisson NMF model or
  multinomial topic model, in which topics represent \dQuote{gene
  expression programs}, and gene expression programs are
  characterized by different rates of expression. The way in which
  the counts are simulated is modeled after gene expression studies
  in which expression is measured by single-cell RNA sequencing
  (\dQuote{RNA-seq}) techniques: each row of the counts matrix
  corresponds to a gene expression profile, each column corresponds to a
  gene, and each matrix element is a \dQuote{read count}, or
  \dQuote{UMI count}, measuring expression level. Factors are
  simulated so as to capture realistic changes in gene expression
  across different cell types. See \dQuote{Details} for the procedure
  used to simulate factors, loadings and counts.
}
\details{
Here we describe the process for generating the \eqn{n \times k}
  loadings matrix \code{L} and the \eqn{m \times k} factors matrix
  \code{F}.

  Each row of the \code{L} matrix is generated in the following
  manner: (1) the number of nonzero mixture proportions is an integer
  \eqn{d} with \eqn{1 \le d \le k}, chosen with probability
  proportional to \eqn{2^{-d}}; (2) the indices of the \eqn{d} nonzero
  mixture proportions are sampled uniformly at random; and (3) the
  nonzero mixture proportions are sampled from the Dirichlet
  distribution with all concentration parameters \eqn{\alpha = 1}
  (the uniform distribution on the simplex, so that no topic is
  favored).
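
  As an illustration only, the three steps above might be sketched in
  R as follows. The helper \code{sim_loadings} is hypothetical and is
  not part of this package. Step (3) uses the standard construction of
  a Dirichlet draw by normalizing independent Gamma(1,1) variates; this
  is an assumption about how the step could be carried out, not a
  description of the package code.

```r
# Hypothetical helper, not the package implementation: a sketch of the
# loadings step, returning an n x k matrix whose rows sum to one.
sim_loadings <- function (n, k) {
  L <- matrix(0, n, k)
  for (i in 1:n) {
    # (1) number of nonzero proportions d in 1..k, Pr(d) proportional to 2^(-d)
    d <- sample(k, 1, prob = 2^(-(1:k)))
    # (2) choose which d topics are nonzero, uniformly at random
    j <- sample(k, d)
    # (3) Dirichlet(1,...,1) draw: normalize independent Gamma(1,1) variates
    g <- rgamma(d, shape = 1)
    L[i, j] <- g / sum(g)
  }
  L
}

set.seed(1)
L <- sim_loadings(100, 4)
```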

  Each row of the factors matrix is generated according to the
  following procedure: (1) generate \eqn{u = |r| - 5}, where
  \eqn{r \sim N(0,2)}; (2) for each of the \eqn{k} topics, generate the
  Poisson rate as \eqn{\exp(\max(t,-5))}, where
  \eqn{t \sim 0.95 N(u,\sigma/10) + 0.05 N(u,\sigma)} and
  \eqn{\sigma = \exp(-u/8)}; that is, \eqn{t} is drawn from a
  two-component normal mixture in which the narrower component has
  weight 0.95. Factors can be interpreted as Poisson rates or
  multinomial probabilities, so that individual counts can be viewed
  as being generated from a weighted mixture of \dQuote{topics} with
  different rates or probabilities.
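
  A rough R sketch of this procedure, applied to all \code{m} rows,
  follows. The helper \code{sim_factors} is hypothetical. It assumes
  that the second argument of \eqn{N(a,b)} is a standard deviation, as
  in \code{rnorm}, and it ignores the argument \code{p}.

```r
# Hypothetical helper, not the package implementation: a sketch of the
# factors step, returning an m x k matrix of positive rates.
# Assumes N(a,b) means mean a and standard deviation b (rnorm convention).
sim_factors <- function (m, k) {
  F <- matrix(0, m, k)
  for (i in 1:m) {
    # (1) u = |r| - 5 with r ~ N(0,2)
    u <- abs(rnorm(1, 0, 2)) - 5
    sigma <- exp(-u/8)
    # (2) per topic, t ~ 0.95 N(u, sigma/10) + 0.05 N(u, sigma):
    # each topic uses the wide component with probability 0.05
    wide <- runif(k) < 0.05
    sd <- ifelse(wide, sigma, sigma/10)
    t <- rnorm(k, u, sd)
    # rates are floored at exp(-5)
    F[i, ] <- exp(pmax(t, -5))
  }
  F
}

set.seed(1)
F <- sim_factors(50, 3)
```

  Drawing a component label per topic and then sampling from that
  normal is the usual way to sample from a finite mixture.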

  Once the loadings and factors have been generated, the counts are
  simulated from either the Poisson NMF or multinomial topic model.
  For the Poisson NMF model, \code{X[i,j]} is Poisson with rate
  \code{Y[i,j]}, where \code{Y = tcrossprod(L,F)} and each row
  \code{L[i,]} has been scaled by the size factor \code{s[i]}. For the
  multinomial topic model, \code{X[i,]} is multinomial with size
  \code{s[i]} and class probabilities \code{P[i,]}, where
  \code{P = tcrossprod(L,F)}. For the multinomial model only, the sizes
  \code{s} are not supplied by the user; instead, they are randomly
  generated as \code{s = 10^rnorm(n,3,0.2)}.
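
  Given \code{L} and \code{F}, the sampling step could be sketched as
  below. Both helpers are hypothetical. The multinomial sizes are
  rounded to whole numbers here, which is an assumption made because
  \code{rmultinom} expects an integer size; \code{rmultinom} also
  normalizes \code{P[i,]} internally, so those rows need not sum to
  exactly one.

```r
# Hypothetical helpers, not the package implementation: sketches of the
# count-sampling step, assuming L is n x k and F is m x k.
sim_poisson_counts <- function (L, F, s) {
  # multiply row i of L by s[i], then Y = tcrossprod(L, F)
  Y <- tcrossprod(s * L, F)
  # independent Poisson draws with means Y[i,j]
  matrix(rpois(length(Y), Y), nrow(Y), ncol(Y))
}

sim_multinom_counts <- function (L, F) {
  n <- nrow(L)
  # sizes s = 10^rnorm(n,3,0.2), rounded to whole counts (assumption)
  s <- round(10^rnorm(n, 3, 0.2))
  P <- tcrossprod(L, F)
  X <- matrix(0, n, nrow(F))
  for (i in 1:n)
    X[i, ] <- rmultinom(1, s[i], P[i, ])
  list(X = X, s = s)
}
```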

  Note that only minimal argument checking is performed; these
  functions are mainly used to test the implementation of the
  topic-model-based differential count analysis.
}
