% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/betaclust.R
\name{betaclust}
\alias{betaclust}
\title{The betaclust wrapper function}
\usage{
betaclust(
  data,
  M = 3,
  N,
  R,
  model_names = "K..",
  model_selection = "BIC",
  parallel_process = FALSE,
  seed = NULL
)
}
\arguments{
\item{data}{A dataframe of dimension \eqn{C \times NR} containing methylation values for \eqn{C} CpG sites from \eqn{R} samples collected from \eqn{N} patients.
Samples are grouped together in the dataframe such that the columns are ordered as Sample1_Patient1, Sample1_Patient2, Sample2_Patient1, Sample2_Patient2, etc.}

\item{M}{Number of methylation states to be identified in a DNA sample (default = 3).}

\item{N}{Number of patients in the study.}

\item{R}{Number of samples collected from each patient for the study.}

\item{model_names}{Models to run, given as one or more of "K..", "KN." and "K.R" (default = "K.."). See Details.}

\item{model_selection}{Information criterion used for model selection. Options are AIC, BIC or ICL (default = BIC).}

\item{parallel_process}{If TRUE, the models are fitted in parallel for increased computational efficiency. The default is FALSE due to package testing limitations.}

\item{seed}{Seed to allow for reproducibility (default = NULL).}
}
\value{
The function returns an object of the \code{\link[betaclust:betaclust]{betaclust}} class which contains the following values:
\itemize{
\item information_criterion - The information criterion used to select the optimal model.
\item ic_output - The information criterion value calculated for each model.
\item optimal_model - The model selected as optimal.
\item function_call - The parameters passed as arguments to the function \code{\link[betaclust:betaclust]{betaclust}}.
\item K - The number of clusters identified using the beta mixture models.
\item C - The number of CpG sites analysed using the beta mixture models.
\item N - The number of patients analysed using the beta mixture models.
\item R - The number of samples analysed using the beta mixture models.
\item optimal_model_results - Information from the optimal model. Specifically,
   \itemize{
   \item cluster_size - The total number of CpG sites in each of the K clusters.
   \item llk - A vector containing the log-likelihood value at each step of the EM algorithm.
   \item alpha - The first shape parameter of the beta mixture model.
   \item delta - The second shape parameter of the beta mixture model.
   \item tau - The proportion of CpG sites in each cluster.
   \item z - A matrix of dimension \eqn{C \times K} containing the posterior probability of each CpG site belonging to each of the \eqn{K} clusters.
   \item classification - The classification corresponding to z, i.e. map(z).
   \item uncertainty - The uncertainty of each CpG site's clustering.
   \item thresholds - Threshold points calculated under the K.. or the KN. model.
   }
}
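
As a brief sketch of how the classification and uncertainty values relate to z for CpG site \eqn{c}: the first expression is the maximum a posteriori assignment implied by map(z), while the uncertainty formula is an assumption based on the usual convention in model-based clustering and is not confirmed by this page.
\deqn{\mathrm{classification}_c = \arg\max_{k} z_{ck}, \qquad \mathrm{uncertainty}_c = 1 - \max_{k} z_{ck}}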
}
\description{
A family of model-based clustering techniques to identify methylation states in beta-valued DNA methylation data.
}
\details{
This is a wrapper function which can be used to fit any or all of the three models (K.., KN., K.R) within a single function call.

The K.. and KN. models are used to analyse a single DNA sample (\eqn{R = 1}). They cluster the \eqn{C} CpG sites into \eqn{K} clusters, which represent the different methylation states in the DNA sample. As each CpG site can belong to any of the \eqn{M = 3} methylation states (hypomethylation, hemimethylation and hypermethylation), the default is \eqn{K = M = 3}.
The thresholds between methylation states are objectively inferred from the clustering solution.

The K.R model is used to analyse \eqn{R} independent samples collected from \eqn{N} patients, where each sample contains \eqn{C} CpG sites, and cluster
the dataset into \eqn{K=M^R} clusters to identify the differentially methylated CpG (DMC) sites between the \eqn{R} DNA samples.
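
As a worked illustration using the values from the Examples section (\eqn{M = 3}, \eqn{N = 4}, \eqn{R = 2}), the K.R model fits
\deqn{K = M^R = 3^2 = 9}
clusters, and the input dataframe is expected to have \eqn{NR = 4 \times 2 = 8} columns, which matches the eight columns selected there.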
}
\examples{
\donttest{
my.seed <- 190
M <- 3
N <- 4
R <- 2
data_output <- betaclust(pca.methylation.data[1:30,2:9], M, N, R,
            model_names = c("K..","KN.","K.R"), model_selection = "BIC",
            parallel_process = FALSE, seed = my.seed)

}
}
\references{
{Silva, R., Moran, B., Russell, N.M., Fahey, C., Vlajnic, T., Manecksha, R.P., Finn, S.P., Brennan, D.J., Gallagher, W.M., Perry, A.S.: Evaluating liquid biopsies for methylomic profiling of prostate cancer. Epigenetics 15(6-7), 715-727 (2020). \doi{10.1080/15592294.2020.1712876}.}

{Majumdar, K., Silva, R., Perry, A.S., Watson, R.W., Murphy, T.B., Gormley, I.C.: betaclust: a family of mixture models for beta valued DNA methylation data. arXiv [stat.ME] (2022). \doi{10.48550/ARXIV.2211.01938}.}
}
\seealso{
\code{\link{beta_k}}

\code{\link{beta_kn}}

\code{\link{beta_kr}}

\code{\link{pca.methylation.data}}

\code{\link{plot.betaclust}}

\code{\link{summary.betaclust}}

\code{\link{threshold}}
}
