% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/MRFcov.R
\name{MRFcov}
\alias{MRFcov}
\title{Markov Random Fields with covariates}
\usage{
MRFcov(data, symmetrise, prep_covariates, n_nodes, n_cores, n_covariates,
  family, bootstrap = FALSE)
}
\arguments{
\item{data}{A \code{data.frame}. The input data, where the \code{n_nodes}
left-most columns are the variables to be represented by nodes in the graph}

\item{symmetrise}{The method to use for symmetrising corresponding parameter estimates
(which are taken from separate regressions). Options are \code{min} (take the coefficient with the
smallest absolute value), \code{max} (take the coefficient with the largest absolute value)
or \code{mean} (take the mean of the two coefficients). Default is \code{mean}}

\item{prep_covariates}{Logical. If \code{TRUE}, covariate columns will be cross-multiplied
with nodes to prep the dataset for MRF models. Note this is only useful when additional
covariates are provided. Therefore, if \code{n_nodes < ncol(data)},
default is \code{TRUE}. Otherwise, default is \code{FALSE}. See
\code{\link{prep_MRF_covariates}} for more information}

\item{n_nodes}{Positive integer. The index of the last column in \code{data}
which is represented by a node in the final graph. Columns with index
greater than \code{n_nodes} are taken as covariates. Default is the number of
columns in \code{data}, corresponding to no additional covariates}

\item{n_cores}{Positive integer. The number of cores to spread the job across using
\code{\link[parallel]{makePSOCKcluster}}. Default is 1 (no parallelisation)}

\item{n_covariates}{Positive integer. The number of covariates in \code{data}, before cross-multiplication.
Default is \code{ncol(data) - n_nodes}}

\item{family}{The response type. Responses can be quantitative continuous (\code{family = "gaussian"}),
non-negative counts (\code{family = "poisson"}) or binomial 1s and 0s (\code{family = "binomial"}).
If using \code{family = "binomial"}, note that occurrence probabilities are difficult to
estimate for nodes that occur in fewer than 5 percent of observations
(in extreme cases, this can result in intercept-only
models being fitted for the nodes in question). The function will issue a warning in this case.
If nodes occur in more than 95 percent of observations, the function will return an error, as the cross-validation
step will generally be unable to proceed. For \code{family = "poisson"} models, all returned
coefficients are estimated on the identity scale after applying a nonparanormal transformation.
See \code{vignette("Gaussian_Poisson_CRFs")} for details of interpretation}

\item{bootstrap}{Logical. Used by \code{\link{bootstrap_MRF}} to reduce memory usage}
}
\value{
A \code{list} containing:
\itemize{
   \item \code{graph}: Estimated parameter \code{matrix} of pairwise interaction effects
   \item \code{intercepts}: Estimated parameter \code{vector} of node intercepts
   \item \code{indirect_coefs}: \code{list} containing matrices representing
    indirect effects of each covariate on pairwise node interactions
   \item \code{direct_coefs}: \code{matrix} of direct effects of each parameter on
   each outcome node. For \code{family = 'binomial'} models, all coefficients are
   estimated on the logit scale.
   \item \code{param_names}: Character string of covariate parameter names
   \item \code{mod_type}: A character stating the type of model that was fit
   (used in other functions)
   \item \code{mod_family}: A character stating the family of model that was fit
   (used in other functions)
   \item \code{poiss_sc_factors}: A matrix of the estimated negative binomial or
   Poisson parameters for each raw node variable (only returned if \code{family = "poisson"}).
   These are needed for converting coefficients back to their original distribution, and are
   used for prediction purposes only
   }
}
\description{
This function is the workhorse of the \code{MRFcov} package, running
separate penalized regressions for each node to estimate parameters of
Markov Random Fields (MRF) graphs. Covariates can be included
(a class of models known as Conditional Random Fields; CRF), to estimate
how interactions between nodes vary across covariate magnitudes.
}
\details{
Separate penalized regressions are used to approximate
MRF parameters, where the regression for node \code{j} includes an
intercept and coefficients for the abundance (families \code{gaussian} or \code{poisson})
or presence-absence (family \code{binomial}) of all other
nodes (\code{/j}) in \code{data}. If covariates are included, coefficients
are also estimated for the effect of the covariate on \code{j}, and for the
effects of the covariate on interactions between \code{j} and all other nodes
(\code{/j}). Note that interaction coefficients must be estimated between variables that
are on roughly the same scale, as the resulting parameter estimates are
unified into a Markov Random Field using the specified \code{symmetrise} function.
Counts for \code{poisson} variables, which are often not on the same scale,
will therefore be normalised with a nonparanormal transformation
\code{x = qnorm(rank(log2(x + 0.01)) / (length(x) + 1))}. These transformed counts
will be used in a \code{family = "gaussian"}
model and their respective raw distribution parameters returned so that coefficients
can be back-transformed for interpretation (this back-transformation is
performed automatically by other functions including \code{\link{predict_MRF}}
and \code{\link{cv_MRF_diag}}). Gaussian variables are not automatically transformed, so
if they cover quite different ranges and scales, it is recommended to scale them prior to fitting
models. For more information on this process, see
\code{vignette("Gaussian_Poisson_CRFs")}
\cr
\cr
Note that since the number of parameters to estimate in each node-wise regression
quickly increases with increasing numbers of nodes and covariates,
LASSO penalization is used to regularize
regressions. This is done by minimising the cross-validated
mean error for each node separately using \code{\link[glmnet]{cv.glmnet}}. In this way,
we maximise the log-likelihood of each node
separately before unifying the nodes into a graph.
}
\examples{
data("Bird.parasites")
CRFmod <- MRFcov(data = Bird.parasites, n_nodes = 4, family = 'binomial')
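
# A minimal sketch of inspecting the fitted object. It relies only on the
# list elements documented in the Value section; the object name CRFmod_min
# is chosen here purely for illustration
# Estimated pairwise interaction effects between nodes
CRFmod$graph
# Estimated node intercepts
CRFmod$intercepts
# Direct effects on each node (logit scale for binomial models)
CRFmod$direct_coefs

# Refit with a more conservative symmetrisation rule: 'min' keeps the
# coefficient with the smallest absolute value from each pair of regressions
CRFmod_min <- MRFcov(data = Bird.parasites, n_nodes = 4,
                     family = 'binomial', symmetrise = 'min')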

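# A sketch of the nonparanormal transformation given in Details, applied by
# hand to simulated counts. The counts are made up for illustration only;
# the formula is the one quoted in Details for family = 'poisson'
set.seed(1)
counts <- rpois(100, lambda = 5)
counts_npn <- qnorm(rank(log2(counts + 0.01)) / (length(counts) + 1))
# Transformed values are finite and roughly centred on zero
summary(counts_npn)
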
}
\references{
Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus.
Zeitschrift für Physik A Hadrons and Nuclei, 31, 253-258.\cr\cr
Cheng, J., Levina, E., Wang, P. & Zhu, J. (2014).
A sparse Ising model with covariates. Biometrics, 70, 943-953.\cr\cr
Clark, N.J., Wells, K. & Lindberg, O. (2018).
Unravelling changing interspecific interactions across environmental gradients
using Markov random fields. Ecology, doi: 10.1002/ecy.2221
\href{http://nicholasjclark.weebly.com/uploads/4/4/9/4/44946407/clark_et_al-2018-ecology.pdf}{Full text here}.\cr\cr
Sutton, C. & McCallum, A. (2012). An introduction to conditional random fields.
Foundations and Trends in Machine Learning, 4, 267-373.
}
\seealso{
Cheng et al. (2014), Sutton & McCallum (2012) and Clark et al. (2018)
for overviews of Conditional Random Fields. See \code{\link[glmnet]{cv.glmnet}} for
details of cross-validated optimization using LASSO penalty. Worked examples to showcase
this function can be found using \code{vignette("Bird_Parasite_CRF")} and
\code{vignette("Gaussian_Poisson_CRFs")}
}
