% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/popkin.R
\name{popkin}
\alias{popkin}
\title{Estimate kinship from a genotype matrix and subpopulation assignments}
\usage{
popkin(
  X,
  subpops = NULL,
  n = NA,
  loci_on_cols = FALSE,
  mem_factor = 0.7,
  mem_lim = NA,
  want_M = FALSE,
  m_chunk_max = 1000
)
}
\arguments{
\item{X}{Genotype matrix, BEDMatrix object, or a function \code{X(m)} that returns the genotypes of all individuals at the next \code{m} successive loci each time it is called, and \code{NULL} when no loci are left.
If a regular matrix, \code{X} must have values only in \code{c(0, 1, 2, NA)}, encoded to count the number of reference alleles at the locus, or \code{NA} for missing data.}

\item{subpops}{The length-\code{n} vector of subpopulation assignments for each individual.
If \code{NULL}, every individual is effectively treated as a different population.}

\item{n}{Number of individuals (required only when \code{X} is a function, ignored otherwise).
If \code{n} is missing but \code{subpops} is not, \code{n} is taken to be the length of \code{subpops}.}

\item{loci_on_cols}{If \code{TRUE}, \code{X} has loci on columns and individuals on rows; if \code{FALSE} (default), loci are on rows and individuals on columns.
Has no effect if \code{X} is a function.
If \code{X} is a BEDMatrix object, \code{loci_on_cols} is ignored (set automatically to \code{TRUE} internally).}

\item{mem_factor}{Proportion of available memory to use for loading and processing genotypes.
Ignored if \code{mem_lim} is not \code{NA}.}

\item{mem_lim}{Memory limit in GB, used to break up genotype data into chunks for very large datasets.
Note that memory usage is somewhat underestimated and is not strictly controlled.
The default on Linux and Windows is \code{mem_factor} times the free system memory; on OSX and other systems it is 1 GB.}

\item{want_M}{If \code{TRUE}, the return value also includes the matrix \code{M} of non-missing pair counts; these counts are sample sizes that can be useful in modeling the variance of estimates.
Default \code{FALSE} is to return the kinship matrix only.}

\item{m_chunk_max}{Sets the maximum number of loci to process at a time.
Actual number of loci loaded may be lower if memory is limiting.}
}
\value{
If \code{want_M = FALSE}, returns the estimated \code{n}-by-\code{n} kinship matrix only.
If \code{X} has names for the individuals, they will be copied to the rows and columns of this kinship matrix.
If \code{want_M = TRUE}, a named list is returned, containing:
\itemize{
\item \code{kinship}: the estimated \code{n}-by-\code{n} kinship matrix
\item \code{M}: the \code{n}-by-\code{n} matrix of non-missing pair counts (see \code{want_M} option).
}
}
\description{
Given the biallelic genotypes of \code{n} individuals, this function returns the \code{n}-by-\code{n} kinship matrix such that the kinship estimate between the most distant subpopulations is zero on average (this sets the ancestral population to the most recent common ancestor population).
}
\details{
The subpopulation assignments are only used to estimate the baseline kinship (the zero value).
If the user wants to re-estimate the kinship matrix using different subpopulation labels,
it suffices to rescale it using \code{\link[=rescale_popkin]{rescale_popkin()}}
(as opposed to starting from the genotypes again, which gives the same answer but more slowly).
}
\examples{
# Construct toy data
X <- matrix(
    c(0, 1, 2,
      1, 0, 1,
      1, 0, 2),
    nrow = 3,
    byrow = TRUE
) # genotype matrix
subpops <- c(1,1,2) # subpopulation assignments for individuals

# NOTE: for BED-formatted input, use BEDMatrix!
# "file" is path to BED file (excluding .bed extension)
## library(BEDMatrix)
## X <- BEDMatrix(file) # load genotype matrix object

kinship <- popkin(X, subpops) # calculate kinship from genotypes and subpopulation labels
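
# The calls below are illustrative sketches of the options documented above,
# reusing the toy `X`, `subpops`, and `kinship` objects from this example.
# The names `obj`, `kinship_t`, `subpops2`, and `kinship2` are hypothetical.

# also return the matrix of non-missing pair counts
obj <- popkin(X, subpops, want_M = TRUE)
obj$kinship # the kinship estimates, as in `kinship` above
obj$M       # non-missing pair counts for every pair of individuals

# the same data arranged with individuals on rows and loci on columns
kinship_t <- popkin(t(X), subpops, loci_on_cols = TRUE)

# change the baseline using different subpopulation labels
# without processing the genotypes again
subpops2 <- c(1, 2, 3) # assumed alternative labels, for illustration only
kinship2 <- rescale_popkin(kinship, subpops2)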

}
\seealso{
\code{\link[=popkin_af]{popkin_af()}} for coancestry estimation from allele frequency matrices.
}
