% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fays_generalized_replication.R
\name{make_fays_gen_rep_factors}
\alias{make_fays_gen_rep_factors}
\title{Form replication factors using Fay's generalized replication method}
\usage{
make_fays_gen_rep_factors(
  Sigma,
  max_replicates = Matrix::rankMatrix(Sigma) + 4,
  balanced = TRUE
)
}
\arguments{
\item{Sigma}{A quadratic form matrix corresponding to
a target variance estimator. Must be positive semidefinite.}

\item{max_replicates}{The maximum number of replicates to allow.
The function will attempt to create the minimum number of replicates
needed to produce a fully-efficient variance estimator.
If more replicates are needed than \code{max_replicates}, then the full number of replicates
needed will be created, but only a random subsample will be retained.}

\item{balanced}{If \code{balanced=TRUE}, the replicates
will all contribute equally to variance estimates, but
the number of replicates needed may slightly increase.}
}
\value{
A matrix of replicate factors,
with the number of rows matching the number of rows of \code{Sigma}
and the number of columns less than or equal to \code{max_replicates}.
To calculate variance estimates using these factors,
use the overall scale factor given by calling
\code{attr(x, "scale")} on the result.
}
\description{
Generate a matrix of replication factors
using Fay's generalized replication method.
This method yields a fully efficient variance estimator
if a sufficient number of replicates is used.
}
\section{Statistical Details}{

See Fay (1989) for a full explanation of Fay's generalized replication method.
This documentation provides a brief overview.

Let \eqn{\boldsymbol{\Sigma}} be the quadratic form matrix for a target variance estimator,
which is assumed to be positive semidefinite.
Suppose the rank of \eqn{\boldsymbol{\Sigma}} is \eqn{k},
so that \eqn{\boldsymbol{\Sigma}} can be represented by a spectral decomposition
with \eqn{k} eigenvectors and eigenvalues, where the \eqn{r}-th eigenvector and eigenvalue
are denoted \eqn{\mathbf{v}_{(r)}} and \eqn{\lambda_r}, respectively:
\deqn{
\boldsymbol{\Sigma} = \sum_{r=1}^k \lambda_r \mathbf{v}_{(r)} \mathbf{v}_{(r)}^{\prime}
}
If \code{balanced = FALSE}, then we let \eqn{\mathbf{H}} denote an identity matrix
with \eqn{k^{\prime} = k} rows/columns. If \code{balanced = TRUE}, then we let \eqn{\mathbf{H}}
be a Hadamard matrix (with all entries equal to \eqn{1} or \eqn{-1})
of order \eqn{k^{\prime} \geq k}. Let \eqn{H_{mr}} denote the entry in row
\eqn{m} and column \eqn{r} of \eqn{\mathbf{H}}.

Then \eqn{k^{\prime}} replicates are formed as follows.
Let \eqn{r} denote a given replicate, with \eqn{r = 1, \dots, k^{\prime}},
and let \eqn{c} denote some positive constant (yet to be specified).

The \eqn{r}-th replicate adjustment factor \eqn{\mathbf{f}_{r}} is formed as:
\deqn{
  \mathbf{f}_{r} = 1 + c \sum_{m=1}^k H_{mr} \lambda_{m}^{\frac{1}{2}} \mathbf{v}_{(m)}
}

If \code{balanced = FALSE}, then \eqn{c = 1}. If \code{balanced = TRUE},
then \eqn{c = \frac{1}{\sqrt{k^{\prime}}}}.
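
As a small illustration (hypothetical numbers, assuming a Hadamard matrix of order 4 is used),
suppose \eqn{k = 3} and \code{balanced = TRUE}. Then \eqn{k^{\prime} = 4} and \eqn{c = \frac{1}{2}}.
For a replicate \eqn{r} whose column of \eqn{\mathbf{H}} has the entries \eqn{1, -1, 1} in its first three rows,
the formula above gives:
\deqn{
  \mathbf{f}_{r} = 1 + \frac{1}{2} \left( \lambda_{1}^{\frac{1}{2}} \mathbf{v}_{(1)} - \lambda_{2}^{\frac{1}{2}} \mathbf{v}_{(2)} + \lambda_{3}^{\frac{1}{2}} \mathbf{v}_{(3)} \right)
}
Only the first \eqn{k} rows of \eqn{\mathbf{H}} enter the sum, so the fourth row is not used in this illustration.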

If any of the replicate factors
are negative, you can use \code{\link[svrep]{rescale_reps}},
which recalculates the replicate factors with a smaller value of \eqn{c}.

If all \eqn{k^{\prime}} replicates are used, then variance estimates are calculated as:
\deqn{
  v_{rep}\left(\hat{T}_y\right) = \sum_{r=1}^{k^{\prime}}\left(\hat{T}_y^{*(r)}-\hat{T}_y\right)^2
}
For population totals, this replication variance estimator
will \emph{exactly} match the target variance estimator
if the number of replicates \eqn{k^{\prime}} matches the rank of \eqn{\Sigma}.
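
The following is an informal sketch of why the two estimators agree when all \eqn{k^{\prime}} replicates are used.
It is not taken from Fay (1989), and the symbol \eqn{\breve{\mathbf{y}}} is introduced here only for illustration.
Assume the \eqn{r}-th replicate estimate is formed by multiplying each weighted value by the corresponding
element of \eqn{\mathbf{f}_{r}}, and let \eqn{\breve{\mathbf{y}}} denote the vector of weighted values whose sum is \eqn{\hat{T}_y}.
Then \eqn{\hat{T}_y^{*(r)} - \hat{T}_y = (\mathbf{f}_{r} - 1)^{\prime} \breve{\mathbf{y}}}, and so:
\deqn{
  \sum_{r=1}^{k^{\prime}}\left(\hat{T}_y^{*(r)}-\hat{T}_y\right)^2 = \breve{\mathbf{y}}^{\prime} \left[ \sum_{r=1}^{k^{\prime}} (\mathbf{f}_{r} - 1)(\mathbf{f}_{r} - 1)^{\prime} \right] \breve{\mathbf{y}}
}
Assume further that the rows of \eqn{\mathbf{H}} are mutually orthogonal, so that
\eqn{\sum_{r=1}^{k^{\prime}} H_{mr} H_{lr}} equals \eqn{1 / c^2} when \eqn{m = l} and \eqn{0} otherwise.
This holds for an identity matrix with \eqn{c = 1} and for a Hadamard matrix with \eqn{c = \frac{1}{\sqrt{k^{\prime}}}}.
The matrix in brackets then reduces to the spectral decomposition of \eqn{\boldsymbol{\Sigma}}:
\deqn{
  \sum_{r=1}^{k^{\prime}} (\mathbf{f}_{r} - 1)(\mathbf{f}_{r} - 1)^{\prime} = c^2 \sum_{m=1}^k \sum_{l=1}^k \left( \sum_{r=1}^{k^{\prime}} H_{mr} H_{lr} \right) \lambda_{m}^{\frac{1}{2}} \lambda_{l}^{\frac{1}{2}} \mathbf{v}_{(m)} \mathbf{v}_{(l)}^{\prime} = \sum_{m=1}^k \lambda_m \mathbf{v}_{(m)} \mathbf{v}_{(m)}^{\prime} = \boldsymbol{\Sigma}
}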
}

\section{The Number of Replicates}{


If \code{balanced=TRUE}, the number of replicates created
may need to increase slightly.
This is because a Hadamard matrix
of order \eqn{k^{\prime} \geq k} is used to balance the replicates,
and a Hadamard matrix of order exactly \eqn{k} may not exist,
in which case it is necessary to use order \eqn{k^{\prime} > k}.

If the number of replicates \eqn{k^{\prime}} is too large for practical purposes,
then one can simply retain only a random subset of \eqn{R} of the \eqn{k^{\prime}} replicates.
In this case, variances are calculated as follows:
\deqn{
  v_{rep}\left(\hat{T}_y\right) = \frac{k^{\prime}}{R} \sum_{r=1}^{R}\left(\hat{T}_y^{*(r)}-\hat{T}_y\right)^2
}
This is what happens if \code{max_replicates} is less than the
number of replicates \eqn{k^{\prime}} needed (which is at least the
matrix rank of \code{Sigma}): only a random subset
of the created replicates will be retained.

Subsampling replicates is only recommended when
using \code{balanced=TRUE}, since in this case every replicate
contributes equally to variance estimates. If \code{balanced=FALSE},
then randomly subsampling replicates is valid but may
produce large variation in variance estimates since replicates
in that case may vary greatly in their contribution to variance
estimates.
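
As a worked illustration of the rescaling (the numbers are hypothetical), suppose
\eqn{k^{\prime} = 64} replicates are created but only \eqn{R = 50} are retained. Then:
\deqn{
  \frac{k^{\prime}}{R} = \frac{64}{50} = 1.28
}
so the sum of squared deviations over the 50 retained replicates is multiplied by 1.28.
Assuming the subset is a simple random sample drawn without replacement,
each replicate is retained with probability \eqn{R / k^{\prime}},
and so the rescaled sum has the same expectation (over the random subsampling)
as the sum over all \eqn{k^{\prime}} replicates.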
}

\section{Reproducibility}{


If \code{balanced=TRUE}, a Hadamard matrix
is used as described above. The Hadamard matrix is
deterministically created using the function
\code{\link[survey]{hadamard}()} from the 'survey' package.
However, the order of rows/columns is randomly permuted
before forming replicates.

In general, column-ordering of the replicate weights is random.
To ensure exact reproducibility, it is recommended to call
\code{\link[base]{set.seed}()} before using this function.
}

\examples{
\dontrun{
  library(survey)

# Load an example dataset that uses unequal probability sampling ----
  data('election', package = 'survey')

# Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ----
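  n <- nrow(election_pps)
  # Use a name other than pi to avoid masking the built-in constant
  joint_probs <- election_jointprob
  horvitz_thompson_matrix <- matrix(nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Diagonal entries of joint_probs are the first-order inclusion probabilities
      horvitz_thompson_matrix[i,j] <- 1 - (joint_probs[i,i] * joint_probs[j,j])/joint_probs[i,j]
    }
  }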

  ## Equivalently:

  horvitz_thompson_matrix <- make_quad_form_matrix(
    variance_estimator = "Horvitz-Thompson",
    joint_probs = election_jointprob
  )

# Make generalized replication adjustment factors ----

  # Set a seed first, since the ordering of the replicates is random
  set.seed(2014)

  adjustment_factors <- make_fays_gen_rep_factors(
    Sigma = horvitz_thompson_matrix,
    max_replicates = 50
  )
  attr(adjustment_factors, 'scale')

# Compute the Horvitz-Thompson estimate and the replication estimate ----

  ht_estimate <- svydesign(data = election_pps, ids = ~ 1,
                           prob = diag(election_jointprob),
                           pps = ppsmat(election_jointprob)) |>
    svytotal(x = ~ Kerry)

  rep_estimate <- svrepdesign(
    data = election_pps,
    weights = ~ wt,
    repweights = adjustment_factors,
    combined.weights = FALSE,
    scale = attr(adjustment_factors, 'scale'),
    rscales = rep(1, times = ncol(adjustment_factors)),
    type = "other",
    mse = TRUE
  ) |>
    svytotal(x = ~ Kerry)

  SE(rep_estimate)
  SE(ht_estimate)
  SE(rep_estimate) / SE(ht_estimate)
}
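
# Illustrative sketch using only base R (not part of the package API) ----
# For a small, hypothetical quadratic form matrix, this builds unbalanced
# factors directly from the spectral decomposition (H = identity, c = 1)
# and checks that they reproduce the matrix. All object names are made up
# for illustration and this is not how the package is implemented.

  toy_Sigma <- matrix(c( 2, -1, -1,
                        -1,  2, -1,
                        -1, -1,  2), nrow = 3, byrow = TRUE)
  toy_eigen <- eigen(toy_Sigma, symmetric = TRUE)

  # Keep the strictly positive eigenvalues and their eigenvectors
  toy_keep <- toy_eigen$values > 1e-8
  toy_lambda <- toy_eigen$values[toy_keep]
  toy_V <- toy_eigen$vectors[, toy_keep, drop = FALSE]

  # Column r holds f_r = 1 + sqrt(lambda_r) * v_(r)
  toy_factors <- 1 + sweep(toy_V, 2, sqrt(toy_lambda), `*`)

  # Summing (f_r - 1)(f_r - 1)^T over replicates should recover toy_Sigma,
  # so this maximum absolute difference should be numerically close to zero
  max(abs(tcrossprod(toy_factors - 1) - toy_Sigma))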
}
\references{
Fay, Robert. 1989.
"Theory And Application Of Replicate Weighting For Variance Calculations."
In, 495–500. Alexandria, VA: American Statistical Association.
http://www.asasrms.org/Proceedings/papers/1989_033.pdf
}
\seealso{
Use \code{\link[svrep]{rescale_reps}} to eliminate negative adjustment factors.
}
