% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/survey_statistics.r
\name{survey_var}
\alias{survey_var}
\alias{survey_sd}
\title{Calculate the population variance and its variation using survey methods}
\usage{
survey_var(x, na.rm = FALSE, vartype = c("se", "ci", "var"),
  level = 0.95, df = Inf, .svy = current_svy(), ...)

survey_sd(x, na.rm = FALSE, .svy = current_svy(), ...)
}
\arguments{
\item{x}{A variable or expression, or empty}

\item{na.rm}{A logical value to indicate whether missing values should be dropped}

\item{vartype}{Report variability as one or more of: standard error ("se", default),
confidence interval ("ci"), or variance ("var") (coefficient
of variation is not available).}

\item{level}{(For vartype = "ci" only) A single number or vector of numbers indicating
the confidence level.}

\item{df}{(For vartype = "ci" only) A numeric value indicating the degrees of freedom
for the t-distribution. The default (Inf) is equivalent to using the normal
distribution; for population variance statistics there is little
reason to use any other value (see \emph{Details}).}

\item{.svy}{A \code{tbl_svy} object. When called from inside a summarize function
the default automatically sets the survey to the current survey.}

\item{...}{Ignored}
}
\description{
Calculate population variance from complex survey data. A wrapper
around \code{\link[survey]{svyvar}}. \code{survey_var} should always be
called from \code{\link{summarise}}.
}
\details{
Be aware that confidence intervals for the population variance statistic are
computed by the \emph{survey} package using a \emph{t} or normal (with df=Inf)
distribution (i.e. symmetric distributions). \strong{This could be a very poor
approximation} if even one of these conditions is met:
\itemize{
  \item{there are few sampling design degrees of freedom,}
  \item{the analyzed variable isn't normally distributed,}
  \item{there is large variation in the sampling probabilities of the survey design.}
}
Because of this, be very careful when using confidence intervals for population variance
statistics, especially when analyzing subsets of the data or using
grouped survey objects.

The sampling distribution of the variance statistic is in general asymmetric
(chi-squared in the case of simple random sampling of a normally distributed variable).
If the analyzed variable isn't normally distributed, or there is large variation in
the sampling probabilities of the survey design (or both), it may converge to
normality only very slowly (as the number of survey design degrees of
freedom grows).
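
As an illustration only (this is the textbook simple random sampling case, not
the computation performed by the \emph{survey} package), let \eqn{n} denote the
sample size, \eqn{s^2} the sample variance and \eqn{\sigma^2}{sigma^2} the
population variance of a normally distributed variable. These symbols are
introduced here for the sketch and are not used elsewhere in this documentation.
Then
\deqn{\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}}{(n-1) s^2 / sigma^2 ~ chi^2(n-1)}
so an exact interval at confidence level \eqn{1-\alpha}{1-alpha} is
\deqn{\left[\frac{(n-1)s^2}{\chi^2_{1-\alpha/2, n-1}}, \frac{(n-1)s^2}{\chi^2_{\alpha/2, n-1}}\right]}{[(n-1) s^2 / q(1-alpha/2), (n-1) s^2 / q(alpha/2)]}
where \eqn{\chi^2_{p, n-1}}{q(p)} denotes the \eqn{p}-th quantile of the
chi-squared distribution with \eqn{n-1} degrees of freedom. This interval is not
symmetric around \eqn{s^2}, unlike an interval based on the \emph{t} or normal
distribution.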
}
\examples{
library(survey)
data(api)

dstrata <- apistrat \%>\%
  as_survey_design(strata = stype, weights = pw)

dstrata \%>\%
  summarise(api99_var = survey_var(api99),
            api99_sd = survey_sd(api99))

dstrata \%>\%
  group_by(awards) \%>\%
  summarise(api00_var = survey_var(api00),
            api00_sd = survey_sd(api00))

# standard error and variance of the population variance estimator
# are available via the vartype argument
# (but not for the population standard deviation estimator)
dstrata \%>\%
  summarise(api99_variance = survey_var(api99, vartype = c("se", "var")))
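
# a sketch of requesting a confidence interval for the population variance,
# assuming the "ci" option listed in the usage above is available in the
# installed version; the 0.9 level is an arbitrary illustrative choice
# (see Details: this symmetric interval can be a poor approximation)
dstrata \%>\%
  summarise(api99_var = survey_var(api99, vartype = "ci", level = 0.9))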
}
