% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/correctHeap.R
\name{correctHeaps}
\alias{correctHeaps}
\title{Correct age heaping}
\usage{
correctHeaps(x, heaps = "10year", method = "lnorm", start = 0,
  fixed = NULL)
}
\arguments{
\item{x}{numeric vector}

\item{heaps}{a character specifying the assumed spacing of the age heaps.
Allowed values are
\itemize{
\item \code{5year}: heaps are assumed to occur every 5 years (0, 5, 10, ...)
\item \code{10year}: heaps are assumed to occur every 10 years (0, 10, 20, ...)
}}

\item{method}{a character specifying the algorithm used to correct the age
heaps. Allowed values are
\itemize{
\item \code{lnorm}: drawing from a truncated log-normal distribution. The
required parameters are estimated using original input data.
\item \code{norm}: drawing from a truncated normal distribution. The
required parameters are estimated using original input data.
\item \code{unif}: random sampling from a (truncated) uniform distribution
}}

\item{start}{a numeric value giving the starting point of the 5- or 10-year
heap sequence (e.g. 0, 5 or 10)}

\item{fixed}{numeric index vector of observations that should not be changed}
}
\value{
a numeric vector without age heaps
}
\description{
Correct for age heaping by drawing from truncated (log-)normal or uniform distributions
}
\details{
Age heaping can cause substantial bias
in important measures and should therefore be corrected.

For method \dQuote{lnorm}, a truncated log-normal distribution is fit to the whole
age distribution. Then, for each age heap (at 0, 5, 10, 15, ...), random numbers are
drawn from a truncated log-normal distribution (with lower and upper bound) in the
interval +/- 2 around the heap (rounding of degree 2)
using the inverse transformation method. A ratio of the observations on an age heap,
chosen at random, is replaced by these random draws. This ratio is derived from the
age distribution: for an age heap (e.g. 5),
the arithmetic mean of the counts on the two neighboring ages is calculated
(the average of the counts on age 4 and age 6 for the heap at age 5, for example).
The ratio for, e.g., age 5 is then given by the count on age 5 divided by this mean.
This is done for every age heap (0, 5, 10, 15, ...).

Method \dQuote{norm} replaces the draws from truncated log-normal distributions with
draws from truncated normal distributions. Whether method \dQuote{lnorm} or
\dQuote{norm} should be used depends on the age distribution (right-skewed or not).
Many distributions with heaping problems are right-skewed.

Method \dQuote{unif} replaces the mentioned ratio of observations with draws from
truncated uniform distributions around the age heaps.

Repeated calls of this function mimic multiple imputation, i.e. repeating this
procedure m times provides m imputed datasets that properly reflect the
uncertainty from imputation.
}
\examples{
## create some artificial data
age <- rlnorm(10000, meanlog=2.466869, sdlog=1.652772)
age <- round(age[age < 93])
barplot(table(age))

## artificially introduce age heaping and correct it:
# heaps every 5 years
year5 <- seq(0, max(age), 5)
age5 <- sample(c(age, age[age \%in\% year5]))
cc5 <- rep("darkgrey", length(unique(age)))
cc5[year5+1] <- "yellow"
barplot(table(age5), col=cc5)
barplot(table(correctHeaps(age5, heaps="5year", method="lnorm")), col=cc5)
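
## Sketch of the heap ratio described in the details: the count on a heap
## divided by the mean count on its two neighbouring ages. The toy counts and
## the names toy_counts, neighbour_mean and heap_ratio are assumptions made
## for illustration only; they are not objects used by correctHeaps().
toy_counts <- c("4" = 100, "5" = 180, "6" = 110)
# mean count on the neighbouring ages 4 and 6
neighbour_mean <- mean(toy_counts[c("4", "6")])
# count on the heap (age 5) divided by that mean
heap_ratio <- unname(toy_counts["5"] / neighbour_mean)
heap_ratio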

# heaps every 10 years
year10 <- seq(0, max(age), 10)
age10 <- sample(c(age, age[age \%in\% year10]))
cc10 <- rep("darkgrey", length(unique(age)))
cc10[year10+1] <- "yellow"
barplot(table(age10), col=cc10)
barplot(table(correctHeaps(age10, heaps="10year", method="lnorm")), col=cc10)
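
## Sketch of the inverse transformation method mentioned in the details, for a
## log-normal distribution truncated to an interval around a heap. This is an
## illustration under assumed parameter values, not the package's internal
## code; the helper name rtrunc_lnorm is hypothetical.
rtrunc_lnorm <- function(n, meanlog, sdlog, lower, upper) {
  # probabilities of the truncation bounds under the untruncated distribution
  p_lower <- plnorm(lower, meanlog, sdlog)
  p_upper <- plnorm(upper, meanlog, sdlog)
  # uniform draws between these probabilities, mapped back by the quantile function
  qlnorm(runif(n, p_lower, p_upper), meanlog, sdlog)
}
# draws in the interval +/- 2 around a heap at age 10
draws <- rtrunc_lnorm(1000, meanlog = 2.5, sdlog = 1.6, lower = 8, upper = 12)
range(draws)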

# the first 5 observations should be unchanged
barplot(table(correctHeaps(age10, heaps="10year", method="lnorm", fixed=1:5)), col=cc10)
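
## Sketch of the repeated-call idea from the details: each call draws new
## random replacements, so m calls give m corrected copies of the data.
## m = 3 is an arbitrary choice for illustration.
m <- 3
imputed <- lapply(seq_len(m), function(i)
  correctHeaps(age10, heaps="10year", method="lnorm"))
# compare a summary statistic across the corrected copies
sapply(imputed, mean)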

}
\references{
M. Templ, B. Meindl, A. Kowarik, A. Alfons, O. Dupriez (2017) Simulation of Synthetic Populations for Survey Data Considering Auxiliary
Information. \emph{Journal of Statistical Software}, \strong{79} (10), 1--38. doi: 10.18637/jss.v079.i10
}
\author{
Matthias Templ, Bernhard Meindl, Alexander Kowarik
}
