% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/play_file.R
\name{sim_data_nll_basic}
\alias{sim_data_nll_basic}
\title{Function simulating survival data for nll_basic}
\usage{
sim_data_nll_basic(
  a1 = a1,
  b1 = b1,
  n1 = n1,
  a2 = a2,
  b2 = b2,
  n2 = n2,
  t_censor = 1000,
  d1 = "",
  d2 = ""
)
}
\arguments{
\item{a1, b1, a2, b2}{location and scale parameters describing background
mortality and mortality due to infection, respectively; values must be > 0}

\item{n1, n2}{size of populations to simulate data; uninfected and infected,
respectively; values must be > 0}

\item{t_censor}{time when any individuals still alive are right-censored;
defaults to 1000 if not specified, must be > 0}

\item{d1, d2}{probability distribution(s) chosen to describe background
mortality and mortality due to infection, respectively;
choice of, Weibull, Gumbel, Fréchet}
}
\description{
Function simulating survival data corresponding with assumptions of nll_basic
}
\details{
To generate data, the function collects the input variables for the
probability distributions chosen to describe the background mortality and
mortality due to infection, along with values for their location and scale
parameters, respectively.
These are used to parameterise cumulative survival functions for the two
sources of mortality. These functions are solved for time t, where the
value of S(t) is drawn from a random uniform distribution between 0 and 1;
this yields values of t corresponding with values of survival
drawn at random from the cumulative survival curve.

Values of t are rounded up to the nearest integer.
This introduces a bias as recorded times of death are later than actual
times of death;
this bias occurs in real datasets when sampling occurs at discrete intervals.
This bias can be partially offset by taking the mid-point of sampling
intervals as the time of death.
Mid-point times of death are also provided, assuming sampling occurs at
each integer interval.

In the case of hosts from infected treatments, two potential times of
death are calculated;
that due to background mortality and that due to infection.
The earlier of the two defines the time of death, providing it is not
later than the value of t_censor.

The value of t_censor defines the time when any individuals remaining alive
are right-censored.
The act of censoring is assumed to occur after populations have been
sampled for mortality, i.e.,
if mortality is recorded daily and the last day of an experiment is day 14,
the populations are checked for mortality at the beginning of day 14 and
any individuals alive after this are classed as censored on day 14.
The timing of censoring is 'true' as individuals were known to be alive
at the time they were censored. Hence times of censoring do not vary for
'time' or 'mid-time', whereas they do for individuals dying at an unknown
time between two consecutive sampling times.
}
\section{Warning}{
 The lower tail of Gumbel distribution can yield negative
estimates for times of death as the random variable replacing
cumulative survival approaches one; S(t) -> 1.
}

\examples{
  set.seed(34)

  sim_data <- sim_data_nll_basic(
    a1 = 2.5, b1 = 0.9, n1 = 100, d1 = "Weibull",
    a2 = 2.0, b2 = 0.5, n2 = 100, d2 = "Weibull",
    t_censor = 14)

  head(sim_data, 10)

  sim_data$time[sim_data$infected_treatment == 0]

  sim_data$time[sim_data$infected_treatment == 1]

  # plot histogram for ages at death in infected population
      hist(
        sim_data$time[(sim_data$infected_treatment == 1 & sim_data$censor == 0)],
        breaks = seq(0, 14, 1),
        xlim = c(0, 15),
        main = 'infected, died',
        xlab = 'time of death',
        xaxt = 'n')
      axis(side = 1, tick = TRUE, at = seq(0, 14, 1))

  # estimate location and scale parameters of simulated data with nll_basic
      m01_prep_function <- function(a1, b1, a2, b2){
        nll_basic(a1, b1, a2, b2,
          data = sim_data,
          time = time,
          censor = censor,
          infected_treatment = infected_treatment,
          d1 = 'Weibull', d2 = 'Weibull'
          )}

    m01 <- mle2(m01_prep_function,
             start = list(a1 = 2, b1 = 1, a2 = 2, b2 = 1)
             )

    confint(m01)

  # estimate using 'mid_time' instead of 'time'
      m02_prep_function <- function(a1, b1, a2, b2){
        nll_basic(a1, b1, a2, b2,
          data = sim_data,
          time = mid_time,
          censor = censor,
          infected_treatment = infected_treatment,
          d1 = 'Weibull', d2 = 'Weibull'
          )}

      m02 <- mle2(m02_prep_function,
             start = list(a1 = 2, b1 = 1, a2 = 2, b2 = 1)
             )

    confint(m02)

    # compare estimates
      AICc(m01, m02, nobs = 200)
      # for these data, m02 based on 'mid-time' provides a better
      # description of the data
}
