% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/datasim.R
\name{datasim}
\alias{datasim}
\title{Simulate data including multiple outcomes from error-prone diagnostic tests
or self-reports}
\usage{
datasim(N, blambda, testtimes, sensitivity, specificity, betas = NULL,
  twogroup = NULL, pmiss = 0, pcensor = 0, design = "MCAR",
  negpred = 1, time.varying = F)
}
\arguments{
\item{N}{total number of subjects to be simulated}

\item{blambda}{baseline hazard rate}

\item{testtimes}{a vector of pre-scheduled test times}

\item{sensitivity}{the sensitivity of the test}

\item{specificity}{the specificity of the test}

\item{betas}{a vector of regression coefficients of the same length as the
covariate vector. If betas = NULL, the simulated dataset corresponds to the
one sample setting. If betas != NULL and twogroup != NULL, the simulated
dataset corresponds to the two group setting, and the first value of betas is
used as the coefficient for the treatment group indicator. If betas != NULL
and twogroup = NULL, the covariates are drawn i.i.d. from N(0, 1), and the
number of covariates is determined by the length of betas.}

\item{twogroup}{the proportion of subjects allocated to the baseline
(reference) group in the two group setting; it should be between 0 and 1.
When betas != NULL, set twogroup to this proportion to obtain a simulated
dataset corresponding to the two group setting. Otherwise, set
twogroup = NULL to obtain either the one sample setting (betas = NULL) or
the continuous covariate setting (betas != NULL).}

\item{pmiss}{a single value or a vector (of the same length as testtimes)
giving the probability that the test at each test time is randomly missing.
If pmiss is a single value, every test is assumed to have the same
probability of missingness.}

\item{pcensor}{a single value or a vector (of the same length as testtimes)
giving the probability of censoring at each visit, assuming the censoring
process is independent of the other missingness mechanisms.}

\item{design}{missingness mechanism: "MCAR" or "NTFP"}

\item{negpred}{baseline negative predictive value, i.e. the probability of being
truly disease free for those who were tested (or reported) as disease free at
baseline. If the baseline screening test is perfect, then negpred = 1.}

\item{time.varying}{logical indicating whether to simulate data under the
  time varying covariate setting}
}
\value{
a data frame of simulated data in longitudinal form
}
\description{
This function simulates a dataset of N subjects with misclassified
  outcomes, assuming each subject receives a sequence of pre-scheduled tests
  for disease status ascertainment. Each test is subject to error,
  characterized by its sensitivity and specificity. An exponential
  distribution is assumed for the time to the event of interest. Three kinds
  of covariate settings can be generated: the one sample setting, the two
  group setting, and the continuous covariate setting, in which the
  covariates are sampled i.i.d. from N(0, 1). Two missingness mechanisms can
  be assumed, namely MCAR and NTFP. The MCAR setting assumes that each test is
  subject to a constant, independent probability of missingness. The NTFP
  mechanism combines two types of missingness: (1) a constant, independent
  probability of missingness for each test prior to the first positive test
  result; and (2) all test results after the first positive are missing. The
  simulated data are in longitudinal form with one row per test time.

  By default, covariate values are assumed to be constant over time. However,
  this function can also simulate a special case of time varying covariates.
  Under the time varying covariate setting, each subject is assumed to have a
  change time point, which is sampled from the visit times, and two sets of
  covariate values. Before the change time point, the covariates take their
  values from the first set; after it, they take their values from the second
  set. Thus, each subject's survival time follows a two-piece exponential
  distribution with a different hazard rate in each piece.
}
\details{
To simulate data for the one sample setting, set betas to NULL. To
  simulate data for the two group setting, set twogroup to the proportion of
  subjects in the baseline group and set betas to the coefficient
  corresponding to the treatment group indicator (i.e. betas equals the log
  hazard ratio between the two groups). To simulate data with continuous
  i.i.d. N(0, 1) covariates, set twogroup to NULL and set betas to the vector
  of covariate coefficients.
}
\examples{
## One sample setting
simdata1 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = NULL, twogroup = NULL, pmiss = 0.3, design = "MCAR")

## Two group setting, with equal allocation to the two groups
simdata2 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = 0.7, twogroup = 0.5, pmiss = 0.3, design = "MCAR")

## Three covariates with coefficients 0.5, 0.8, and 1.0
simdata3 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "MCAR", negpred = 1)

## NTFP missing mechanism
simdata4 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "NTFP", negpred = 1)

## Baseline misclassification
simdata5 <- datasim(N = 2000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "MCAR", negpred = 0.97)
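
## Visit-specific missingness and censoring probabilities.
## A sketch only: it assumes that pmiss and pcensor accept vectors of the same
## length as testtimes, as documented above. The probabilities are
## illustrative values, and simdata_visit is a hypothetical object name.
simdata_visit <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = NULL, twogroup = NULL,
  pmiss = seq(0.10, 0.45, length.out = 8), pcensor = rep(0.02, 8),
  design = "MCAR")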

## Time varying covariates
simdata6 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "MCAR", negpred = 1, time.varying = TRUE)
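
## Illustrative base-R sketch of a two-piece exponential survival time, as
## described for the time varying covariate setting. This is an assumption made
## for illustration, not datasim's internal code: rpiecewise_exp, h1, h2 and cp
## are hypothetical names. Hazard h1 applies before the change point cp and
## hazard h2 applies after it.
rpiecewise_exp <- function(n, h1, h2, cp) {
  t1 <- rexp(n, rate = h1)  # candidate event times under the first hazard
  # By memorylessness, subjects still event-free at cp restart with hazard h2
  ifelse(t1 < cp, t1, cp + rexp(n, rate = h2))
}
set.seed(1)
summary(rpiecewise_exp(1000, h1 = 0.05, h2 = 0.05 * exp(0.5), cp = 4))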
}

