% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/calc_nreps2.R
\name{calc_nreps2}
\alias{calc_nreps2}
\title{Determine sample sizes for a pair of algorithms on a problem instance}
\usage{
calc_nreps2(instance, algorithm1, algorithm2, se.max, dif, method = "param",
  nstart = 20, nmax = 200, seed = NULL, boot.R = 999,
  force.balanced = FALSE)
}
\arguments{
\item{instance}{a list object containing the definitions of the problem
instance.
See Section \emph{Instances and Algorithms} for details.}

\item{algorithm1}{a list object containing the definitions of algorithm 1.
See Section \emph{Instances and Algorithms} for details.}

\item{algorithm2}{a list object containing the definitions of algorithm 2.
See Section \emph{Instances and Algorithms} for details.}

\item{se.max}{desired upper limit for the standard error of the estimated
difference between the two algorithms. See Section
\emph{Types of Differences} for details.}

\item{dif}{type of difference to be used. Accepts \code{"perc"}
(for percent differences) or \code{"simple"} (for simple differences).
See Section \emph{Types of Differences} for details.}

\item{method}{method to use for estimating the standard error. Accepts
\code{"param"} (for parametric) or \code{"boot"} (for bootstrap).}

\item{nstart}{initial number of algorithm runs for each algorithm.
See Section \emph{Initial Number of Observations} for details.}

\item{nmax}{maximum total allowed sample size.}

\item{seed}{seed for the random number generator.}

\item{boot.R}{number of bootstrap resamples.}

\item{force.balanced}{logical flag to force the use of balanced sampling for
the algorithms on each instance.}
}
\value{
a list object containing the following items:
\itemize{
\item \code{x1j} - vector of observed performance values for \code{algorithm1}
\item \code{x2j} - vector of observed performance values for \code{algorithm2}
\item \code{phi.est} - estimated value of the difference of interest (simple or percent)
\item \code{se} - standard error of the estimate
\item \code{n1j} - number of observations generated for algorithm 1
\item \code{n2j} - number of observations generated for algorithm 2
\item \code{r.opt} - ratio of sample sizes, \code{n1j / n2j}
\item \code{seed} - the seed used for the PRNG
\item \code{dif} - the type of difference used
\item \code{method} - the method used ("param" / "boot")
}
}
\description{
Iteratively calculates the required sample sizes for two algorithms
on a given problem instance, so that the standard error
of the estimate of the difference (either simple or percent) in mean
performance is controlled at a predefined level.
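
As a rough illustration of this iterative scheme, the sketch below handles
only the simple difference with a parametric standard error. It is a
simplification under stated assumptions, not the code used by
\code{calc_nreps2}: the hypothetical function \code{sketch_nreps2} takes two
sampler functions (one per algorithm) instead of the instance and algorithm
lists, and at each step adds one run to whichever algorithm moves the ratio
\code{n1/n2} toward \code{s1/s2}, the ratio of sample standard deviations.
For a fixed total sample size this ratio minimises the standard error of the
simple difference; with standard deviations 1 and 4 it equals 1/4, and
\code{sqrt(1/20 + 16/80) = 0.5}, consistent with the theoretical values
quoted in the examples.

\preformatted{
# Hypothetical sketch (simple difference, parametric SE only);
# not the implementation used by calc_nreps2.
sketch_nreps2 <- function(sample1, sample2, se.max,
                          nstart = 20, nmax = 200) {
  x1 <- sample1(nstart)   # initial runs of algorithm 1
  x2 <- sample2(nstart)   # initial runs of algorithm 2
  se.fun <- function(x1, x2) {
    sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  }
  se <- se.fun(x1, x2)
  while (se > se.max && length(x1) + length(x2) < nmax) {
    r.opt <- sd(x1) / sd(x2)   # target ratio n1/n2 = s1/s2
    if (length(x1) / length(x2) < r.opt) {
      x1 <- c(x1, sample1(1))  # algorithm 1 is under-sampled
    } else {
      x2 <- c(x2, sample2(1))  # algorithm 2 is under-sampled
    }
    se <- se.fun(x1, x2)
  }
  list(x1j = x1, x2j = x2, phi.est = mean(x2) - mean(x1), se = se,
       n1j = length(x1), n2j = length(x2))
}

set.seed(1234)
res <- sketch_nreps2(function(n) rnorm(n, mean = 10, sd = 1),
                     function(n) rnorm(n, mean = 20, sd = 4),
                     se.max = 0.5)
c(n1j = res$n1j, n2j = res$n2j, se = res$se)
}

The loop stops either when the target standard error is reached or when the
total number of runs hits \code{nmax}, whichever comes first.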
}
\section{Instances and Algorithms}{

Parameters \code{instance}, \code{algorithm1} and \code{algorithm2} must each
be a named list of instance (or algorithm) specifications, defined according
to the instructions given below.

\code{instance} is a named list containing all relevant parameters that
define the problem instance. This list must contain at least the field
\code{instance$FUN}, with the name of the problem instance function, that is, a
routine that calculates y = f(x). If the instance requires additional
parameters, these must also be provided as named fields.

Similarly, \code{algorithm1} and \code{algorithm2} must each be a named list
containing all relevant parameters that define the algorithm to be applied
for solving the problem instance. In what follows we use \code{algorithm} to
refer to both \code{algorithm1} and \code{algorithm2}.

\code{algorithm} must contain an \code{algorithm$FUN} field (the name
of the function that calls the algorithm) and any other elements/parameters
that \code{algorithm$FUN} requires (e.g., stop criteria, operator names and
parameters, etc.). It may also contain an \code{algorithm$alias} field, which
is removed from the list before \code{algorithm$FUN} is called.

The routine named in \code{algorithm$FUN} must have the following
structure: supposing that the list in \code{algorithm} has fields
\code{algorithm$FUN = "myalgo"}, \code{algorithm$par1 = "a"} and
\code{algorithm$par2 = 5}, then:

\preformatted{
         myalgo <- function(par1, par2, instance, ...){
               # do stuff
               # ...
               return(results)
         }
   }

That is, it must be able to run if called as:

\preformatted{
         # remove the '$FUN' and 'alias' fields from the list of arguments
         # and include the problem definition as field 'instance'
         myargs          <- algorithm[names(algorithm) != "FUN"]
         myargs          <- myargs[names(myargs) != "alias"]
         myargs$instance <- instance

         # call function
         do.call(algorithm$FUN,
                 args = myargs)
   }

After a given run, the \code{algorithm$FUN} routine must return a list
containing (at least) the performance value of the final solution obtained,
in a field named \code{value} (e.g., \code{result$value}).
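
A minimal sketch of a compatible pair of routines is given below. Both
\code{myinstance} (here the sphere function, y = sum(x^2)) and \code{myalgo}
(a pure random search over the box [-5, 5] in each coordinate) are
hypothetical illustrations, not functions provided by the package. The sketch
assumes that \code{instance$FUN} is a character string naming a function
that \code{myalgo} can find with \code{get()}.

\preformatted{
# Hypothetical instance routine: computes y = f(x) for the sphere function
myinstance <- function(x) sum(x^2)

# Hypothetical algorithm routine: pure random search. 'npoints' and 'ndim'
# come from the algorithm list; 'instance' is added by the caller.
myalgo <- function(npoints, ndim, instance, ...) {
  f <- get(instance$FUN, mode = "function")  # look up the instance routine
  X <- matrix(runif(npoints * ndim, min = -5, max = 5), ncol = ndim)
  y <- apply(X, 1, f)                        # evaluate every sampled point
  list(value = min(y))                       # best performance value found
}

algorithm <- list(FUN = "myalgo", alias = "random.search",
                  npoints = 100, ndim = 2)
instance  <- list(FUN = "myinstance")

# Same call pattern as described above
myargs          <- algorithm[names(algorithm) != "FUN"]
myargs          <- myargs[names(myargs) != "alias"]
myargs$instance <- instance
set.seed(1)
result <- do.call(algorithm$FUN, args = myargs)
result$value
}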
}

\section{Initial Number of Observations}{

In the \strong{general case} the initial number of observations per algorithm
per instance (\code{nstart}) should be relatively high. For the parametric
case we recommend ~20 if outliers are not expected, and at least ~50 if that
assumption cannot be made. For the bootstrap approach we recommend using at
least 20. However, if some distributional assumptions can be made
(particularly low skewness of the population of algorithm results on the
test instances), then \code{nstart} can in principle be as small as 5 (if the
output of the algorithm were known to be normal, it could be 1).

In general, higher sample sizes are the price to pay for abandoning
distributional assumptions. Use lower values of \code{nstart} with caution.
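
To make the bootstrap option more concrete, the following is a minimal
sketch of a nonparametric bootstrap estimate of the standard error of the
percent difference, resampling the observations of each algorithm
independently (see Davison and Hinkley, 1997). The function
\code{boot_se_perc} is hypothetical and is not claimed to reproduce the
resampling scheme used inside \code{calc_nreps2}. It assumes at least two
observations per algorithm, since \code{sample()} treats a single numeric
value as a range. With very few observations the resamples can only combine
a handful of distinct values, so the resulting estimate is less reliable,
which is in line with the larger \code{nstart} recommended above.

\preformatted{
# Hypothetical helper: bootstrap SE of (mean(x2) - mean(x1)) / mean(x1)
boot_se_perc <- function(x1, x2, R = 999) {
  phi.star <- replicate(R, {
    b1 <- sample(x1, replace = TRUE)   # resample algorithm 1's runs
    b2 <- sample(x2, replace = TRUE)   # resample algorithm 2's runs
    (mean(b2) - mean(b1)) / mean(b1)
  })
  sd(phi.star)   # bootstrap standard error of the percent difference
}

set.seed(1234)
x1 <- rnorm(20, mean = 10, sd = 1)
x2 <- rnorm(20, mean = 20, sd = 4)
boot_se_perc(x1, x2)
}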
}

\section{Types of Differences}{

Parameter \code{dif} specifies the type of difference in performance to be
used for the estimation (where mu1 and mu2 denote the mean performance of
algorithms 1 and 2, respectively, on the problem instance):
\itemize{
\item If \code{dif == "perc"} it estimates (mu2 - mu1) / mu1.
\item If \code{dif == "simple"} it estimates mu2 - mu1.
}
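
For the parametric approach, a minimal sketch of the point estimates and
their standard errors for both types of difference follows. The helper
\code{est_dif} is hypothetical (not part of the package), and the standard
error of the percent difference uses a first-order delta-method
approximation; this is an assumption of the sketch, and \code{calc_nreps2}
may compute it differently.

\preformatted{
# Hypothetical helper, for illustration only
est_dif <- function(x1, x2, dif = c("simple", "perc")) {
  dif <- match.arg(dif)
  m1 <- mean(x1)
  m2 <- mean(x2)
  v1 <- var(x1) / length(x1)   # squared standard error of mean(x1)
  v2 <- var(x2) / length(x2)   # squared standard error of mean(x2)
  if (dif == "simple") {
    phi <- m2 - m1
    se  <- sqrt(v1 + v2)
  } else {
    phi <- (m2 - m1) / m1
    # delta method: d(phi)/d(m2) = 1/m1, d(phi)/d(m1) = -m2/m1^2
    se  <- sqrt(v2 / m1^2 + v1 * m2^2 / m1^4)
  }
  list(phi = phi, se = se)
}

x1 <- c(9, 10, 11)    # mean 10, variance 1
x2 <- c(18, 20, 22)   # mean 20, variance 4
est_dif(x1, x2, "simple")   # phi = 10, se = sqrt(5/3)
est_dif(x1, x2, "perc")     # phi = 1,  se = sqrt(8/300)
}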
}

\examples{
# Uses dummy algorithms and a dummy instance to illustrate the
# use of calc_nreps2
algorithm1 <- list(FUN = "dummyalgo", alias = "algo1",
                   distribution.fun = "rnorm",
                   distribution.pars = list(mean = 10, sd = 1))
algorithm2 <- list(FUN = "dummyalgo", alias = "algo2",
                   distribution.fun = "rnorm",
                   distribution.pars = list(mean = 20, sd = 4))
instance <- list(FUN = "dummyinstance")

# Theoretical results for an SE = 0.5 on the simple difference:
# phi = 10; n1 = 20; n2 = 80
# (using the parametric approach)
my.reps  <- calc_nreps2(instance, algorithm1, algorithm2,
                        se.max = 0.5, dif = "simple", seed = 1234)
cat("n1j   =", my.reps$n1j, "\\nn2j   =", my.reps$n2j,
    "\\nphi_j =", my.reps$phi.est, "\\nse    =", my.reps$se)

# Forcing equal sample sizes:
my.reps  <- calc_nreps2(instance, algorithm1, algorithm2,
                        se.max = 0.5, dif = "simple", seed = 1234,
                        force.balanced = TRUE)
cat("n1j   =", my.reps$n1j, "\\nn2j   =", my.reps$n2j,
    "\\nphi_j =", my.reps$phi.est, "\\nse    =", my.reps$se)

\dontrun{
# Using the bootstrap approach
algorithm3 <- list(FUN = "dummyalgo", alias = "algo3",
                   distribution.fun = "rchisq",
                   distribution.pars = list(df = 2, ncp = 3))

my.reps  <- calc_nreps2(instance, algorithm1, algorithm3,
                        se.max = 0.05, dif = "perc",
                        method = "boot", seed = 1234,
                        nstart = 20)
cat("n1j   =", my.reps$n1j, "\\nn2j   =", my.reps$n2j,
    "\\nphi_j =", my.reps$phi.est, "\\nse    =", my.reps$se)
}

}
\references{
\itemize{
\item F. Campelo, F. Takahashi:
Sample size estimation for power and accuracy in the experimental
comparison of algorithms (submitted, 2017).
\item P. Mathews:
Sample size calculations: Practical methods for engineers and scientists.
Mathews Malnar and Bailey (2010)
\item A.C. Davison, D.V. Hinkley:
Bootstrap methods and their application. Cambridge University Press (1997)
\item E.C. Fieller:
Some problems in interval estimation. Journal of the Royal Statistical
Society. Series B (Methodological) 16(2), 175–185 (1954)
\item V. Franz:
Ratios: A short guide to confidence limits and proper use (2007).
\url{https://arxiv.org/pdf/0710.2024v1.pdf}
\item D.C. Montgomery, C.G. Runger:
Applied Statistics and Probability for Engineers, 6th ed. Wiley (2013)
}
}
\author{
Felipe Campelo (\email{fcampelo@ufmg.br}),
Fernanda Takahashi (\email{fernandact@ufmg.br})
}
