\name{RankAggreg}
\alias{RankAggreg}
\title{Weighted Rank Aggregation of partially ordered lists}
\description{
 Performs aggregation of ordered lists based on the ranks (optionally with additional
 weights) via the Cross-Entropy Monte Carlo algorithm or the Genetic Algorithm.
}
\usage{
RankAggreg(x, k, index.weights = NULL, use.weights = FALSE, method = c("CrossEntropy", "GeneticAlgorithm"), 
distance = c("Spearman", "Kendall"), rho = 0.01, weight = 0.5, N = 5 * k * length(unique(sort(as.vector(x)))), 
error = 0.001, maxIter = 100, popSize = 100, CP = 1, MP = 0.001, informative = FALSE, v1 = NULL, verbose = TRUE)
}
\arguments{
  \item{x}{a matrix of ordered lists to be combined (lists must be in rows)}
  \item{k}{size of the top-k list}
  \item{index.weights}{a matrix of scores (weights) to be used in the aggregation process. Weights in 
  each row must be ordered either in decreasing or increasing order and must correspond to the elements
  in x}
  \item{use.weights}{boolean, whether the weights in index.weights are to be used}
  \item{method}{method to be used to perform rank aggregation: Cross-Entropy Monte Carlo or Genetic Algorithm (GA)}
  \item{distance}{distance used to measure the similarity of ordered lists}
  \item{rho}{(rho*N) is the "quantile" of candidate lists sorted by the function values. Used only by the Cross-Entropy algorithm}
  \item{weight}{a learning factor used in the probability update procedure of the algorithm. Used only by the Cross-Entropy algorithm}
  \item{N}{the number of samples to be generated by the MCMC; default: 5nk, where n is the number of 
  unique elements in x. Used only by the Cross-Entropy algorithm}
  \item{error}{convergence criterion for the Cross-Entropy algorithm}
  \item{maxIter}{the maximum number of iterations allowed; can be used as a stopping criterion for the Genetic Algorithm}
  \item{popSize}{population size in each generation of Genetic Algorithm; has no effect if method="CrossEntropy"}
  \item{CP}{Cross-over probability for the GA; the default value is 1. It is usually greater than 0.5.}
  \item{MP}{Mutation probability. This value should be small; the number of mutations in a population of size popSize
	with k features is computed as popSize*k*MP. Used only by the GA}
  \item{informative}{boolean, if informative is TRUE, use v1 as the initial probability matrix}
  \item{v1}{optional, can be used to specify the initial probability matrix; if v1=NULL,
    every entry of the initial probability matrix is set to 1/n, where n is the number of unique elements in x}
  \item{verbose}{boolean, whether console output is to be displayed at each iteration}
}
\details{
  The function performs rank aggregation via the Cross-Entropy Monte Carlo algorithm or the Genetic Algorithm. Both approaches can and 
  should be used when k is relatively large (k > 10). If k is small, one can enumerate all possible
  candidate lists and find the minimum directly using the BruteAggreg function available in this package.
  
  The Cross-Entropy Monte Carlo algorithm is an iterative procedure for solving difficult combinatorial 
  problems in which it is computationally infeasible to find the solution directly. In the context of 
  rank aggregation, the algorithm searches for the "super"-list which is as close as possible to the
  ordered lists in x. We use either the Spearman footrule distance or Kendall's tau distance to measure the "closeness" of any two
  ordered lists (or our modified, weighted versions of these distances). Please refer to the paper 
  in the references for further details.
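
  As an illustrative sketch in our own notation (an assumption for exposition, not taken from the package source):
  for two full rankings of the same elements, let \eqn{r^{\delta}(t)}{r_delta(t)} be the position of element t in a
  candidate list and \eqn{r^{L}(t)}{r_L(t)} its position in an ordered list L. The unweighted Spearman footrule
  distance can then be written as
  \deqn{d(\delta, L) = \sum_{t} | r^{\delta}(t) - r^{L}(t) |}{d(delta, L) = sum_t |r_delta(t) - r_L(t)|}
  and the search looks for a candidate list with a small total distance to the rows of x. The distances actually
  used by the function (in particular the weighted versions and the treatment of partial lists) are described in the paper.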

  The Genetic Algorithm requires setting the CP and MP parameters, which affect the degree of "evolution" in the population. If both
  CP and MP are small, the algorithm is very conservative and may take a long time to search the solution space of all ordered candidate
  lists. On the other hand, setting CP and MP (especially MP) large will introduce a large number of mutations in the population, which
  can result in convergence to a local optimum. Two convergence criteria are used to stop the algorithm. The first is the repetition of the same minimum value
  of the objective function in five consecutive iterations. The second is maxIter: if the first condition is not met within maxIter iterations,
  the algorithm stops regardless.
}
\value{
  \item{top.list}{Top-k aggregated list}
  \item{optimal.value}{the minimum value of the objective function corresponding to the top-k list}
  \item{sample.size}{the number of samples generated by the MCMC at each iteration}
  \item{num.iter}{the number of iterations until convergence}
  \item{method}{which algorithm was used}
  \item{distance}{which distance was used}
}
\references{Pihur, V., Datta, S., and Datta, S. (2007) "Weighted rank aggregation of cluster validation 
measures: a Monte Carlo cross-entropy approach" Bioinformatics, 23(13):1607-1615 }

\author{Vasyl Pihur, Somnath Datta, Susmita Datta}

\seealso{\code{\link{BruteAggreg}}}

\examples{
# rank aggregation without weights
x <- matrix(c("A", "B", "C", "D", "E",
        "B", "D", "A", "E", "C",
        "B", "A", "E", "C", "D",
        "A", "D", "B", "C", "E"), byrow=TRUE, ncol=5)

toplist <- RankAggreg(x, 5, rho=.1)
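
# inspect the components of the returned object (names as listed in the
# Value section; this assumes the result can be indexed like a list)
toplist$top.list       # aggregated top-k list
toplist$optimal.value  # objective function value for that list
toplist$num.iter       # number of iterations until convergence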

# weighted rank aggregation
set.seed(100)
w <- matrix(rnorm(20), ncol=5)
w <- t(apply(w, 1, sort))

# using the Cross-Entropy Monte-Carlo algorithm
toplistS <- RankAggreg(x, 5, rho=.1, index.weights=w, use.weights=TRUE)
toplistK <- RankAggreg(x, 5, rho=.1, index.weights=w, use.weights=TRUE, distance="Kendall")

# using the Genetic Algorithm (rho is used only by the Cross-Entropy algorithm)
toplistS <- RankAggreg(x, 5, index.weights=w, use.weights=TRUE, method="GeneticAlgorithm")
toplistK <- RankAggreg(x, 5, index.weights=w, use.weights=TRUE, distance="Kendall", method="GeneticAlgorithm")
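
# for small problems the optimum can be found by direct enumeration;
# a sketch assuming BruteAggreg takes x and k as its first two arguments
# (k = 3 keeps the number of candidate lists small)
exact <- BruteAggreg(x, 3)

# illustrative only: 'footrule' is a hypothetical helper, not part of the
# package; it computes the unweighted Spearman footrule distance between two
# full rankings of the same elements (the distance used by the package may be
# normalized or may treat partial lists differently)
footrule <- function(candidate, lst) {
    # position of each candidate element in lst versus its own position
    sum(abs(seq_along(candidate) - match(candidate, lst)))
}

# total distance from the candidate A, B, C, D, E to the four lists in x
candidate <- c("A", "B", "C", "D", "E")
sum(apply(x, 1, function(lst) footrule(candidate, lst)))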
}

\keyword{optimize}
\keyword{robust}
