#' Calculate Signorino and Ritter's (1999) S for Similarity
#'
#' @description
#'
#' \code{srs()} takes two vectors and returns Signorino and Ritter's S statistic
#' communicating broadly understood "similarity" of interests or ratings.
#'
#' @return
#'
#' \code{srs()} returns a single numeric value: Signorino and Ritter's S
#' statistic for the two vectors. Values closer to 1 indicate greater similarity.
#'
#' @details
#'
#' Be advised that Signorino and Ritter's (1999) treatment of the S statistic
#' used absolute distances when squared distances are more commonly used in the
#' world of distance and association metrics.
#'
#' There are potentially instances in which the full conceivable range of
#' ratings/attachments is not observed in your two vectors. In the case of
#' applications to alliance data, this is almost an impossibility. Every state,
#' by assumption, is maximally committed to defending itself. There will
#' assuredly be cases in which there is no commitment to another state in the
#' data (either for reasons of disinterest or enmity, though the first calls
#' into question what a 0 should communicate and the latter betrays the
#' interesting complexity of alliances). Thus, the minimum and maximum, one
#' assumes, will always be observed in the alliance data. Perhaps the same could
#' be said for UN voting data, though I couldn't rule out the possibility that
#' there is a dyad out there for which both states never voted "yes" or "no".
#' That would have implications for the range in the denominator of the formula.
#' You can override that by hard-setting the range in the `range` argument.
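#'
#' As a rough sketch of what the function computes when no weights are supplied
#' (the notation is illustrative and introduced here, not taken from the
#' original article), let \eqn{N} be the number of complete cases and
#' \eqn{\Delta} the range, i.e. either the observed maximum minus the observed
#' minimum across both vectors or the value given in `range`. With absolute
#' distances:
#'
#' \deqn{S = 1 - 2 \frac{\sum_{i=1}^{N} |x_{1i} - x_{2i}|}{N \Delta}}
#'
#' With squared distances, the range in the denominator is squared as well:
#'
#' \deqn{S = 1 - 2 \frac{\sum_{i=1}^{N} (x_{1i} - x_{2i})^2}{N \Delta^2}}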
#'
#' The function subsets to complete cases of the two vectors for which you want
#' an S score. If weights are included, the function further subsets to complete
#' cases including the weights as well.
#'
#' The function implicitly assumes that `x1` and `x2` are columns in a data
#' frame. One indirect check for this looks at whether `x1` and `x2` are the
#' same length. The function will stop if they're not.
#'
#' ## Several Comments on Weighting
#'
#' If it were my call to make, I'd caution against the IR standard of using
#' the composite index of national capabilities (CINC) as a weight on the
#' calculation of the S statistic. Conceptually, weighting by capabilities tries
#' to capture some kind of "importance" quantity. Related to the familiar
#' application of alliances, this would prioritize those states that could
#' conceivably bring more to the battlefield. In practice, this adds one
#' anachronism to another. Capabilities, as measured, are basically a nineteenth
#' century measurement for which estimates of energy consumption, iron and steel
#' production, and urban population size are given equal weight in composition
#' of the measure to military expenditures and military size. Alliances
#' themselves are somewhat antiquarian, certainly in what we want them to do for
#' this measure. If the question is "why must alliances be measures of foreign
#' policy similarity", the answer kind of reduces to "we have historical data on
#' them." If you want estimates for the 19th century, you have this, but then
#' are implicitly confessing your measure of foreign policy similarity is an
#' anachronism.
#'
#' There are other peculiarities too. The data on capabilities have always been
#' skewed to the right. Very few states hold a large share of the total. As the
#' state system has expanded in size (i.e. as empires ended),
#' the relative weight at the top necessarily decreases. For example, the top
#' 3 states in capabilities in 1816 (the United Kingdom, Russia, and France)
#' combined for 61.8% of capabilities in a system of just 23 states. In 2016,
#' the top three states (China, the United States, and India) combined for 45%
#' of capabilities in a system of 195 states. New states are almost always small
#' states that possess almost no capabilities. 11 of 23 states in 1816 had less
#' than 1% of capabilities. That's about 48% of the system. In 2016, 176 of 195
#' states had less than 1% of capabilities. That's over 90% of the system. If
#' the idea is to identify the "important" foreign policy ties, I echo Haege's
#' (2011) contention that this approach is a second-best solution. It's
#' second-best to other metrics that better model chance-corrected agreement. It
#' just discards too much information and gives too much weight to great powers
#' and/or states that are conspicuously high on capabilities (e.g. India).
#'
#' Faithfully calculating a weighted S statistic (by system capabilities)
#' requires a weight that sums to 1. In the most literal sense of 1, there is
#' no year in the National Material Capabilities data (v. 2016) in which system
#' capabilities in a given year sum to 1. In almost 60% of cases/years, the
#' discrepancy doesn't look like a rounding error either. In 1860, all
#' capabilities sum to over 1.07! In the context of applications with Correlates
#' of War's CINC scores, you can still use the raw data because the function
#' doesn't assume the weights sum to 1. You'll see how in the denominator of
#' the formula.
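#'
#' A sketch of the weighted calculation as this function implements it, in
#' illustrative notation introduced here: with \eqn{w_i} the weight on case
#' \eqn{i} and \eqn{\Delta} the range (observed, or as set in `range`),
#'
#' \deqn{S = 1 - 2 \frac{\sum_{i} w_i |x_{1i} - x_{2i}|}{\Delta \sum_{i} w_i}}
#'
#' Dividing by \eqn{\sum_{i} w_i} rescales the weights, so the result is the
#' same as if they had been normalized to sum to 1 beforehand.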
#'
#' Weights are only applicable to absolute distances. If you specify a weight
#' variable with `distances = 'squared'`, the function will ignore your weights.
#'
#' In applications to the Correlates of War system, as far as I am aware, there
#' are no CoW states for which there isn't a CINC estimate. If, for some reason,
#' a CINC score (or some other weight) is missing, the cases are dropped
#' *before* weights are applied.
#'
#' If weights are supplied, the weights must match the length of both `x1` and
#' `x2`. The function builds in an implicit assumption that the weights are a
#' column in the data frame you're using.
#'
#' @param x1 a numeric vector of ratings/attachments (integers, one assumes)
#' @param x2 a numeric vector of ratings/attachments (integers, one assumes)
#' @param distances the type of distances between ratings/attachments to
#' estimate. Can be either "absolute" or "squared". Defaults to "absolute", but
#' see note in details section.
#' @param weights a vector of weights. Defaults to NULL for creating unweighted
#' S statistics
#' @param range an optional single number that forces the range (i.e. the
#' conceivable maximum minus the conceivable minimum) used in the denominator.
#' Defaults to NULL, in which case the function calculates the range from the
#' maximum and minimum values observed across both `x1` and `x2`. See details
#' section for more.
#'
#' @examples
#'
#' srs(gmyrus14$gmy, gmyrus14$rus, distances = 'absolute')
#' srs(gmyrus14$gmy, gmyrus14$rus, distances = 'squared')
#' srs(gmyrus14$gmy, gmyrus14$rus, distances = 'absolute', weights = gmyrus14$syscap)
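#'
#' # A minimal sketch with made-up toy vectors (hypothetical values, not drawn
#' # from any real data), assuming ratings on a scale that runs from 0 to 3.
#' toy_x1 <- c(0, 1, 2, 3, NA)
#' toy_x2 <- c(0, 1, 2, 3, 3)
#' # The incomplete fifth case is dropped. The remaining ratings are identical,
#' # so S is 1.
#' srs(toy_x1, toy_x2)
#' # Only 1 and 2 are observed here, so the observed range would be 1. Setting
#' # `range = 3` uses the assumed full 0-3 scale in the denominator instead.
#' srs(c(1, 2, 2), c(2, 2, 1), range = 3)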
#'
#' @references
#'
#' Signorino, Curtis S. and Jeffrey M. Ritter. 1999. "Tau-b or Not Tau-b:
#' Measuring the Similarity of Foreign Policy Positions." *International Studies
#' Quarterly* 43(1): 115–44.
#'
#' @importFrom stats complete.cases
#' @export

srs <- function(x1, x2, distances = 'absolute', weights = NULL, range = NULL) {

  if(length(x1) != length(x2)) {
    stop("`x1` and `x2` are not the same length.")
  }

  if (!is.null(weights) && (length(weights) != length(x1) || length(weights) != length(x2))) {
    stop("`weights` must be the same length as `x1` and `x2` if you're going to provide it.")
  }

  # force complete cases, just in case
  if(is.null(weights)) {

    completetf <- complete.cases(x1, x2)

    x1 <- x1[completetf]
    x2 <- x2[completetf]

  } else {

    completetf <- complete.cases(x1, x2, weights)

    x1 <- x1[completetf]
    x2 <- x2[completetf]
    weights <- weights[completetf]

  }

  if(is.null(range)) {

    levs <- sort(unique(c(x1, x2)))
    diff <- max(levs) - min(levs)

  } else {

    diff <- range

  }

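  if(distances == 'squared') {

    # squared distances; weights, if supplied, are ignored here
    num <- sum((x1 - x2)^2)
    denom <- length(x1)*(diff^2)

  } else if(distances == 'absolute') {

    if(is.null(weights)) {

      num <- sum(abs(x1 - x2))
      denom <- length(x1)*diff

    } else {

      # normalize by the sum of the weights, so they need not sum to 1
      num <- sum(abs(x1 - x2)*weights)
      denom <- sum(weights)*diff

    }

  } else {

    # fail informatively rather than with "object 'num' not found"
    stop("`distances` must be either 'absolute' or 'squared'.")

  }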

  s <- 1 - 2*(num/denom)

  return(s)

}
