% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rem_onemoderiskset.R
\name{processOMEventSeq}
\alias{processOMEventSeq}
\title{Process and Create Risk Sets for a One-Mode Relational Event Sequence}
\usage{
processOMEventSeq(
  data,
  time,
  eventID,
  sender,
  receiver,
  p_samplingobserved = 1,
  n_controls,
  time_dependent = FALSE,
  timeDV = NULL,
  timeDif = NULL,
  seed = 9999
)
}
\arguments{
\item{data}{The full relational event sequence dataset.}

\item{time}{The vector of event time values from the observed event sequence.}

\item{eventID}{The vector of event IDs from the observed event sequence (typically a numerical event sequence that goes from 1 to \emph{n}).}

\item{sender}{The vector of event senders from the observed event sequence.}

\item{receiver}{The vector of event receivers from the observed event sequence.}

\item{p_samplingobserved}{The probability with which each observed event is sampled from the observed event sequence. Defaults to 1, in which case all observed events are included in the post-processing event sequence.}

\item{n_controls}{The number of null-event controls to sample for each (sampled) observed event.}

\item{time_dependent}{TRUE/FALSE. If TRUE, a time- or event-dependent dynamic risk set is created in which only actors involved in events within a user-specified, relationally relevant time or event span (i.e., the ‘stretch’ of relational relevancy, such as one month for a time-dependent risk set or 100 events for an event-dependent risk set) are included in the potential risk set. If FALSE, all actors involved in past events are included in the risk set (see the details section). Defaults to FALSE.}

\item{timeDV}{If \code{time_dependent = TRUE}, the vector of time values used to construct the time- \emph{or} event-dependent dynamic risk set (see the details section). \emph{This may or may not be the same vector provided to the \code{time} argument}. If \code{timeDV} is the same vector provided to \code{time}, the relational span is based on the timing of events within the dataset. Alternatively, \code{timeDV} can be the vector of numerical event IDs (i.e., the position of each event in the sequence), which yields an event-dependent risk set. \code{timeDV} can also be another measure of elapsed time that is neither the \code{time} argument nor a numerical event ID sequence, such as the number of days, months, or years since the first event.}

\item{timeDif}{If \code{time_dependent = TRUE}, the numerical value that represents the time or event span used to construct the risk set (see the details section). This argument must be in the same unit of measurement as \code{timeDV}. For instance, in an event-dependent dynamic risk set, if \code{timeDV} is the number of events since the first event (i.e., a numerical event ID sequence) and only actors involved in the past 100 events are considered relationally relevant for creating the null events for the current observed event, then \code{timeDif} should be set to 100. In the time-dependent case, suppose that only actors involved in events that occurred in the past month are considered relationally relevant and that \code{timeDV} is measured in days since the first event. Then \code{timeDif} should be set to 30.}

\item{seed}{The random number seed, which makes the sampling reproducible.}
}
\value{
A post-processing data table with the following columns:
\itemize{
\item \code{sender} - The event senders of the sampled and observed events.
\item \code{receiver} - The event targets (receivers) of the sampled and observed events.
\item \code{time} - The event time for the sampled and observed events.
\item \code{sequenceID} - The numerical event sequence ID for the sampled and observed events.
\item \code{observed} - Binary indicator of whether the event is an observed event or a sampled (null) event (1 = observed; 0 = sampled).
}
}
\description{
This function creates a one-mode post-sampling event set with options for case-control
sampling (Vu et al. 2015), sampling from the observed event sequence (Lerner and Lomi 2020), and time- or event-dependent
risk sets. Case-control sampling draws an arbitrary number \emph{m} of controls from the risk set for each event
(Vu et al. 2015). Lerner and Lomi (2020) proposed sampling from the observed event sequence,
in which each observed event is sampled with probability \emph{p}. Time- and event-dependent risk sets restrict the
potential null events to a specified past relational window, such as events that occurred in the past year.
Importantly, this function assumes that only actors active in past events are
relevant for the creation of the risk set. Users who want risk sets in which every actor
active at any point in the event sequence is in the risk set at every time point should consult the
\code{\link[rem]{createRemDataset}} and \code{\link[remify]{remify}} functions. Future versions of this package will
incorporate this option into the function.
}
\details{
This function processes observed events from the set \eqn{E}, where each event \eqn{e_i} is
defined as:
\deqn{e_{i} = (s_i, r_i, t_i, G[E;t]) \in E}
where:
\itemize{
\item \eqn{s_i} is the sender of the event.
\item \eqn{r_i} is the receiver of the event.
\item \eqn{t_i} represents the time of the event.
\item \eqn{G[E;t] = \{e_j \in E \mid t_j < t\}} is the network of past events, that is, all events that occurred prior to the current event, \eqn{e_i}.
}

Following Butts (2008) and Butts and Marcum (2017), we define the risk (support)
set at time \eqn{t}, \eqn{A_t}, as the set of all possible events that could have
occurred at time \eqn{t}: the full Cartesian product of the senders and receivers
involved in the past events \eqn{G[E;t]}. Formally:
\deqn{A_t = \{ (s, r) \mid s \in G[E;t], r \in G[E;t] \}}
where \eqn{s \in G[E;t]} (respectively \eqn{r \in G[E;t]}) denotes an actor involved in an event in \eqn{G[E;t]}, the set of events prior to time \eqn{t}.
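
The risk set definition above can be illustrated with a minimal base-R sketch. The helper \code{risk_set_at()} is hypothetical and is not part of this package or its internal implementation. It assumes that any actor involved in a past event, as either sender or receiver, can appear in either role, and it keeps self-loops (\eqn{s = r}) because the formula does not exclude them; how \code{processOMEventSeq} itself treats self-loops is not stated here.
\preformatted{
# Hypothetical helper (not part of this package): builds A_t for time t
risk_set_at <- function(sender, receiver, time, t) {
  past <- time < t                                  # events in G[E;t]
  actors <- unique(c(sender[past], receiver[past])) # actors in past events
  # All ordered pairs (s, r) of past actors, i.e., the Cartesian product
  expand.grid(sender = actors, receiver = actors,
              stringsAsFactors = FALSE)
}

A_t <- risk_set_at(sender = c("A", "B", "C"),
                   receiver = c("B", "C", "D"),
                   time = 1:3, t = 3)
nrow(A_t)  # 9 pairs: actors A, B, and C give 3 x 3 dyads
}
Actor D is excluded because it appears only in the event at time 3, which is not strictly prior to \eqn{t = 3}.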

Case-control sampling retains the full set of observed events, that is, all events in \eqn{E}, and
samples an arbitrary number \eqn{m} of non-events from the support set \eqn{A_t} (Vu et al. 2015; Lerner
and Lomi 2020). This process generates a new support set, \eqn{SA_t}, for each relational event
\eqn{e_i} in \eqn{E} given the network of past events \eqn{G[E;t]}. Formally, \eqn{SA_t} is defined as:
\deqn{SA_t \subseteq \{ (s, r) \mid s \in G[E;t], r \in G[E;t] \}}
When sampling from the observed events, \eqn{n} observed events are
sampled from the set \eqn{E}, each with known probability \eqn{0 < p \le 1}. Formally, sampling from
the observed set generates a new set \eqn{SE \subseteq E}.
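
The two sampling schemes can be sketched together in base R. The function \code{sample_events()} below is hypothetical and is not the package's implementation. It assumes that each observed event is kept independently with probability \code{p}, builds \eqn{A_t} from all actors involved in strictly earlier events, removes the observed dyad so that every control is a non-event, and draws up to \code{m} controls without replacement. Events with an empty risk set (such as the first event) are kept here without controls; the package may handle them differently.
\preformatted{
# Hypothetical sketch (not the package's implementation)
sample_events <- function(sender, receiver, time, p = 1, m = 5,
                          seed = 9999) {
  set.seed(seed)
  keep <- which(runif(length(time)) <= p)  # SE: each event kept with prob. p
  out <- vector("list", length(keep))
  for (k in seq_along(keep)) {
    i <- keep[k]
    past <- time < time[i]                 # events in G[E;t]
    actors <- unique(c(sender[past], receiver[past]))
    A_t <- expand.grid(sender = actors, receiver = actors,
                       stringsAsFactors = FALSE)
    # Drop the observed dyad so that every control is a non-event
    A_t <- A_t[!(A_t$sender == sender[i] & A_t$receiver == receiver[i]), ,
               drop = FALSE]
    n_ctrl <- min(m, nrow(A_t))
    obs <- data.frame(sender = sender[i], receiver = receiver[i],
                      time = time[i], observed = 1)
    if (n_ctrl > 0) {
      idx <- sample.int(nrow(A_t), n_ctrl)  # SA_t for event i
      ctrl <- data.frame(sender = A_t$sender[idx],
                         receiver = A_t$receiver[idx],
                         time = time[i], observed = 0)
      obs <- rbind(obs, ctrl)
    }
    out[[k]] <- obs
  }
  do.call(rbind, out)
}

res <- sample_events(sender = c("A", "B", "C", "A"),
                     receiver = c("B", "C", "A", "C"),
                     time = 1:4, p = 1, m = 2)
table(res$observed)  # 6 controls (0) and 4 observed events (1)
}
With \code{p = 1} every observed event is kept; the first event has no past actors and therefore no controls, while each of the remaining three events receives \code{m = 2} controls.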

A time- \emph{or} event-dependent dynamic risk set can be created in which the set of potential events,
that is, all events in the risk set \eqn{A_t}, is based only on the set of actors active within a
specified event or time span before the current event (e.g., within the past month
or within the past 100 events). The span can be
based on either a) the actual timing of past events
(e.g., years, months, days, or even milliseconds, as in Lerner and Lomi 2020),
or b) the ordering of past events (e.g., all actors involved in the past 100 events). To create a time- or event-dependent dynamic
risk set, set \code{time_dependent = TRUE} and then supply the
accompanying time vector, \code{timeDV}, defined as the number of time units (e.g., days) or the
number of events since the first event. In addition, specify the cutoff
threshold with \code{timeDif}, which must be in the same unit of measurement as
\code{timeDV} (e.g., days). For example, to create a time-dependent dynamic
risk set that includes only actors active within the past month, create a
vector \code{timeDV} that gives, for each event, the number of days since the first
event, and set \code{timeDif} to 30. Similarly, to create an event-dependent
dynamic risk set that includes only actors involved in the past 100 events, create
a vector \code{timeDV} of event counts since the first event (e.g., \code{1:n}), and
set \code{timeDif} to 100.
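
The windowing rule can be sketched as a filter on \code{timeDV}. The helper \code{window_actors()} below is hypothetical and is not the package's implementation. In particular, it assumes that the window contains the events \eqn{j} with \code{timeDV[i] - timeDif <= timeDV[j] < timeDV[i]}; whether the package treats the lower boundary as inclusive is not stated here. The usage lines reuse the senders and receivers from the examples section with \code{timeDV = 1:18} and \code{timeDif = 3}.
\preformatted{
# Hypothetical helper (not part of this package): actors active within
# the timeDif window before event i
window_actors <- function(sender, receiver, timeDV, i, timeDif) {
  in_window <- timeDV < timeDV[i] & timeDV >= timeDV[i] - timeDif
  unique(c(sender[in_window], receiver[in_window]))
}

sender <- c("A", "B", "C", "A", "D", "E", "F", "B", "A",
            "F", "D", "B", "G", "B", "D", "H", "A", "D")
target <- c("B", "C", "D", "E", "A", "F", "D", "A", "C",
            "G", "B", "C", "H", "J", "A", "F", "C", "B")
# Actors in events 13 to 15, the 3 events before event 16
window_actors(sender, target, timeDV = 1:18, i = 16, timeDif = 3)
# "G" "B" "D" "H" "J" "A"
}
Actors C, E, and F were active earlier in the sequence but fall outside the window, so they would not enter the risk set for event 16; the dynamic risk set is then the Cartesian product of the returned actors.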
}
\examples{
# A random one-mode relational event sequence
set.seed(9999)
events <- data.frame(time = sort(rexp(18)),
                     eventID = 1:18,
                     sender = c("A", "B", "C",
                                "A", "D", "E",
                                "F", "B", "A",
                                "F", "D", "B",
                                "G", "B", "D",
                                "H", "A", "D"),
                     target = c("B", "C", "D",
                                "E", "A", "F",
                                "D", "A", "C",
                                "G", "B", "C",
                                "H", "J", "A",
                                "F", "C", "B"))

# Creating a one-mode relational risk set with p = 1.00 (all true events)
# and 5 controls
eventSet <- processOMEventSeq(data = events,
                      time = events$time,
                      eventID = events$eventID,
                      sender = events$sender,
                      receiver = events$target,
                      p_samplingobserved = 1.00,
                      n_controls = 5,
                      seed = 9999)

# Creating an event-dependent one-mode relational risk set with p = 1.00 (all
# true events) and 3 controls based upon the 5 events prior to the current event.
events$timeseq <- 1:nrow(events)
eventSetT <- processOMEventSeq(data = events,
                       time = events$time,
                       eventID = events$eventID,
                       sender = events$sender,
                       receiver = events$target,
                       p_samplingobserved = 1.00,
                       time_dependent = TRUE,
                       timeDV = events$timeseq,
                       timeDif = 5,
                       n_controls = 3,
                       seed = 9999)

# Creating a time-dependent one-mode relational risk set with p = 1.00 (all
# true events) and 3 controls based upon the past 0.40 time units.
eventSetTime <- processOMEventSeq(data = events,
                       time = events$time,
                       eventID = events$eventID,
                       sender = events$sender,
                       receiver = events$target,
                       p_samplingobserved = 1.00,
                       time_dependent = TRUE,
                       timeDV = events$time, #the original time variable
                       timeDif = 0.40, #time difference of 0.40 units
                       n_controls = 3,
                       seed = 9999)
}
\references{
Butts, Carter T. 2008. "A Relational Event Framework for Social Action." \emph{Sociological Methodology} 38(1): 155-200.

Butts, Carter T. and Christopher Steven Marcum. 2017. "A Relational Event Approach to Modeling Behavioral Dynamics." In A.
Pilny & M. S. Poole (Eds.), \emph{Group processes: Data-driven computational approaches}. Springer International Publishing.

Lerner, Jürgen and Alessandro Lomi. 2020. "Reliability of relational event model estimates under sampling: How to
fit a relational event model to 360 million dyadic events." \emph{Network Science} 8(1): 97-135.

Vu, Duy, Philippa Pattison, and Garry Robins. 2015. "Relational event models for social learning in MOOCs." \emph{Social Networks} 43: 121-135.
}
\author{
Kevin A. Carson \href{mailto:kacarson@arizona.edu}{kacarson@arizona.edu}, Diego F. Leal \href{mailto:dflc@arizona.edu}{dflc@arizona.edu}
}
