% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/generate_new_ratings.R
\name{generate_typicality}
\alias{generate_typicality}
\title{Generate typicality ratings via an 'Inference Provider' (experimental)}
\usage{
generate_typicality(
  groups,
  descriptions,
  api_url,
  api_token,
  model = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  n = 25,
  min_valid = ceiling(0.8 * n),
  temperature = 1,
  top_p = 1,
  max_tokens = 3,
  retries = 4,
  matrix = TRUE,
  return_raw_scores = TRUE,
  return_full_responses = FALSE,
  verbose = interactive(),
  system_prompt = default_system_prompt(),
  user_prompt_template = default_user_prompt_template()
)
}
\arguments{
\item{groups, descriptions}{Character vectors of group labels and descriptions.
When \code{matrix = FALSE}, they \strong{must} be the same length.}

\item{api_url}{Fully-qualified HTTPS URL for the provider's chat completions endpoint (e.g., "https://api.together.xyz/v1/chat/completions").}

\item{api_token}{API token for the inference provider.}

\item{model}{Model identifier string to be passed in the API request body. Check your provider's documentation for the available models and correct names.}

\item{n}{Samples requested per retry block (>= 1).}

\item{min_valid}{Minimum numeric scores required per pair (>= 1).}

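\item{temperature, top_p, max_tokens}{Generation controls sent with each request: sampling temperature, nucleus-sampling threshold, and maximum number of generated tokens.}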

\item{retries}{Maximum number of \emph{additional} retry blocks.}

\item{matrix}{\code{TRUE} = cross-product, \code{FALSE} = paired.}

\item{return_raw_scores}{If \code{TRUE}, also returns the vector(s) of raw valid numeric scores.}

\item{return_full_responses}{If \code{TRUE}, also returns all raw text model outputs
(or error strings from failed attempts) for each query.}

\item{verbose}{If \code{TRUE}, prints progress: pair labels, retry counts,
running tallies, and raw model responses/errors as they occur.}

\item{system_prompt}{Prompt string for the system message. See the 'Prompting Details' section and function signature for default content and customization.}

\item{user_prompt_template}{Prompt template for the user message, with \code{{group}} and \code{{description}} placeholders. No additional formatting is added by the function. See the 'Prompting Details' section and function signature for default content and customization.}
}
\value{
If a pair cannot reach \code{min_valid} valid scores, its mean is \code{NA}; the raw invalid strings remain available when \code{return_full_responses = TRUE}.

In cross-product mode (\code{matrix = TRUE}), returns a list containing:
\itemize{
\item \code{scores}: A matrix of mean typicality scores.
\item \code{raw} (if \code{return_raw_scores = TRUE}): A matrix of lists, where each list contains the raw numeric scores for that pair.
\item \code{full_responses} (if \code{return_full_responses = TRUE}): A matrix of lists, where each list contains all raw text model outputs (or error strings) for that pair.
}
In paired mode (\code{matrix = FALSE}), returns a tibble with the columns \code{group}, \code{description}, and \code{mean_score}, plus:
\itemize{
\item \code{raw} (if \code{return_raw_scores = TRUE}): A list-column where each element is a vector of raw numeric scores.
\item \code{full_responses} (if \code{return_full_responses = TRUE}): A list-column where each element is a character vector of all raw text model outputs (or error strings).
}
}
\description{
This function uses a compatible 'Inference Provider' API (e.g., 'Together AI' or 'Fireworks')
to generate typicality ratings by querying a large language model (LLM).
It generates one or more ratings for each group-description pair and returns the mean score.
Run time depends on the provider and on the number of pairs, samples, and retries, and can be long.

\strong{Important:} Before running this function, please ensure that:
\itemize{
\item You have a valid API token from your inference provider (via \code{api_token} or an environment variable);
\item You have provided the correct and complete URL for the provider's chat completions endpoint;
\item The specified model is available and accessible via the endpoint;
\item The model supports the standard \code{messages} array format (with system/user roles) and generates numeric outputs in response to the prompts.
}

Calls to the API are rate-limited, may incur usage costs, and require an internet connection.
This feature is \strong{experimental} and is not guaranteed to work with all models or providers.
}
\section{Get Typicality Ratings from Large Language Models}{
\strong{generate_typicality()} sends structured prompts to any text-generation model
served via a compatible API endpoint and collects \emph{numeric} ratings (0-100)
of how well a \emph{description} (e.g., an adjective) fits a \emph{group} (e.g., an
occupation). Responses that cannot be parsed into numbers are discarded.
\subsection{Modes}{
\itemize{
\item \strong{Cross-product} (\code{matrix = TRUE}, \emph{default}): rate every combination of
the \emph{unique} \code{groups} and \code{descriptions}. Returns a list containing matrices.
\item \strong{Paired} (\code{matrix = FALSE}): rate the pairs row-by-row
(requires \code{length(groups) == length(descriptions)}). Returns a tibble.
}

Each pair is queried repeatedly until at least \strong{\code{min_valid}} clean scores
are obtained or the retry budget is exhausted. One \emph{retry block} consists of
\strong{\code{n}} new samples; invalid or out-of-range answers are silently dropped.
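
The retry logic described above can be sketched as follows. This is a minimal
illustration, not the package's actual implementation: \code{fake_sample()},
\code{parse_scores()}, and \code{collect_scores()} are hypothetical helpers, and the
parsing rule (trim whitespace, convert with \code{as.numeric()}, keep values in 0-100)
is an assumption.

\if{html}{\out{<div class="sourceCode">}}\preformatted{# Hypothetical stand-in for one model response; the real function calls the API.
fake_sample <- function() sample(c("85", "72", "n/a", "140", " 60 "), 1)

# Assumed parsing rule: keep only answers that convert to a number in 0-100.
parse_scores <- function(x) {
  s <- suppressWarnings(as.numeric(trimws(x)))
  s[!is.na(s) & s >= 0 & s <= 100]
}

collect_scores <- function(n = 25, min_valid = 20, retries = 4,
                           sampler = fake_sample) {
  valid <- numeric(0)
  # One initial block plus up to `retries` additional blocks of `n` samples.
  for (block in seq_len(retries + 1)) {
    answers <- vapply(seq_len(n), function(i) sampler(), character(1))
    valid <- c(valid, parse_scores(answers))
    # Stop as soon as enough clean scores have been collected.
    if (length(valid) >= min_valid) break
  }
  list(
    scores = valid,
    # The mean is NA when the retry budget runs out before min_valid is reached.
    mean_score = if (length(valid) >= min_valid) mean(valid) else NA_real_
  )
}
}\if{html}{\out{</div>}}

Under these assumptions, \code{collect_scores(n = 10, min_valid = 8)} returns the
valid scores together with their mean, or a mean of \code{NA} if fewer than eight
valid scores are collected within the retry budget.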
}
}

\section{Prompting Details}{

The function constructs a \code{messages} array for the API request.
The \code{system_prompt} becomes the content of the \code{system} role message, and the
rendered \code{user_prompt_template} (where \code{{group}} and \code{{description}}
are substituted with the actual values) becomes the content of the \code{user} role message.
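
As a rough sketch of this construction, assuming plain literal substitution of the
two placeholders: \code{build_messages()} is a hypothetical helper written for
illustration, and the function's internal implementation may differ.

\if{html}{\out{<div class="sourceCode">}}\preformatted{# Hypothetical helper: fill the placeholders and build the messages list.
build_messages <- function(system_prompt, user_prompt_template,
                           group, description) {
  # fixed = TRUE treats "{group}" and "{description}" as literal text.
  user_prompt <- gsub("{group}", group, user_prompt_template, fixed = TRUE)
  user_prompt <- gsub("{description}", description, user_prompt, fixed = TRUE)
  list(
    list(role = "system", content = system_prompt),
    list(role = "user", content = user_prompt)
  )
}

msgs <- build_messages(
  system_prompt = "You annotate data for experiments.",
  user_prompt_template = 'Rate "{description}" for "{group}" from 0 to 100.',
  group = "clown",
  description = "funny"
)
}\if{html}{\out{</div>}}

Here \code{msgs[[2]]$content} holds the rendered user prompt with both placeholders
filled in. Because no additional formatting is added, any quotation marks or scale
instructions must be part of the template itself.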

The default \code{system_prompt} is:

\if{html}{\out{<div class="sourceCode">}}\preformatted{You are expert at accurately reproducing the stereotypical associations
humans make, in order to annotate data for experiments.
Your focus is to capture common societal perceptions and stereotypes,
rather than factual attributes of the groups,
even when they are negative or unfounded.
}\if{html}{\out{</div>}}

The default \code{user_prompt_template} is:

\if{html}{\out{<div class="sourceCode">}}\preformatted{Rate how well the description "\{description\}" reflects the prototypical
member of the group "\{group\}" on a scale from 0 ("Not at all") to 100
("Extremely").

To clarify, consider the following examples:
1. "Rate how well the description "FUNNY" reflects the prototypical member
   of the group "CLOWN" on a scale from 0 (Not at all) to 100 (Extremely)."
   A high rating is expected because "FUNNY" closely aligns with typical
   characteristics of a "CLOWN".
2. "Rate how well the description "FEARFUL" reflects the prototypical member
   of the group "FIREFIGHTER" on a scale from 0 (Not at all) to 100
   (Extremely)." A low rating is expected because "FEARFUL" diverges from
   typical characteristics of a "FIREFIGHTER".
3. "Rate how well the description "PATIENT" reflects the prototypical member
   of the group "ENGINEER" on a scale from 0 (Not at all) to 100
   (Extremely)." A mid-scale rating is expected because "PATIENT" neither
   strongly aligns with nor diverges from typical characteristics of an
   "ENGINEER".

Your response should be a single score between 0 and 100, with no additional
text, letters, or symbols.
}\if{html}{\out{</div>}}

Rate limits: transient HTTP 429 and 5xx errors are retried with
exponential back-off.
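
The back-off pattern can be illustrated with a short sketch. The wrapper
\code{with_backoff()}, its arguments, and the doubling delay schedule are
assumptions made for illustration; the delays and attempt limits used internally
are not documented here.

\if{html}{\out{<div class="sourceCode">}}\preformatted{# Hypothetical wrapper: `request_fn` performs one HTTP call and returns a
# list with a numeric `status` element. It is not part of this package.
with_backoff <- function(request_fn, max_tries = 5, base_delay = 1,
                         sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    resp <- request_fn()
    # Treat HTTP 429 and any 5xx status as transient.
    transient <- resp$status == 429 ||
      (resp$status >= 500 && resp$status < 600)
    if (!transient || attempt == max_tries) return(resp)
    # Double the wait after each failed attempt: 1, 2, 4, ... x base_delay.
    sleep(base_delay * 2^(attempt - 1))
  }
}
}\if{html}{\out{</div>}}

Passing a custom \code{sleep} function makes the schedule easy to inspect without
actually waiting, which is why it is exposed as an argument in this sketch.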
}

\examples{
\dontrun{

Sys.setenv(PROVIDER_API_URL = "https://api.together.xyz/v1/chat/completions")
Sys.setenv(PROVIDER_API_TOKEN = "your_secret_token_here")

toy_groups <- c("engineer", "clown", "firefighter") # Minimal example
toy_descriptions <- c("patient", "funny", "fearful")

toy_result <- generate_typicality(
  groups = toy_groups,
  descriptions = toy_descriptions,
  api_url = Sys.getenv("PROVIDER_API_URL"),
  api_token = Sys.getenv("PROVIDER_API_TOKEN"),
  model = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  n = 10,
  min_valid = 8,
  matrix = FALSE,
  return_raw_scores = TRUE,
  return_full_responses = FALSE,
  verbose = TRUE
)

print(toy_result)
}

\dontrun{

ratings <- download_data("validation_ratings") # Full-scale example

new_scores <- generate_typicality(
  groups                = ratings$group,
  descriptions          = ratings$adjective,
  api_url               = Sys.getenv("PROVIDER_API_URL"),
  api_token             = Sys.getenv("PROVIDER_API_TOKEN"),
  model                 = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  n                     = 25,
  min_valid             = 20,
  max_tokens            = 5,
  retries               = 1,
  matrix                = FALSE,
  return_raw_scores     = TRUE,
  return_full_responses = TRUE,
  verbose               = TRUE
)

head(new_scores)
}
}
