Saving and Sharing Graphs with the Caugi Format

library(caugi)

Overview

The caugi package provides a native JSON-based serialization format for saving and loading causal graphs. This format enables reproducible research, data sharing, and caching of graph structures.

Quick Start

Writing Graphs

First, create a causal graph:

cg <- caugi(
  A %-->% B + C,
  B %-->% D,
  C %-->% D,
  class = "DAG"
)

Then, write it to a file in the caugi format:

tmp <- tempfile(fileext = ".caugi.json")
write_caugi(cg, tmp,
  comment = "Example causal graph",
  tags = c("research", "example")
)

That’s it! The graph is now saved in a human-readable JSON file.

Reading Graphs

You can read the graph back from the file and verify that it matches the original:

cg_loaded <- read_caugi(tmp)

identical(edges(cg), edges(cg_loaded))
#> [1] TRUE
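
Writing and reading together give a simple on-disk cache. The following is a minimal sketch, not part of caugi: `cached_graph()` and its `build_fn` argument are hypothetical names, and it relies only on `write_caugi()` and `read_caugi()` as used above.

```r
library(caugi)

# Hypothetical helper: load the graph from `path` if the file exists,
# otherwise build it with `build_fn()` and save it for next time.
cached_graph <- function(path, build_fn) {
  if (file.exists(path)) {
    return(read_caugi(path))
  }
  g <- build_fn()
  write_caugi(g, path)
  g
}

cache_path <- tempfile(fileext = ".caugi.json")

# First call: the file does not exist yet, so the graph is built and written.
g1 <- cached_graph(cache_path, function() caugi(A %-->% B, class = "DAG"))

# Second call: the file exists, so the builder is never invoked.
g2 <- cached_graph(cache_path, function() stop("builder should not run"))
```

The sketch never invalidates the cache; delete the file (for example with `unlink(cache_path)`) to force a rebuild.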

The Caugi Format

Structure

The caugi format uses a simple, human-readable JSON structure:

{
  "$schema": "https://caugi.org/schemas/caugi-v1.schema.json",
  "format": "caugi",
  "version": "1.0.0",
  "graph": {
    "class": "DAG",
    "nodes": [
      "A",
      "B",
      "C",
      "D"
    ],
    "edges": [
      {
        "from": "A",
        "to": "B",
        "edge": "-->"
      },
      {
        "from": "A",
        "to": "C",
        "edge": "-->"
      },
      {
        "from": "B",
        "to": "D",
        "edge": "-->"
      },
      {
        "from": "C",
        "to": "D",
        "edge": "-->"
      }
    ]
  },
  "meta": {
    "comment": "Example causal graph",
    "tags": [
      "research",
      "example"
    ]
  }
}
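
Because the file is plain JSON, it can also be inspected without caugi. The sketch below assumes the jsonlite package is installed (it is not part of caugi) and parses a compact copy of the document above; the variable names are illustrative.

```r
library(jsonlite)

# A compact copy of the file shown above.
json_txt <- '{
  "$schema": "https://caugi.org/schemas/caugi-v1.schema.json",
  "format": "caugi",
  "version": "1.0.0",
  "graph": {
    "class": "DAG",
    "nodes": ["A", "B", "C", "D"],
    "edges": [
      {"from": "A", "to": "B", "edge": "-->"},
      {"from": "A", "to": "C", "edge": "-->"},
      {"from": "B", "to": "D", "edge": "-->"},
      {"from": "C", "to": "D", "edge": "-->"}
    ]
  },
  "meta": {
    "comment": "Example causal graph",
    "tags": ["research", "example"]
  }
}'

doc <- fromJSON(json_txt)

doc$graph$class  # graph class as a single string
doc$graph$nodes  # character vector of node names
doc$graph$edges  # data frame with columns from, to, edge
doc$meta$tags    # character vector of tags
```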

Key Features

Edge Types

The format supports all caugi edge types using their DSL operators:

Operator  Description             Graph Types
-->       Directed edge           DAG, PDAG, ADMG, UNKNOWN
---       Undirected edge         UG, PDAG, UNKNOWN
<->       Bidirected edge         ADMG, UNKNOWN
o->       Partially directed      PDAG, UNKNOWN
--o       Partially undirected    PDAG, UNKNOWN
o-o       Partial (both circles)  PDAG, UNKNOWN
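
The table can be turned into a small lookup for checking edge operators against a graph class, for example when a file was edited by hand. This is a sketch in base R: `allowed_edges` and `disallowed_edges()` are hypothetical names, and the mapping is transcribed from the table above rather than taken from the package itself.

```r
# Allowed edge operators per graph class, transcribed from the table above.
allowed_edges <- list(
  DAG     = c("-->"),
  UG      = c("---"),
  ADMG    = c("-->", "<->"),
  PDAG    = c("-->", "---", "o->", "--o", "o-o"),
  UNKNOWN = c("-->", "---", "<->", "o->", "--o", "o-o")
)

# Hypothetical helper: return the operators in `edge_ops` that the table
# does not list for `class`. An empty vector means every operator is allowed.
disallowed_edges <- function(class, edge_ops) {
  if (!class %in% names(allowed_edges)) {
    stop("Unknown graph class: ", class)
  }
  setdiff(unique(edge_ops), allowed_edges[[class]])
}

disallowed_edges("DAG", c("-->", "-->"))
#> character(0)
disallowed_edges("ADMG", c("-->", "<->", "---"))
#> [1] "---"
```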

Working with the Format

String Serialization

For programmatic use, you can serialize a graph to a JSON string and deserialize it again:

# Serialize to JSON string
json_str <- caugi_serialize(cg)
cat(substr(json_str, 1, 200), "...\n")
#> {
#>   "$schema": "https://caugi.org/schemas/caugi-v1.schema.json",
#>   "format": "caugi",
#>   "version": "1.0.0",
#>   "graph": {
#>     "class": "DAG",
#>     "nodes": [
#>       "A",
#>       "B",
#>       "C",
#>       "D"
#>   ...

# Deserialize from JSON string
cg_from_json <- caugi_deserialize(json_str)

Lazy Loading

For large graphs, you can defer building the underlying graph structure until it is needed:

# Read without building the Rust graph structure
cg_lazy <- read_caugi(tmp, lazy = TRUE)

# Build when needed
cg_lazy <- build(cg_lazy)

Metadata

Add context to your graphs with comments and tags:

write_caugi(cg, tmp,
  comment = "Mediation model from Study A",
  tags = c("mediation", "study-a", "validated")
)
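
Tags make it easy to filter a directory of saved graphs. The sketch below assumes the jsonlite package (not part of caugi) and that tags are stored under `meta$tags`, as in the structure shown earlier. `has_tag()` and `write_stub()` are hypothetical helpers; the stub files only stand in for real `write_caugi()` output so that the example is self-contained.

```r
library(jsonlite)

# Hypothetical helper: TRUE if the caugi file at `path` carries `tag`.
has_tag <- function(path, tag) {
  meta <- fromJSON(path)$meta
  tag %in% meta$tags
}

# Stand-in for write_caugi(): writes a minimal file with the given tags.
write_stub <- function(path, tags) {
  doc <- list(
    format = "caugi",
    version = "1.0.0",
    graph = list(
      class = "DAG",
      nodes = c("A", "B"),
      edges = list(list(from = "A", to = "B", edge = "-->"))
    ),
    meta = list(tags = tags)
  )
  write_json(doc, path, auto_unbox = TRUE)
}

graph_dir <- tempfile("graphs")
dir.create(graph_dir)
write_stub(file.path(graph_dir, "a.caugi.json"), c("mediation", "validated"))
write_stub(file.path(graph_dir, "b.caugi.json"), c("mediation", "draft"))

# Keep only the files tagged "validated".
files <- list.files(graph_dir, pattern = "\\.caugi\\.json$", full.names = TRUE)
validated <- files[vapply(files, has_tag, logical(1), tag = "validated")]
basename(validated)
#> [1] "a.caugi.json"
```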

Different Graph Types

The format supports all caugi graph classes:

# DAG
dag <- caugi(X %-->% Y, Y %-->% Z, class = "DAG")

# PDAG (with undirected edges)
pdag <- caugi(X %-->% Y, Y %---% Z, class = "PDAG")

# ADMG (with bidirected edges)
admg <- caugi(X %-->% Y, Y %<->% Z, class = "ADMG")

# UG (undirected graph)
ug <- caugi(X %---% Y, Y %---% Z, class = "UG")

# Save them all
write_caugi(dag, tempfile(fileext = ".caugi.json"))
write_caugi(pdag, tempfile(fileext = ".caugi.json"))
write_caugi(admg, tempfile(fileext = ".caugi.json"))
write_caugi(ug, tempfile(fileext = ".caugi.json"))

File Extension Convention

We recommend using .caugi.json as the file extension to clearly indicate both the format and content type. This helps tools recognize the files and enables automatic handling by IDEs and validators.

Schema Validation

All files generated by write_caugi() include a $schema field pointing to the formal JSON Schema specification:

https://caugi.org/schemas/caugi-v1.schema.json

This enables validators and schema-aware editors to check caugi files automatically.
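
Before running a full JSON Schema validator, a cheap structural pre-check can catch obviously wrong files. The sketch below is not a substitute for schema validation: `precheck_caugi()` is a hypothetical helper, it assumes the jsonlite package, and the rule that only major version 1 is accepted is an assumption based on the `caugi-v1` schema name.

```r
library(jsonlite)

# Hypothetical helper: returns a character vector of problems found in a
# caugi JSON string (empty if the basic structure looks right).
precheck_caugi <- function(json_txt) {
  doc <- fromJSON(json_txt, simplifyVector = FALSE)
  problems <- character(0)

  if (!identical(doc$format, "caugi")) {
    problems <- c(problems, "'format' is not \"caugi\"")
  }

  # Assumption: files for the v1 schema carry a version of the form "1.x.y".
  version <- doc$version
  major <- if (is.character(version)) {
    strsplit(version, ".", fixed = TRUE)[[1]][1]
  } else {
    NA_character_
  }
  if (!identical(major, "1")) {
    problems <- c(problems, "'version' is missing or not major version 1")
  }

  if (is.null(doc$graph)) {
    problems <- c(problems, "'graph' is missing")
  }

  problems
}

good <- '{"format": "caugi", "version": "1.0.0",
          "graph": {"class": "DAG", "nodes": ["A"], "edges": []}}'
precheck_caugi(good)
#> character(0)
```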

Performance

Serialization is implemented in Rust for performance. To see how it behaves on a larger graph, time a write and a read of a generated DAG:

tmp_file <- tempfile(fileext = ".caugi.json")
large_dag <- generate_graph(n = 1000, m = 500, class = "DAG")
system.time(write_caugi(large_dag, tmp_file))
system.time(res <- read_caugi(tmp_file))
unlink(tmp_file)