Raw variable names like trt, marker, and
grade don’t belong in a publication table. If you’re
building 20+ tables across an analysis, manually relabeling the same
variables in every tbl_summary() call is time-consuming.
add_auto_labels() lets you define labels once and apply
them everywhere.
A dictionary is a data frame with two columns: variable
(exact variable names) and description (the labels you want
displayed). Column names are case-insensitive.
dictionary <- tibble::tribble(
~variable, ~description,
"trt", "Chemotherapy Treatment",
"age", "Age at Enrollment (years)",
"marker", "Marker Level (ng/mL)",
"stage", "T Stage",
"grade", "Tumor Grade",
"response", "Tumor Response",
"death", "Patient Died"
)
dictionary
#> # A tibble: 7 × 2
#> variable description
#> <chr> <chr>
#> 1 trt Chemotherapy Treatment
#> 2 age Age at Enrollment (years)
#> 3 marker Marker Level (ng/mL)
#> 4 stage T Stage
#> 5 grade Tumor Grade
#> 6 response Tumor Response
#> 7 death Patient Died
In practice, you could load this from a CSV or define it once at the top of your analysis script.
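As a rough sketch of the CSV route, assuming a file with a variable column and a description column: the file contents below are made up for illustration, and a temporary file stands in for a real one. The name dictionary_csv is also just a placeholder.

```r
# Hypothetical example: write a small dictionary CSV to a temp file, then read it back.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Variable,Description",
    "trt,Chemotherapy Treatment",
    "age,Age at Enrollment (years)",
    "marker,Marker Level (ng/mL)"
  ),
  csv_path
)

# Read every column as character so labels stay plain strings.
dictionary_csv <- read.csv(csv_path, colClasses = "character", check.names = FALSE)

# Column names are described as case-insensitive; lower-casing them here
# is an optional normalization, not something the package requires.
names(dictionary_csv) <- tolower(names(dictionary_csv))

# Basic sanity checks before using the dictionary.
stopifnot(
  all(c("variable", "description") %in% names(dictionary_csv)),
  !anyDuplicated(dictionary_csv$variable)
)

dictionary_csv
```

Checking for duplicate variable names up front is a design choice: a dictionary with two labels for the same variable is ambiguous, so it is easier to catch that at load time than in a finished table.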
trial |>
  tbl_summary(by = trt, include = c(age, grade, marker)) |>
  extras() |>
  add_auto_labels(dictionary = dictionary)

| | Overall N = 200¹ | Drug A N = 98¹ | Drug B N = 102¹ | p-value² |
|---|---|---|---|---|
| Age | 47 (38, 57) | 46 (37, 60) | 48 (39, 56) | 0.718 |
| Unknown | 11 | 7 | 4 | |
| Grade | | | | 0.871 |
| I | 68 (34%) | 35 (36%) | 33 (32%) | |
| II | 68 (34%) | 32 (33%) | 36 (35%) | |
| III | 64 (32%) | 31 (32%) | 33 (32%) | |
| Marker Level (ng/mL) | 0.64 (0.22, 1.41) | 0.84 (0.23, 1.60) | 0.52 (0.18, 1.21) | 0.085 |
| Unknown | 10 | 6 | 4 | |
| ¹ Median (Q1, Q3); n (%) | | | | |
| ² Wilcoxon rank sum test; Pearson’s Chi-squared test | | | | |

If a dictionary object exists in your environment,
add_auto_labels() finds it without you passing it:
# dictionary already exists from above
trial |>
  tbl_summary(by = trt, include = c(age, stage, response)) |>
  extras() |>
  add_auto_labels()

| | Overall N = 200¹ | Drug A N = 98¹ | Drug B N = 102¹ | p-value² |
|---|---|---|---|---|
| Age | 47 (38, 57) | 46 (37, 60) | 48 (39, 56) | 0.718 |
| Unknown | 11 | 7 | 4 | |
| T Stage | | | | 0.866 |
| T1 | 53 (27%) | 28 (29%) | 25 (25%) | |
| T2 | 54 (27%) | 25 (26%) | 29 (28%) | |
| T3 | 43 (22%) | 22 (22%) | 21 (21%) | |
| T4 | 50 (25%) | 23 (23%) | 27 (26%) | |
| Tumor Response | 61 (32%) | 28 (29%) | 33 (34%) | 0.530 |
| Unknown | 7 | 3 | 4 | |
| ¹ Median (Q1, Q3); n (%) | | | | |
| ² Wilcoxon rank sum test; Pearson’s Chi-squared test | | | | |

If your data already has label attributes (e.g., from
haven::read_sas() or manual assignment),
add_auto_labels() reads those directly:
labeled_trial <- trial
attr(labeled_trial$age, "label") <- "Patient Age at Baseline"
attr(labeled_trial$marker, "label") <- "Biomarker Concentration (ng/mL)"
labeled_trial |>
  tbl_summary(by = trt, include = c(age, marker)) |>
  extras() |>
  add_auto_labels()

| | Overall N = 200¹ | Drug A N = 98¹ | Drug B N = 102¹ | p-value² |
|---|---|---|---|---|
| Patient Age at Baseline | 47 (38, 57) | 46 (37, 60) | 48 (39, 56) | 0.718 |
| Unknown | 11 | 7 | 4 | |
| Biomarker Concentration (ng/mL) | 0.64 (0.22, 1.41) | 0.84 (0.23, 1.60) | 0.52 (0.18, 1.21) | 0.085 |
| Unknown | 10 | 6 | 4 | |
| ¹ Median (Q1, Q3) | | | | |
| ² Wilcoxon rank sum test | | | | |

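To check which label attributes a data frame already carries before building a table, a small base R sketch can help. The helper name get_labels() and the toy data frame are illustrative only and are not part of any package.

```r
# Toy data standing in for an imported dataset.
df <- data.frame(age = c(34, 51, 47), marker = c(0.2, 1.1, 0.7))
attr(df$age, "label") <- "Patient Age at Baseline"

# Hypothetical helper: return each column's "label" attribute, or NA if unset.
get_labels <- function(data) {
  vapply(
    data,
    function(col) {
      lbl <- attr(col, "label", exact = TRUE)
      if (is.null(lbl)) NA_character_ else as.character(lbl)
    },
    character(1)
  )
}

# Named character vector: one entry per column.
get_labels(df)
```

Columns that come back as NA have no attribute label, so those are the ones a dictionary would need to cover.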
Labels set via label = list(...) in
tbl_summary() always take priority over dictionary or
attribute labels:
trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade, marker),
    label = list(age ~ "Age (from tbl_summary function)")
  ) |>
  extras() |>
  add_auto_labels(dictionary = dictionary)

| | Overall N = 200¹ | Drug A N = 98¹ | Drug B N = 102¹ | p-value² |
|---|---|---|---|---|
| Age (from tbl_summary function) | 47 (38, 57) | 46 (37, 60) | 48 (39, 56) | 0.718 |
| Unknown | 11 | 7 | 4 | |
| Grade | | | | 0.871 |
| I | 68 (34%) | 35 (36%) | 33 (32%) | |
| II | 68 (34%) | 32 (33%) | 36 (35%) | |
| III | 64 (32%) | 31 (32%) | 33 (32%) | |
| Marker Level (ng/mL) | 0.64 (0.22, 1.41) | 0.84 (0.23, 1.60) | 0.52 (0.18, 1.21) | 0.085 |
| Unknown | 10 | 6 | 4 | |
| ¹ Median (Q1, Q3); n (%) | | | | |
| ² Wilcoxon rank sum test; Pearson’s Chi-squared test | | | | |

add_auto_labels() works with tbl_regression() the same way:
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| Age at Enrollment (years) | 0.00 | -0.01, 0.01 | >0.9 |
| Tumor Grade | | | |
| I | — | — | |
| II | -0.35 | -0.67, -0.04 | 0.027 |
| III | -0.12 | -0.43, 0.19 | 0.4 |
| T Stage | | | |
| T1 | — | — | |
| T2 | 0.33 | -0.01, 0.67 | 0.057 |
| T3 | 0.21 | -0.17, 0.58 | 0.3 |
| T4 | 0.14 | -0.22, 0.50 | 0.4 |
| Abbreviation: CI = Confidence Interval | | | |
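
The call behind the regression table is not shown. A minimal sketch of the pattern, assuming a linear model of marker on age, grade, and stage: the model formula is a guess made for illustration, and the output is not claimed to reproduce the numbers in the table.

```r
library(gtsummary)
library(sumExtras)

# Dictionary repeated here so the snippet stands alone.
dictionary <- tibble::tribble(
  ~variable, ~description,
  "age",     "Age at Enrollment (years)",
  "grade",   "Tumor Grade",
  "stage",   "T Stage"
)

# Assumed model: the original formula is not given in the text.
fit <- lm(marker ~ age + grade + stage, data = trial)

# Same pattern as with tbl_summary(): build the table, then apply labels.
tbl_reg <- fit |>
  tbl_regression() |>
  add_auto_labels(dictionary = dictionary)

tbl_reg
```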
When both dictionary labels and attribute labels exist for the same variable, attribute labels take priority by default:
1. Manual labels (label = list(...) in tbl_summary()) always win.
2. Attribute labels (attr(data$var, "label")) take priority over dictionary labels.

We recommend setting
options(sumExtras.prefer_dictionary = TRUE) so dictionary
labels take priority over attribute labels. This is especially useful
when your imported data has generic attribute labels but your dictionary
has the labels you actually want in publication tables. See
vignette("options") for details.
trial_both <- trial
attr(trial_both$age, "label") <- "Age from Attribute"
dictionary_conflict <- tibble::tribble(
~variable, ~description,
"age", "Age from Dictionary"
)
# Attribute wins over dictionary
trial_both |>
  tbl_summary(by = trt, include = age) |>
  add_auto_labels(dictionary = dictionary_conflict) |>
  extras()

| | Overall N = 200¹ | Drug A N = 98¹ | Drug B N = 102¹ | p-value² |
|---|---|---|---|---|
| Age from Attribute | 47 (38, 57) | 46 (37, 60) | 48 (39, 56) | 0.718 |
| Unknown | 11 | 7 | 4 | |
| ¹ Median (Q1, Q3) | | | | |
| ² Wilcoxon rank sum test | | | | |

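The priority rules can be summarized in a few lines of base R. This is only an illustration of the documented order, not the package's implementation; resolve_label() and its arguments are hypothetical names.

```r
# Hypothetical helper illustrating the documented priority order:
#   default:                  manual > attribute > dictionary
#   prefer_dictionary = TRUE: manual > dictionary > attribute
resolve_label <- function(var, manual = NULL, attribute = NULL,
                          dictionary = NULL, prefer_dictionary = FALSE) {
  # Look up the variable in the dictionary, if one was supplied.
  dict_label <- NULL
  if (!is.null(dictionary)) {
    hit <- dictionary$description[dictionary$variable == var]
    if (length(hit) > 0) dict_label <- hit[[1]]
  }

  candidates <- if (prefer_dictionary) {
    list(manual, dict_label, attribute)
  } else {
    list(manual, attribute, dict_label)
  }

  # Keep the first non-NULL candidate; fall back to the raw variable name.
  found <- Filter(Negate(is.null), candidates)
  if (length(found) > 0) found[[1]] else var
}

dict <- data.frame(variable = "age", description = "Age from Dictionary")

# Default order: the attribute label wins over the dictionary.
resolve_label("age", attribute = "Age from Attribute", dictionary = dict)

# Dictionary preferred: the dictionary label wins over the attribute.
resolve_label("age", attribute = "Age from Attribute", dictionary = dict,
              prefer_dictionary = TRUE)

# A manual label wins in either mode.
resolve_label("age", manual = "Age (manual)", attribute = "Age from Attribute",
              dictionary = dict)
```

In the package itself, the dictionary-first behavior corresponds to setting options(sumExtras.prefer_dictionary = TRUE), as recommended above.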
If you always keep a dictionary in your environment, you
can skip calling add_auto_labels() entirely. Set this once
per session (or put it in your .Rprofile):
Now every extras() call picks up the dictionary
automatically:
dictionary <- tibble::tribble(
~variable, ~description,
"age", "Age at Enrollment (years)",
"marker", "Marker Level (ng/mL)",
"grade", "Tumor Grade"
)
# No add_auto_labels() needed
trial |>
  tbl_summary(by = trt) |>
  extras()

If no dictionary is found and the data has no label attributes,
extras() continues normally. If something goes wrong, it
warns and moves on. You can still call add_auto_labels()
explicitly whenever you need per-table control.
See vignette("options") for more on
.Rprofile setup.
vignette("sumExtras-intro") – getting started with extras()
vignette("styling") – group headers and advanced formatting
vignette("themes") – JAMA compact themes for {gtsummary} and {gt}