
Record checksums of arbitrary data to a file for reproducibility and auditing. Use it to log summary statistics or checksums at key pipeline steps, so that results can be validated and traced.

Usage

record_checksum(fname, append, ...)

Arguments

fname

character(1). File to write checksums to (e.g., "020_check_counts.csv").

append

logical. If TRUE, appends to the file; otherwise overwrites it. Default FALSE.

...

Arguments passed to data.table(): typically named summary metrics, or an existing summary table such as the output of pums_checksum().
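Assuming record_checksum() forwards ... to data.table() as described above, the sketch below shows how the two common input styles become columns. The dataset, step, and n_rows names come from the Examples section; the type column and the values are hypothetical.

```r
library(data.table)

# Named keyword arguments become the columns of a one-row table.
metrics <- data.table(dataset = "PUMS", step = "cleaned", n_rows = 1000L)

# An existing summary table (hypothetical values) can be passed as well;
# data.table() copies its columns, and extra named scalars are recycled
# across all of its rows.
existing <- data.table(type = c("person", "household"), n_rows = c(1000L, 400L))
combined <- data.table(existing, step = "cleaned")

print(metrics)
print(combined)
```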

Value

data.table. The summary checksums that were written to the file.
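The behaviour described under Arguments and Value can be approximated in a few lines. This is a minimal sketch, assuming the function builds a table with data.table(...) and writes it with data.table::fwrite(); record_checksum_sketch is a hypothetical name, and the package's actual implementation may differ.

```r
library(data.table)

# Hypothetical re-implementation for illustration only; the package's
# actual record_checksum() may behave differently.
record_checksum_sketch <- function(fname, append = FALSE, ...) {
  checksum_dt <- data.table(...)
  # Write a header only when creating or overwriting the file, so that
  # appended rows do not repeat the column names mid-file.
  write_header <- !append || !file.exists(fname)
  fwrite(checksum_dt, fname, append = append, col.names = write_header)
  checksum_dt
}

# Use a temporary file so the example does not touch the working directory.
log_file <- tempfile(fileext = ".csv")
record_checksum_sketch(log_file, append = FALSE,
                       dataset = "PUMS", step = "initial", n_rows = 1200L)
record_checksum_sketch(log_file, append = TRUE,
                       dataset = "PUMS", step = "adjusted", n_rows = 1000L)

log_dt <- fread(log_file)
print(log_dt)
```

Setting col.names explicitly keeps the CSV readable as a single table even when the first call already uses append = TRUE on a file that does not exist yet.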

Details

  • Used throughout control data cleaning, tabulation, and expansion scripts to log summary statistics after each major transformation.

  • Typical usage: after cleaning, splitting, adjusting, or aggregating data, call record_checksum() to append a summary row to a CSV log.

  • Accepts arbitrary summary tables, including those from pums_checksum() or custom data.table calls.

  • Enables reproducible reporting and facilitates debugging by tracking row counts, weight sums, and other metrics at each step.

  • Returns the summary data.table for further use in analysis or reporting.

  • Assumes a valid file path and valid input data; raises an error if either is missing.
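Tracking row counts and weight sums at each step makes it straightforward to check that a transformation preserved what it should. The sketch below, using hypothetical data and a hypothetical summarize_step() helper, collects one summary row per step in memory with rbindlist() instead of writing to a CSV, then verifies that aggregating to zones left the total weight unchanged.

```r
library(data.table)

# Hypothetical person-level records with survey weights.
persons <- data.table(
  zone   = c(1L, 1L, 2L, 2L, 2L),
  weight = c(10.5, 20, 5, 7.5, 12)
)

# Hypothetical helper: one summary row per pipeline step.
summarize_step <- function(dt, step) {
  data.table(step = step, n_rows = nrow(dt), weight_sum = sum(dt$weight))
}

checks <- list(summarize_step(persons, "initial"))

# Aggregate to zones; the row count drops but the total weight should not.
zones <- persons[, .(weight = sum(weight)), by = zone]
checks[[length(checks) + 1L]] <- summarize_step(zones, "by_zone")

check_dt <- rbindlist(checks)
print(check_dt)

# Stop if any step's weight total drifted from the first step.
stopifnot(isTRUE(all.equal(check_dt$weight_sum,
                           rep(check_dt$weight_sum[1], nrow(check_dt)))))
```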

Usage in Scripts

  • scripts/020_control_data_cleaning.R: logs checksums after each cleaning and adjustment step

  • scripts/022_control_data_tabulation.R: records checksums after splitting and tabulating data

  • scripts/023_control_data_sum_by_zones.R: appends checksums after aggregation and zone adjustment

  • scripts/040_initial_expansion.R: tracks checksums after expansion and weighting

Settings

None.

Examples

## Not run:
library(data.table)

# After cleaning PUMS data:
check_dt <- record_checksum(
  fname = "020_check_counts.csv",
  append = FALSE,
  pums_checksum("initial", pums_0, "person")
)
# After splitting or adjusting data:
check_dt <- record_checksum(
  fname = "020_check_counts.csv",
  append = TRUE,
  data.table(dataset = "PUMS", step = "adjusted", n_rows = 1000)
)
## End(Not run)