Record checksums of arbitrary data
record_checksum.Rd
Record checksums of arbitrary data to a file for reproducibility and auditing. Use it to log summary statistics or checksums at key pipeline steps, enabling validation and traceability.
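The write-and-return behaviour described here can be sketched in a few lines. This is a minimal illustration only, not the package's implementation: the function name `record_checksum_sketch()`, its argument checks, the "Wrote" message, and the use of `data.table::fwrite()` for the CSV log are all assumptions made for this example.

```r
library(data.table)

# HYPOTHETICAL stand-in for record_checksum(); not the package's actual code.
# Writes `summary_dt` to the CSV file `fname`, overwriting it when
# append = FALSE and adding rows when append = TRUE, then returns the
# summary invisibly so it can be reused downstream.
record_checksum_sketch <- function(fname, append = TRUE, summary_dt) {
  stopifnot(is.character(fname), length(fname) == 1L, nzchar(fname))
  if (missing(summary_dt) || !is.data.frame(summary_dt)) {
    stop("`summary_dt` must be a data.frame or data.table")
  }
  summary_dt <- as.data.table(summary_dt)
  # Only append when the file already exists, so a new log gets a header row;
  # fwrite() omits the header when appending to an existing file.
  fwrite(summary_dt, fname, append = append && file.exists(fname))
  message(if (append) "Appended" else "Wrote", " checksums to ", fname)
  invisible(summary_dt)
}

# Usage: start a fresh log in a temporary file, then append a second step.
log_file <- tempfile(fileext = ".csv")
record_checksum_sketch(log_file, append = FALSE,
  data.table(dataset = "PUMS", step = "initial", n_rows = 1200))
record_checksum_sketch(log_file, append = TRUE,
  data.table(dataset = "PUMS", step = "adjusted", n_rows = 1000))
fread(log_file)
```

Returning the summary invisibly keeps the call quiet at the console while still letting scripts assign the result, as the `check_dt <- record_checksum(...)` examples do.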
Details
Used throughout control data cleaning, tabulation, and expansion scripts to log summary statistics after each major transformation.
Typical usage: after cleaning, splitting, adjusting, or aggregating data, call record_checksum() to append a summary row to a CSV log.
Accepts arbitrary summary tables, including those from pums_checksum() or custom data.table calls.
Enables reproducible reporting and facilitates debugging by tracking row counts, weight sums, and other metrics at each step.
Returns the summary data.table for further use in analysis or reporting.
Assumes a valid file path and summary data; raises an error if either is missing.
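Because any summary table is accepted, a checksum row can come from a plain data.table call. The sketch below is illustrative only: the toy table, its columns `zone` and `wgt`, and the helper `step_summary()` are hypothetical. It records a row count and weight total after each step and then checks that aggregating to zones preserved the weight total, which is the kind of validation these logs are meant to support. Each resulting row could then be passed to record_checksum().

```r
library(data.table)

# Toy person-level table; `zone` and `wgt` are HYPOTHETICAL column names.
persons <- data.table(
  zone = c("A", "A", "B", "B", "B"),
  wgt  = c(10, 20, 5, 5, 10)
)

# HYPOTHETICAL helper: one checksum row per step (row count and weight sum).
# Inside j, `step` is not a column, so it resolves to the function argument.
step_summary <- function(dt, step) {
  dt[, .(step = step, n_rows = .N, sum_wgt = sum(wgt))]
}

checks <- list(step_summary(persons, "initial"))

# Example cleaning step: keep only positive weights.
cleaned <- persons[wgt > 0]
checks[[length(checks) + 1L]] <- step_summary(cleaned, "cleaned")

# Example aggregation step: total weight by zone.
by_zone <- cleaned[, .(wgt = sum(wgt)), by = zone]
checks[[length(checks) + 1L]] <- step_summary(by_zone, "by_zone")

check_dt <- rbindlist(checks)

# Aggregation changes the row count but should not change the weight total.
stopifnot(isTRUE(all.equal(check_dt[step == "by_zone", sum_wgt],
                           check_dt[step == "initial", sum_wgt])))
check_dt
```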
Usage in Scripts
scripts/020_control_data_cleaning.R: logs checksums after each cleaning and adjustment step
scripts/022_control_data_tabulation.R: records checksums after splitting and tabulating data
scripts/023_control_data_sum_by_zones.R: appends checksums after aggregation and zone adjustment
scripts/040_initial_expansion.R: tracks checksums after expansion and weighting
See also
pums_checksum(), scripts/io/record_checksum.R
Other io utilities:
clean_data_dict(),
cut_and_label(),
db_table_has_geometry(),
fetch_acs(),
fetch_all_hts_tables(),
fetch_from_db(),
fetch_hts_table(),
fetch_pums(),
fetch_study_region(),
find_project_root(),
fix_value_labels_on_load(),
get_db_table_name(),
print_params(),
pums_checksum(),
read_from_db(),
read_pums_codebook(),
sampled_latlon_to_bg()
Examples
## Not run:
library(data.table)

# After cleaning PUMS data (requires a person-level PUMS table `pums_0`
# to exist already; otherwise R stops with "object 'pums_0' not found"):
check_dt <- record_checksum(
  fname = "020_check_counts.csv",
  append = FALSE,  # start a new log rather than appending
  pums_checksum("initial", pums_0, "person")
)

# After splitting or adjusting data, append a custom summary row:
check_dt <- record_checksum(
  fname = "020_check_counts.csv",
  append = TRUE,   # add rows to the existing log
  data.table(dataset = "PUMS", step = "adjusted", n_rows = 1000)
)
#> Appended checksums to 020_check_counts.csv
## End(Not run)
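Once a log exists, it can be read back to trace where a metric changed between steps. This is a minimal sketch under assumptions: it writes a made-up three-row log with the same columns as the second example above (`dataset`, `step`, `n_rows`) to a temporary file, rather than using a real pipeline log, and uses `data.table::shift()` to compare each step with the previous one.

```r
library(data.table)

# Made-up log for illustration, written to a temporary file.
log_file <- tempfile(fileext = ".csv")
fwrite(data.table(dataset = "PUMS",
                  step    = c("initial", "cleaned", "adjusted"),
                  n_rows  = c(1200L, 1200L, 1000L)),
       log_file)

# Read the log back and compute the row-count change from the previous step
# within each dataset (NA for the first step).
log_dt <- fread(log_file)
log_dt[, n_change := n_rows - shift(n_rows), by = dataset]

# Steps where the row count moved are the first places to look when debugging.
flagged <- log_dt[!is.na(n_change) & n_change != 0]
flagged
```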