
Audience: analysts who want to run, explore, or extend the HTS weighting workflow.

Purpose: explain how to use the repo day-to-day, including running it end-to-end, stepping into specific chapters, using the package functions, and managing caches.

What is This Repository?

This repository contains three tightly connected things:

  1. An R package (hts.weighting) — a library of helper functions that the book uses internally and that you can also call interactively.

  2. A YAML configuration system (configs/) — a set of example configuration files that define inputs, outputs, and parameters for different regions and scenarios. You can create your own by copying and modifying these.

  3. A runnable workflow — the Quarto book (manual/) executes the full weighting process, step by step, from configuration through final weights.

The Quarto chapters are the “scripts.” They read your YAML configuration (see configs/), process data, and write both intermediate data caches and final results to a structured folder.

Running the Workflow End-to-End

Once your environment and configuration file are set up (see INSTALL.md and CONFIGURATION.qmd), you can recreate all weights by rendering the entire manual:

quarto render manual

This runs every chapter in order, using cached results when possible. The final HTML lives at manual/_book/index.html, and the data reside in the project directories under the WEIGHTING_DATA_PATH specified in your .Renviron (default: test_cache/).

If you only want to rerun a single stage (for example, data cleaning or initial expansion), render just that chapter:

quarto render manual/040_initial_expansion.qmd

Each chapter is self-contained: it reads inputs from the cache, runs its step, and saves new outputs back to the cache. Later chapters depend on earlier ones, however, so you may need to run prior steps first, and if you change an earlier chapter’s outputs you should re-render the chapters that follow it. Future versions may include more granular dependency tracking, e.g., with makepipe or targets.
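Because later chapters depend on earlier ones, it can help to re-render a chapter together with everything that follows it. The sketch below is one way to do that from a shell. It assumes that chapter files carry zero-padded numeric prefixes that sort in execution order (as 040_initial_expansion.qmd suggests); the helper names chapters_from and render_from are hypothetical and not part of the repository.

```shell
# Hypothetical helper: list chapter files in DIR whose numeric prefix is
# greater than or equal to START, in sorted (glob) order.
chapters_from() {
  for f in "$1"/[0-9]*.qmd; do
    [ -e "$f" ] || continue        # glob matched nothing
    name=${f##*/}                  # strip the directory part
    prefix=${name%%_*}             # "040_initial_expansion.qmd" -> "040"
    # expr compares as integers when both operands are integers
    if expr "$prefix" '>=' "$2" >/dev/null; then
      printf '%s\n' "$f"
    fi
  done
}

# Hypothetical helper: render the chosen chapter and every later one,
# stopping at the first chapter that fails.
render_from() {
  chapters_from manual "$1" | while IFS= read -r chapter; do
    quarto render "$chapter" || return 1
  done
}
```

For example, render_from 040 would render manual/040_initial_expansion.qmd and then each later numbered chapter in turn.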

Some code chunks hide code, messages, or warnings for cleaner output. To view all code and outputs, open the .qmd files and set echo: true, message: true, and warning: true in chunk options. See Quarto Execution Options for details.

Exploring and Modifying Quarto “Scripts”

You can open any .qmd in your IDE (Positron, VSCode, RStudio) and run it interactively, line by line.

This is the best way to explore logic, test small changes, or inspect objects mid-run.

Typical workflow:

  1. Open the chapter you want (e.g., manual/040_initial_expansion.qmd).

  2. Make minor edits or insert View() / print() statements.

  3. Run code cells interactively using your IDE’s “Run Cell” or “Run Line” tools.

  4. Unpack hts.weighting functions by navigating to their definitions (in the R/ folder) and running them line by line as needed, or step through them with debug().

  5. When satisfied, re-render the chapter to update its cached outputs.

Exploring and Modifying the Package Functions

You can call the same functions used in the Quarto “scripts” directly from an R session:

devtools::load_all()   # load the package from source
settings <- get_settings()  # reads your YAML config, sets up paths

From there, you can inspect, modify, or reuse the intermediate data created in the caches.

For example:

# load an intermediate dataset
hh <- readRDS(file.path(settings$working_dir, "household_clean.rds"))

# experiment with a package function interactively
debug(calc_initial_weights)
calc_initial_weights(hh, settings)
undebug(calc_initial_weights)

Edits to functions under R/ take effect immediately after running (or re-running) devtools::load_all() — no reinstall needed.

Understanding and Managing Caches

The weighting project maintains two layers of caching:

Quarto Execution Caches

  • Stored in _freeze/ inside the manual/ directory.
  • Controls whether Quarto re-runs code chunks.
  • Each chapter’s cache is invalidated automatically when its code changes. Changes to input data files alone are generally not detected, so clear the cache after updating inputs.
  • You can safely delete manual/_freeze/ to force a clean rebuild of the manual.
  • Or turn off caching entirely by adding cache: false to the chapter or book YAML header (see: manual/_quarto.yml). This is the safest option when testing code.

Data Caches in the Project Root (the “Weighting Cache”)

Located under your working data path, typically test_cache/ or a directory defined in your .Renviron settings.

A typical structure:

test_cache/
  input/       # raw survey + control data
  working/     # intermediate outputs per stage
  output/      # final household/person/day weights
  report/      # diagnostics, summaries, plots

Each chapter reads from and writes to these subfolders using paths defined in settings.

Cache management tips:

  • To rerun one stage cleanly: delete that stage’s files in working/ and report/, then re-render its .qmd.
  • To reset the entire project: delete all subfolders inside test_cache/. To keep the raw data, delete everything except input/.
  • To test edits without overwriting production outputs: point your configuration (WEIGHTING_DATA_PATH) to a different cache root (e.g., test_cache_dev/).
  • To inspect results: you can open any intermediate RDS or CSV file in working/ or report/.
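The reset tip above can be scripted. The following is a minimal sketch, assuming the folder layout shown earlier (input/, working/, output/, report/); reset_cache is a hypothetical helper, not a function provided by the repository. It removes every subfolder of the cache root except input/, so the raw data survive.

```shell
# Hypothetical helper: delete every subfolder of the cache root except input/.
reset_cache() {
  root=${1:?usage: reset_cache CACHE_ROOT}
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue          # glob matched nothing
    case "$(basename "$dir")" in
      input) ;;                        # keep raw survey + control data
      *) rm -rf "$dir" ;;              # drop working/, output/, report/, ...
    esac
  done
}
```

For example, reset_cache test_cache clears the derived data while leaving test_cache/input/ untouched. Whether the chapters recreate the deleted folders on their next run is not confirmed here, so check the first re-render.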

Because the Quarto book uses the same cache directories across chapters, outputs cascade automatically: one chapter’s outputs become the next chapter’s inputs.

Debugging

To step through a function while running a chapter or in the console:

debug(calc_initial_weights)

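In Positron and VSCode, execution will pause inside the function the next time it is called. While paused, you can inspect variables and control execution with the browser commands n (run the next line), c (continue to the end), and Q (quit the debugger).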

To turn debugging off afterward:

undebug(calc_initial_weights)

Switching Configurations

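To switch to another project or scenario, edit your .Renviron so that it points to the correct YAML configuration file and cache directory. For example, to switch to another configuration such as psrc_2025.yaml, set the two variables below, replacing the placeholder values with your own file and directory: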

WEIGHTING_SETTINGS_PATH=new_config.yaml
WEIGHTING_DATA_PATH=new_cache_directory

Then restart your R session to pick up the new settings.
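Before restarting R, it can be useful to confirm which configuration the .Renviron currently points to. The sketch below prints only the two variables named above and skips commented-out lines; show_weighting_env is a hypothetical helper, and the default file location (.Renviron in the current directory) is an assumption.

```shell
# Hypothetical helper: print the active WEIGHTING_* settings from an
# .Renviron file (defaults to ./.Renviron). Commented lines are ignored.
show_weighting_env() {
  file=${1:-.Renviron}
  grep -E '^[[:space:]]*WEIGHTING_(SETTINGS|DATA)_PATH=' "$file"
}
```

Run show_weighting_env from the project root, or pass the path to a different .Renviron file as the first argument.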