Contributing
Scope & Principles
- This repo (`RSGInc/hts_weighting`) is the general toolkit. It should contain reusable functions (`R/`), generic runner scripts (`scripts/`), and documentation (the Quarto workbook). Avoid hard-coding project quirks.
- Project-specific code belongs in a project fork. Keep local recodes, one-off joins, and special rules in a fork (or a project branch in your org fork). This keeps the core clean and lets us evolve it safely.
- Assume you'll upstream useful changes. Even if you start in a fork, design your changes as if you'll open a PR back to `RSGInc/hts_weighting`. That mindset prevents unreconcilable drift if we need to "re-weight" later using updates in the core toolkit.
- Keep forks current with upstream. Regularly sync your fork's `main` with `RSGInc/hts_weighting:main` to minimize merge conflicts and ensure you're building on the latest stable base.
- Open draft PRs early. Don't wait until your feature is "perfect" to open a PR. Early drafts help surface conflicts, get feedback, and validate your approach.
- Keep changes small and focused. Large, sweeping changes are hard to review and merge. Break big features into smaller, manageable PRs.
- Test everything. Add project-specific tests in your fork, and keep the core tests here passing. Use the shared test database (`hts_weighting_testing`) to validate end-to-end runs.
- Should test data change? Almost always no. Your results should match the existing test outputs unless there's a good reason. If your change causes test data to change (e.g., new expected outputs), this is an all-hands discussion and the proposed new outputs must be vetted and approved.
Preferred Workflow (Fork → Branch → PR)
We use an “Upstream-First” fork-based workflow. The diagram below illustrates how to manage feature branches, keep forks in sync with upstream changes, and handle multiple project forks.
Core Tenets:
- Keep feature branches small and focused.
```{mermaid}
flowchart LR
subgraph Main["*Main 'Upstream' Repo (RSGInc/hts_weighting)*"]
A("Main")
Merge1@{ shape: procs, label: "Squash &<br>Merge PR" }
Update(["Main Updated"])
Merge3@{ shape: procs, label: "Squash &<br>Merge PR" }
B(["Main Updated, again"])
end
subgraph Proj1["*Project Fork*"]
B1(["Create Feature Branch<br>feature/my-change"])
C1(["Develop &<br> Commit"])
PR1{{"Open PR"}}
%% D["Push Branch to Fork"]
Sync1("Sync Fork")@{shape: curv-trap}
Sync3("Sync Fork")@{shape: curv-trap}
end
subgraph Proj2["*Other Project Fork*"]
B2(["Create Feature Branch<br>feature/my-change"])
C2(["Develop &<br> Commit"])
Sync2@{shape: curv-trap, label: "Sync Fork"}
Merge2@{ shape: procs, label: "Merge &<br>Reconcile" }
C3(["Continue Developing &<br> Commit"])
PR2{{"Open PR"}}
Sync4("Sync Fork")@{shape: curv-trap}
end
A -->|Developer forks| Proj1
A -->|Developer forks| Proj2
B1 --> C1 --> PR1
PR1 -.->|Revise| Merge1
Merge1 -.->|Review| PR1
Merge1 --> Update
Update --> Sync1
%% Update --> Merge3
Merge3 --> B
B --> Sync3
B --> Sync4
B2 --> C2
C2 --> Sync2
Update --> Sync2
Sync2 --> Merge2
Merge2 --> C3
C3 --> PR2
PR2 -.->|Revise| Merge3
Merge3 -.->|Review| PR2
style Main fill:#007acc,stroke:#004f8a,color:#fff
style Proj1 fill:#b3e6ff,stroke:#004f8a,color:#004f8a
style Proj2 fill:#b3e6ff,stroke:#004f8a,color:#004f8a
style A fill:#b3e0ff,stroke:#004f8a
style Merge1 fill:#66cc99,stroke:#26734d,color:#fff
style Update fill:#99ffcc,stroke:#26734d
style B1 fill:#ffe699,stroke:#b38600
style C1 fill:#ffcc66,stroke:#b38600
style B2 fill:#ffe699,stroke:#b38600
style C2 fill:#ffcc66,stroke:#b38600
style PR1 fill:#ffd699,stroke:#b38600
style Sync1 fill:#cce6ff,stroke:#004f8a
style Sync2 fill:#cce6ff,stroke:#004f8a
```
- Fork the repo (keep it in the `RSGInc` org).
- Clone your fork (naming convention: `RSGInc/hts_weighting-<client>_<year>`).
- Create a development branch (don't work on `main`).
- Make and commit changes (small, focused commits; clear messages).
- Keep your fork current (early and often):

  ```bash
  git fetch upstream
  git checkout main
  git merge --ff-only upstream/main  # or: git rebase upstream/main
  git push origin main
  # update your feature branch:
  git checkout feature/<short-slug>
  git rebase main
  ```

  You can also use GitHub's "Sync fork" button. Before delivery, always sync with upstream `main`.
- Open a draft PR to `RSGInc/hts_weighting:main`. Draft early to surface merge conflicts, CI results, and design feedback. Use "Compare across forks" if needed.
- Iterate → request review → merge. When approved, squash/merge or rebase/merge per repo conventions.
What Belongs Where
- Core repo (here)
  - Generalizable functions, validations, and helpers
  - Script improvements that work across projects (config-driven)
  - Documentation, examples, QA/QC dashboards
- Project fork
  - Project-specific recodes, one-off joins, and special rules (see Scope & Principles above)
Coding Guidelines (R & Quarto)
R
- Prefer pure functions; accept `settings` and paths as arguments (avoid global state).
- Validate inputs early; fail with clear, informative messages.
- Keep the public interface small; document parameters and return types using roxygen.
- Use `data.table` syntax consistently (see the sketch below).
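For example, a small helper in this style might look like the following. This is only a sketch; the function name, arguments, and column names (`hh_id`, `hh_weight`) are hypothetical rather than part of the package:

```r
#' Summarize weighted household counts by group
#'
#' @param hh data.table of households (assumed columns: hh_id, hh_weight).
#' @param settings Named list of settings (e.g., from get_settings()).
#' @param by Character vector of grouping column names.
#' @return data.table with household counts and weighted totals per group.
summarize_hh_counts = function(hh, settings, by) {
  # Validate inputs early and fail with informative messages
  stopifnot(data.table::is.data.table(hh), is.list(settings), is.character(by))
  missing_cols = setdiff(by, names(hh))
  if (length(missing_cols) > 0) {
    stop("Grouping columns not found in hh: ", paste(missing_cols, collapse = ", "))
  }

  # settings is passed through so behavior stays config-driven (unused in this toy example);
  # pure data.table syntax, no global state
  hh[, .(n_hh = .N, wt_hh = sum(hh_weight)), by = by]
}
```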
Quarto
- Chapters must render in a clean session (no interactive assumptions).
- Use chunk options `cache: true` and document-level `freeze: auto`; avoid writing outside `working_dir` (example below).
- No absolute local paths; everything should resolve via `settings`.
- Keep callouts for "Settings used" and troubleshooting up to date.
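As an illustration, a chapter chunk following these guidelines might start like this. The object names and the `households.rds` file are hypothetical, and the `get_settings()` call pattern is assumed:

```r
#| cache: true
#| echo: false

# All paths resolve via settings; nothing is written outside working_dir
settings = get_settings("configs/example.yaml")  # assumed call pattern
hh_path = file.path(settings$working_dir, "households.rds")
hh = readRDS(hh_path)
```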
Data, Secrets, and Configuration
- Do not commit data or client secrets. Keep large or temporary artifacts in `working/` (gitignored). Deliverables belong in `outputs/` or `report/`.
- Secrets (e.g., GitHub PAT) should be stored in the user `~/.Renviron` or in repository Secrets (under GitHub Actions); see the sketch below.
- Configs (`configs/<client>_<year>.yaml`) should be minimal and documented. Prefer toggles over code forks.
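For instance, database credentials can be read from the environment at run time instead of being written into code or configs. This is a sketch; whether the toolkit itself reads these exact variable names is an assumption (they mirror the CI secrets listed later on this page):

```r
# Locally these come from ~/.Renviron; in CI they come from repository Secrets.
pops_user = Sys.getenv("POPS_USER")
pops_password = Sys.getenv("POPS_PASSWORD")

if (pops_user == "" || pops_password == "") {
  stop("POPS_USER / POPS_PASSWORD are not set; add them to your ~/.Renviron.")
}
```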
Configuration and Schema Validation
1. Configuration-driven design
This repository is configuration-first: all project behavior, inputs, and output mappings are defined in YAML files under configs/.
Code should read from settings (via get_settings()) and never hard-code constants.
When adding a new toggle or parameter:

1. Add it to the appropriate YAML (e.g. configs/example.yaml).
2. Update the JSON schema (configs/settings.schema.json) with:
   - a title, description, and type,
   - allowed enum values if applicable,
   - defaults if appropriate.
3. Validate locally (see below) and commit both files together.
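To illustrate the intent, downstream code should then consume the new parameter through settings rather than a constant. In this sketch, `my_new_toggle` is a hypothetical key and the `get_settings()` call pattern is assumed:

```r
settings = get_settings("configs/example.yaml")  # assumed call pattern

# Behavior is driven entirely by the YAML value; no hard-coded constant here
if (isTRUE(settings$my_new_toggle)) {
  message("my_new_toggle is enabled for this config")
}
```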
2. Schema validation before commits
You can validate YAMLs locally in two ways:
A. In Positron (recommended)

- Install the YAML (Red Hat) extension.
- Add to your user/workspace settings:

```json
"yaml.schemas": {
  "./configs/settings.schema.json": ["configs/*.yaml"]
}
```
This enables instant validation, autocomplete, and hover help.
B. From R
```{r}
devtools::load_all()
check_settings("configs/<your_config>.yaml")
```
This uses the internal schema validator to check type safety and key names.
3. When editing the schema
- Keep backward compatibility when possible; avoid breaking older project configs.
- Add a clear description and default for every new property.
- If deprecating keys, mark them with a `"deprecated": true` comment and remove them only in a major release.
- Document notable changes in NEWS.md under a “Configuration” subsection.
4. CI / Pull request validation
Schema consistency is enforced automatically in CI:
- All YAML files under configs/ are validated against configs/settings.schema.json.
- CI fails if any are invalid, missing defaults, or contain unrecognized keys.
Run it locally before pushing:
```{r}
check_all_settings('configs')
```
Keeping forks healthy (avoid drift)
```bash
git fetch upstream
git checkout main && git merge --ff-only upstream/main && git push origin main
git checkout feature/<short-slug> && git rebase main
```

- Resolve conflicts in your feature branch, not during the final PR.
- If upstream behavior changes in a way that affects you, open a discussion or issue early.
Adding a New Project to the Test Database
Each project (for example, massdot_2024) gets its own schema inside the shared test database (hts_weighting_testing). The schema is populated with tables copied directly from the live POPS database.
1. Prepare the YAML Config
Create a new file under configs/examples/, for example:
configs/examples/myproject_2025.yaml
It should define the schema, database, table mappings, and file paths:
```yaml
dbname: "hts_weighting_testing"
schema: "myproject_hts_2025"
working_dir: "working"
outputs_dir: "outputs"
report_dir: "reports"
hts_table_map:
  household: "household"
  person: "person"
  day: "day"
  trip: "trip"
  value_labels: "value_labels"
  sample_plan: "sample_plan"
```

Then create a smaller test configuration, e.g.:
configs/examples/myproject_2025_dow.yaml
This may override paths or parameters for a lightweight test run.
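If it helps to see what the override amounts to, you can compare the two YAMLs directly from R. This sketch assumes the yaml package is available and uses the example file names above:

```r
library(yaml)

base_cfg = yaml::read_yaml("configs/examples/myproject_2025.yaml")
test_cfg = yaml::read_yaml("configs/examples/myproject_2025_dow.yaml")

# Top-level keys whose values differ between the full and lightweight configs
changed = names(test_cfg)[!mapply(identical, test_cfg, base_cfg[names(test_cfg)])]
print(changed)
```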
2. Copy Source Data into the Test Database
Edit and run `inst/copy_db_to_test.R`. Inside, set:

```r
target_db = "hts_weighting_testing"
target_schema = "myproject_hts_2025"
source_db = "myproject"
source_schema = "combined"
settings = get_test_settings("myproject_2025_dow.yaml")
```

Then run:

```r
source("inst/copy_db_to_test.R")
```

This will:

- Create schema `myproject_hts_2025` in `hts_weighting_testing`
- Copy all tables listed in `settings$hts_table_map`
- Use `pg_dump`/`pg_restore` per table
- Drop the temporary staging schema afterward
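To verify the copy worked (step 3 in the summary table below), you can list the tables in the new schema from R. This is a sketch; it assumes the DBI and RPostgres packages and that connection details (host, credentials) are available in your environment:

```r
library(DBI)

con = DBI::dbConnect(
  RPostgres::Postgres(),
  dbname   = "hts_weighting_testing",
  user     = Sys.getenv("POPS_USER"),
  password = Sys.getenv("POPS_PASSWORD")
)

# Roughly equivalent to `\dt myproject_hts_2025.*` in psql
DBI::dbGetQuery(con, "
  SELECT table_name
  FROM information_schema.tables
  WHERE table_schema = 'myproject_hts_2025'
")

DBI::dbDisconnect(con)
```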
Running Tests
There are two kinds of tests in this repo:

- Unit tests for individual functions (in tests/testthat/)
- End-to-end tests that run the full weighting scripts for a project (also in tests/testthat/)
These get run automatically in GitHub Actions CI, but you can also run them locally to test and debug.
When to Create a New Test?
In general, always, though in practice that is often a luxury. It is usually better to add a unit test covering the specific function you're changing (a minimal sketch follows below). End-to-end tests also matter because they confirm the whole pipeline works, but they are slow and difficult to maintain, so we limit them to key configurations.
The current end-to-end tests cover fundamental configurations for existing projects:

- With/without linked trips (SWW and MetCouncil)
- Custom weighting geographies (i.e., "client zones") (MetCouncil)
- Day of week weighting (MassDOT)
- Person-level weighting (NYCDOT)

As a general rule, end-to-end tests should cover either structurally different configurations (e.g., client zones or multiple states) or different approaches (e.g., DOW or person-level weighting).
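As a (hypothetical) illustration of the unit-test side, a small, fast test looks like this. `get_week_part()` is defined inline purely for the example and is not a package function:

```r
# tests/testthat/test_unit_example.R (illustrative file name)
testthat::test_that("weekend days are classified correctly", {
  # Stand-in helper defined inline so the example is self-contained
  get_week_part = function(dow) {
    ifelse(tolower(dow) %in% c("saturday", "sunday"), "weekend", "weekday")
  }

  testthat::expect_equal(get_week_part("Saturday"), "weekend")
  testthat::expect_equal(get_week_part("Tuesday"), "weekday")
})
```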
4. Create a Project-Specific Test File
Add to tests/testthat/, e.g.:
tests/testthat/test_myproject_2025.R
Example:

```r
# testthat::test_dir("tests/testthat", filter = "myproject_2025$")
# Sometimes it can be useful to have the test prepared but skipped in CI runs
testthat::skip("Skipping standard MyProject weighting test")
settings = get_test_settings("myproject_2025.yaml")
test_state = new.env()
test_state$script_test_passed = FALSE
testthat::test_that("Testing scripts for myproject_2025", {
script_path = file.path(settings$code_root, '000_run_weight_scripts.R')
testthat::expect_true(file.exists(script_path))
testthat::expect_no_error(
tryCatch({
source(script_path, local = TRUE)
test_state$script_test_passed = TRUE
}, error = function(e) {
testthat::fail(paste("Error while sourcing script:", e$message))
})
)
})
testthat::test_that("Testing results for myproject_2025", {
testthat::skip_if_not(test_state$script_test_passed, "Script run failed, skipping result tests.")
test_results(settings)
})
```

5. Run the Tests
Run all tests:
```r
devtools::test()
```

Or only your project's tests:

```r
testthat::test_dir("tests/testthat", filter = "myproject_2025$")
```

6. Inspect Output
- Results are written to `outputs/` and `reports/`
- Check `reports/*_check_counts.csv` for intermediate summaries (sketch below)
- If `000_run_weight_scripts.R` fails, review its console output
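A quick way to eyeball those intermediate summaries from R (a sketch; the exact file names depend on the project):

```r
# Load the first check-count summary written by the weighting scripts
check_files = list.files("reports", pattern = "_check_counts\\.csv$", full.names = TRUE)
checks = data.table::fread(check_files[1])
head(checks)
```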
7. Clean Up (Optional)
To reset the schema, set `purge_target_schema = TRUE` in `inst/copy_db_to_test.R` before re-running.
Summary
| Step | Action | File / Command |
|---|---|---|
| 1 | Create settings YAMLs | configs/examples/myproject_2025.yaml |
| 2 | Copy DB tables to test DB | source("inst/copy_db_to_test.R") |
| 3 | Verify schema contents | \dt myproject_hts_2025.* |
| 4 | Add test file | tests/testthat/test_myproject_2025.R |
| 5 | Run tests | devtools::test() |
| 6 | Inspect outputs | reports/, outputs/ |
| 7 | Reset schema if needed | purge_target_schema = TRUE |
Automated Checks (GitHub Actions CI)
All pushes and pull requests to main automatically trigger continuous integration (CI) via .github/workflows/code_check.yaml. This ensures your code is linted, tested, and passes R CMD check before merge.
Workflow Overview
| Job | Purpose | Runner |
|---|---|---|
| linting | Runs lintr::lint_package() for consistent R style. | Ubuntu |
| discover-and-setup | Finds all test files and prepares a job matrix for parallel testing. | Self-hosted |
| Tests | Runs testthat tests per file, in parallel, using the test database. | Self-hosted |
| R-CMD-check | Runs R CMD check to verify package build integrity. | Ubuntu |
How the CI Test Suite Works
- Test Discovery: All `tests/testthat/test-*.R` files are detected and distributed across runners for parallel execution.
- Database Setup: Each test file references a settings YAML (e.g. `massdot_2024.yaml`) that points to the test database (`hts_weighting_testing`). Schemas (e.g. `massdot_hts_2024`) must already exist, typically created with `inst/copy_db_to_test.R`.
- Secrets and Credentials: Required GitHub Secrets (set under Settings → Secrets and variables → Actions):
  - `PAT` (GitHub token with `repo` and `read:packages` scope)
  - `POPS_USER`
  - `POPS_PASSWORD`

  These are securely masked in logs.
- Python Environment Setup: Installs dependencies via `uv sync` and validates `populationsim`. Exposes `PYTHON_VENV_PATH` for R integration.
- R Environment Setup: Uses `r-lib/actions/setup-r@v2` and `setup-renv@v2` to install R 4.4.3 and restore dependencies. Installs geospatial system libraries (`libproj-dev`, `libpq-dev`, etc.).
- Running Tests: Each test file runs individually:

  ```r
  testthat::with_reporter(testthat::JunitReporter$new(), {
    testthat::test_file(testfile)
  })
  ```

  Failures stop the job (`testthat.stop_on_failure = TRUE`).
- Artifacts and Logs: JUnit XML logs are created per test and shown in GitHub's "Checks" tab. Output files (`reports/`, `outputs/`) are not committed but may be inspected locally.
- R CMD Check Validation: Finally, the workflow runs R CMD check. This ensures package metadata, dependencies, and documentation are valid.
Before Pushing or Opening a PR
Run the same checks locally:
```r
lintr::lint_package()
devtools::test()
devtools::check()
```

You can filter tests by project:

```r
testthat::test_dir("tests/testthat", filter = "massdot_2024$")
```

If all checks pass locally, your CI pipeline should succeed once you push.