2  Installation & Setup

Installation Guide for hts_weighting

2.1 At a Glance

Purpose: Guide users through installing all necessary software and configuring their machine to run the hts_weighting project.

Audience: anyone who needs to run the hts_weighting project locally.

Time to Complete: ~30-60 minutes, depending on prior experience and what’s already installed.

Key Components: You’ll install R, Python, and optionally an IDE before configuring project-specific settings.

  • R 4.4.3 with renv package management
  • Python 3.12.x with PopulationSim via the uv environment manager
  • Quarto for rendering documentation
  • Plus some extras if you want them!


2.2 Prerequisites

Git & Repository Setup

  1. Install Git from https://git-scm.com/downloads if you haven’t already.

  2. Next, clone this repository to your local machine. Do this with your preferred Git client, or run in terminal:

git clone https://github.com/RSGInc/hts_weighting-psrc_2025.git
  3. RSG Users: If you’ll be doing a lot of weighting, you might want to place all your clones of hts_weighting in a common parent directory, e.g., C:/Users/your.name/Documents/locals/hts_weighting_forks/. This makes it easier to switch between different versions or forks of the project, and avoids clutter.

R Environment Setup

We pin to R 4.4.3 for reproducibility across machines and CI. This is the final release of the 4.4.x series, the last stable version before 4.5.x. You will also need build tools for your OS to compile packages from source: Rtools on Windows, or the Xcode Command-Line Tools on macOS.

Windows:

  1. Install R 4.4.3 (default C:\Program Files\R\R-4.4.3).
  2. Install Rtools44 from the Rtools44 for Windows install page.
  3. Verify your R build tools are working. In R, run:
install.packages("pkgbuild")
pkgbuild::has_build_tools(debug = TRUE)

macOS:

  1. Install R 4.4.3 for your CPU.
  2. Install Xcode Command-Line Tools (CLT):
xcode-select --install

Willing to be a guinea pig? Mac users may want to try installing the toolchain with the package macrtools:

remotes::install_github("rmacoslib/macrtools")
macrtools::macos_rtools_install()

RSG has not tested macrtools (we are a Windows shop); please report back any issues – or if you find success!


Python Environment Manager

Install uv

Install uv, a lightweight Python environment manager, from https://docs.astral.sh/uv/reference/installer/. You can do this via command line:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Create and Sync Virtual Environment

Then, from the repo root, sync the environment and install PopulationSim:

# from the repo root
uv sync

This creates .venv/ and installs PopulationSim.

Verify Python Executable

Open a bash or PowerShell terminal (use the dropdown arrow beside the terminal name).

Then run:

uv run python --version

You should see something like:

Python 3.12.x
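If you’d like a slightly stronger check than reading the version string, the sketch below confirms that the interpreter is a 3.12 release and that it is running from a virtual environment. It is only an illustration: the function names (version_ok, in_virtualenv) and the script filename mentioned afterward are made up for this guide and are not part of hts_weighting.

```python
import sys

REQUIRED = (3, 12)  # major.minor this guide expects


def version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Inside a venv, sys.prefix differs from sys.base_prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("matches 3.12:", version_ok())
    print("inside a virtual environment:", in_virtualenv())
```

Save it under any name you like (for example, a hypothetical check_python.py) and run it with uv run python check_python.py from the repo root. Both checks should report True if uv sync worked.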

2.3 Configure Environment Variables

Create .Renviron

This installation guide assumes you are using a project-level .Renviron file to set environment variables specific to this project and your machine. This file is git-ignored by default.

What is .Renviron? .Renviron is a simple text file that R reads on startup to set environment variables. It can be placed at the user level (e.g., ~/.Renviron) or project level (in the root of your project). See Managing R for details.

Before proceeding, create a .Renviron file in the root of this repository like the one below, adjusting paths and credentials as needed. We’ll cover each variable in turn.

Template .Renviron File (Minimal)

PYTHON_VENV_PATH=.venv/Scripts/python.exe

RAW_DATA_PATH=path/to/your/PSRC_2025_client_inputs

# POPS_USER=YourUsername
# POPS_PASSWORD=SecretPassword
# POPS_HOST=db.example.com
# POPS_PORT=xxxx

WEIGHTING_SETTINGS_PATH=psrc_2025.yaml

WEIGHTING_DATA_PATH=test_cache
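Once the file exists, a quick scan for missing keys can catch typos before you start R. The sketch below is a simplified reader, not R’s own parser: it assumes plain KEY=value lines, treats lines starting with # as comments, and does not handle variable expansion. The key list simply mirrors the uncommented entries in the template above, and the function names are hypothetical.

```python
from pathlib import Path

# Keys that appear uncommented in the minimal template above.
TEMPLATE_KEYS = (
    "PYTHON_VENV_PATH",
    "RAW_DATA_PATH",
    "WEIGHTING_SETTINGS_PATH",
    "WEIGHTING_DATA_PATH",
)


def parse_renviron(text):
    """Parse KEY=value lines; skip blank lines and lines starting with '#'."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and any wrapping quotes.
        env[key.strip()] = value.strip().strip("\"'")
    return env


def missing_keys(env, expected=TEMPLATE_KEYS):
    """Return the expected keys that are absent or empty, in template order."""
    return [key for key in expected if not env.get(key)]


if __name__ == "__main__":
    path = Path(".Renviron")
    if path.is_file():
        print("missing or empty:", missing_keys(parse_renviron(path.read_text())))
    else:
        print(".Renviron not found in", Path.cwd())
```

RSG users who rely on the POPS_* variables instead of RAW_DATA_PATH can pass their own list of keys as the expected argument.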

Add Python Executable Path to .Renviron

To ensure R can find the Python executable managed by uv, set the PYTHON_VENV_PATH variable in your .Renviron file. This path should be relative to the repository root and point to the Python executable inside the .venv directory created by uv sync.

  • On Windows, use:

    PYTHON_VENV_PATH=.venv/Scripts/python.exe
  • On macOS/Linux, use:

    PYTHON_VENV_PATH=.venv/bin/python

Edit your .Renviron file accordingly after running uv sync.
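If you’re not sure which of the two values applies to your machine, the sketch below picks the relative path by operating system and reports whether the executable actually exists. The function names are hypothetical, and the script assumes that uv sync created .venv/ in the repo root as described above.

```python
import platform
from pathlib import Path


def venv_python_relpath(system=None):
    """Return the PYTHON_VENV_PATH value suggested above for this OS."""
    system = system or platform.system()
    if system == "Windows":
        return ".venv/Scripts/python.exe"
    return ".venv/bin/python"  # macOS ("Darwin") and Linux


def check_venv_python(repo_root=".", system=None):
    """Return (relative_path, exists) for the venv interpreter under repo_root."""
    rel = venv_python_relpath(system)
    return rel, (Path(repo_root) / rel).is_file()


if __name__ == "__main__":
    rel, exists = check_venv_python()
    print(f"PYTHON_VENV_PATH={rel}")
    print("executable found:", exists)
```

Run it from the repo root. If the executable is not found, re-run uv sync before editing .Renviron.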

Configure Data Source Access

Next, we add variables to point hts_weighting to the raw data inputs. RSG users will point to the POPS database with the POPS_* variables. Clients will point to the raw client inputs, typically shared on SharePoint.

RAW_DATA_PATH=path/to/your/PSRC_2025_client_inputs
# POPS_USER=YourUsername
# POPS_PASSWORD=SecretPassword
# POPS_HOST=db.example.com
# POPS_PORT=xxxx

Set Weighting Config Path

Next, we specify the path to the settings config YAML file. If you give a relative path (recommended) rather than an absolute one, the file is assumed to live in the configs directory at getwd()/configs.

WEIGHTING_SETTINGS_PATH=psrc_2025.yaml
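To make the lookup rule concrete, here is a small sketch of how a value like the one above would resolve. It only mirrors the behavior described in this section; it is not the package’s actual code, and the function name resolve_settings_path is made up for illustration.

```python
from pathlib import Path


def resolve_settings_path(value, working_dir="."):
    """Absolute paths are used as-is; relative ones are looked up under <working_dir>/configs."""
    path = Path(value)
    if path.is_absolute():
        return path
    return Path(working_dir) / "configs" / path


if __name__ == "__main__":
    resolved = resolve_settings_path("psrc_2025.yaml", Path.cwd())
    print(resolved)
    print("exists:", resolved.is_file())
```

Run from the repo root, this prints the full path the relative value would map to, which is handy when R reports that the settings file cannot be found.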

Set Weighting Data Path

Finally, we set the WEIGHTING_DATA_PATH variable to point to where the project directories (input/, working/, output/, report/) will be created on first run. This path can be absolute or relative to the repository root. If not set, a temporary directory will be used (see ?tempdir), which is not persistent.

Note: If you name this anything but “test_cache”, you’ll need to add it to the .gitignore file in the root of this repository. Don’t commit your data!

WEIGHTING_DATA_PATH=test_cache
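If you do pick a different directory name, a rough check like the one below can confirm it appears in .gitignore. This is a deliberately naive sketch: it looks only for a literal entry matching the directory name and ignores Git’s wildcard and negation rules, and the function name is hypothetical.

```python
from pathlib import Path


def is_listed_in_gitignore(dir_name, gitignore_text):
    """Return True if a non-comment line names dir_name literally (slashes ignored)."""
    target = dir_name.strip("/")
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.strip("/") == target:
            return True
    return False


if __name__ == "__main__":
    data_dir = "test_cache"  # replace with your WEIGHTING_DATA_PATH value
    gitignore = Path(".gitignore")
    text = gitignore.read_text() if gitignore.is_file() else ""
    print(f"{data_dir} listed in .gitignore:", is_listed_in_gitignore(data_dir, text))
```

For an authoritative answer, ask Git directly with git check-ignore -v followed by your directory name, which applies the full matching rules.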

Optional Developer Settings

These settings are optional, and mainly useful for developers working on the package or running tests. To use, uncomment and set in your .Renviron file.

WEIGHTING_CODE_PATH can be used to specify where the hts_weighting code is stored, if different from the current working directory. This is useful when running devtools::check() to avoid path issues. It should point to the absolute path of the cloned repository.

TESTTHAT_CPUS can be used to specify the number of CPUs to use for parallel testing with testthat. Defaults to 2.

TEARDOWN_TEST_DATA can be used to specify which data directories to tear down after tests complete. Options are inputs_dir, outputs_dir, report_dir, and working_dir. Use * to tear down the entire data directory.

# WEIGHTING_CODE_PATH=C:/your/local/path/hts_weighting
# TESTTHAT_CPUS=4
# TEARDOWN_TEST_DATA=inputs_dir, outputs_dir, report_dir, working_dir
# Or, to tear down the entire data dir:
# TEARDOWN_TEST_DATA=*

Install Census API Key

Many steps (ACS/PUMS pulls, geographies) rely on the tidycensus package, which makes calls to the Census Bureau’s API service. To use the service, you need a Census API key. Follow these steps to install it:

  1. Sign up for a free API key at the Census Bureau’s website.
  2. Once you have your key, add it to your user-level (preferred) or project-level .Renviron file:
CENSUS_API_KEY=your_api_key_here

Restart R afterward.


2.4 Set Up R Package Environment

Initialize renv and Restore Packages

This repository currently uses renv 1.1.5. If needed, first install or update renv:

install.packages("renv")
renv::activate()

Some users may need to run:

renv::load()

We don’t know why this is, but suspect it has something to do with OneDrive mucking up file paths. If you discover the source of this error, please let us know!

Next, restore the R environment from the lockfile (renv.lock):

renv::restore()

On restart you should see something like:

- Project '...' loaded. [renv 1.1.5]

If you add new packages later, remember to run renv::snapshot() to update the lockfile.

Load the hts.weighting Package

Although hts.weighting is an R package, we prefer to load locally from source rather than installing from GitHub. Though it adds an extra step to installation and use, the main advantage is that it permits users to modify functions in tandem with the scripts that use them. We may change this workflow in the future, as the package matures.

Install devtools (once per project):

renv::install("devtools")
renv::snapshot()

Then, try loading the package:

devtools::load_all()

Finally, test that the package loaded correctly and that your environment is set up:

settings <- get_settings()

Optional: If you’d like to avoid calling devtools::load_all() every time you open an R terminal, you can tell R to auto-load the package in interactive sessions via a git-ignored .Rprofile.local file in the root of your project. Add this chunk to .Rprofile.local:

if (interactive() && !identical(Sys.getenv("CI"), "true")) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    message("devtools not installed; run renv::install('devtools') then renv::snapshot(); restart R.")
  } else {
    try(devtools::load_all("."), silent = TRUE)
  }
}

Restart R, then verify with:

settings <- get_settings()

Sometimes this is persnickety in Positron or VSCode; if you run into issues, just call devtools::load_all() manually.

2.5 Install Quarto (only for RStudio and VSCode users)

Note: Quarto is used to render the book and some reporting .qmd files in scripts/. The weighting scripts themselves do not need it.

Positron bundles Quarto in its download; only RStudio/VS Code users will need to install Quarto ≥ 1.7. Download and install Quarto from https://quarto.org/docs/get-started/. After installation, verify that quarto is on your PATH. In a terminal, run:

quarto check

If an old version appears, update your PATH and restart your terminal and R session.

Additionally, you may need to point Quarto to the correct R version.

During quarto check, you should see something like:

 Checking R installation...........OK
      Version: 4.4.3

VSCode users may see the wrong R version here. To fix, add the path to your R installation to your Workspace Settings (Command Palette → “Preferences: Open Workspace Settings (JSON)”):

{
  "terminal.integrated.env.windows": {
    "QUARTO_R": "C:\\Program Files\\R\\R-4.4.3\\bin\\x64" 
  }
}

For details and other potential solutions, see Quarto Environment Variables.

2.6 Optional: IDE Enhancements

YAML Schema Validation

This is optional but recommended - only for Positron/VSCode users.

To get live validation and autocomplete for your configuration .yaml files:

  1. In Positron or VSCode, install the YAML (Red Hat) extension.
  2. Open the Command Palette → “Preferences: Open Workspace Settings (JSON)”.
  3. Add this entry:
"yaml.schemas": {
  "./configs/utils/configs_schema.json": ["configs/*.yaml"]
}

You’ll now see instant linting and tooltips when editing any configs/ .yaml file. This is powered by the JSON schema at configs/utils/configs_schema.json.

⚠️ In development: Settings schema validation is relatively new to the hts.weighting package. Please report any issues you encounter, and treat it as a general guide rather than a definitive source of truth.

2.7 Troubleshooting

(If you run into new issues, report back here and add to this list!)

  • Windows: compilation errors
    Install Rtools44, then in R:

    install.packages("pkgbuild")
    pkgbuild::has_build_tools(debug = TRUE)
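  • Quarto not found
    Install Quarto and ensure it’s on PATH; quarto check should pass. Restart your terminal and R after changing PATH.

  • VS Code can’t find R
    Set r.rterm.* (for example, r.rterm.windows) in Settings JSON. If only Quarto picks up the wrong R version, set QUARTO_R as shown in section 2.5.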

2.8 Happy weighting!