3 Configuring a Household Travel Survey (HTS) Weighting Run

Your configuration is the blueprint for the weighting pipeline. In your configuration file (YAML format), you define:

Project & I/O: Where data lives, what runs, and where results go
Reference Data: ACS/PUMS vintages and table mappings
Geography & Zones: PUMAs vs. custom weighting zones, spatial intersection settings
Study Frame: study unit (household or person), days of week
Targets & PopulationSim: control definitions and calibration settings
Round 2/3 Models: day‑pattern and trip‑rate adjustments
Transit and QA/QC: optional controls & diagnostics

This chapter explains how to configure an HTS weighting project using the default schema (configs/examples/config_template.yaml) with specifics from the 2025 PSRC HTS (configs/psrc_2025.yaml).

3.1 Project & Input/Output Setup

This section describes RSG-specific settings for project structure and data input/output (I/O) setup. RSG users will source HTS data from our POPS Postgres database, requiring dbname and schema. Clients running offline with CSVs can omit these keys.

Use hts_table_map to link your raw database/CSV table names to canonical inputs, and weight_output_map to specify output destinations.

The optional run_scripts key lets you define an ordered list of R scripts for the weighting process. With the Quarto manual, clients can control workflow by selecting chapters to run in manual/_quarto.yaml.

Keys at a Glance

Key	Purpose
`project_name`	Descriptive project title
`weight_version_name`	Version identifier for the weighting run
`dbname`, `schema`	Postgres DB pointers
`run_scripts`	Ordered list of R scripts for the weighting process
`hts_table_map`	Canonical → DB table names
`weight_output_map`	Output table destinations
`write_to_db`	Toggle DB writing (off for Clients)

Example (This Project)

#   PROJECT NAME
project_name: "PSRC 2025 Household Travel Survey Weighting"

#   INPUT TABLE MAPPING
hts_table_map:
  household: toc_hh
  person: toc_person
  trip: toc_trip_unlinked
  linked_trip: toc_trip_linked
  tour: toc_tour
  day: toc_day
  value_labels: toc_value_labels
  variable_list: toc_variable_list
  in_study_region: in_study_region
  sample_plan: sample_plan

#   OUTPUT TABLE MAP
weight_output_map:
  household: ex_weights_hh
  person: ex_weights_person
  trip: ex_weights_trip_unlinked
  day: ex_weights_day
  linked_trip: ex_weights_trip_linked
  tour: ex_weights_tour

#   WRITE WEIGHTS TO DATABASE
write_to_db: false # off for clients
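One quick sanity check before a run is that every entry in weight_output_map has a matching canonical input in hts_table_map. The sketch below is illustrative Python, separate from the R pipeline; the helper name unmapped_outputs is hypothetical, the requirement that outputs be a subset of inputs is an assumption, and the dictionaries simply mirror the YAML above as it would look once parsed.

```python
# Parsed form of the two mappings from the example above (plain dicts).
hts_table_map = {
    "household": "toc_hh",
    "person": "toc_person",
    "trip": "toc_trip_unlinked",
    "linked_trip": "toc_trip_linked",
    "tour": "toc_tour",
    "day": "toc_day",
    "value_labels": "toc_value_labels",
    "variable_list": "toc_variable_list",
    "in_study_region": "in_study_region",
    "sample_plan": "sample_plan",
}
weight_output_map = {
    "household": "ex_weights_hh",
    "person": "ex_weights_person",
    "trip": "ex_weights_trip_unlinked",
    "day": "ex_weights_day",
    "linked_trip": "ex_weights_trip_linked",
    "tour": "ex_weights_tour",
}


def unmapped_outputs(inputs, outputs):
    """Return output keys that have no matching canonical input (hypothetical helper)."""
    return sorted(set(outputs) - set(inputs))


missing = unmapped_outputs(hts_table_map, weight_output_map)
print(missing)  # an empty list means every output table has an input table
```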

3.2 Reference Data (ACS/PUMS & Extras)

This section covers settings for ACS/PUMS vintages, table mappings, and extra PUMS fields to fetch. RSG typically sets acs_year to the year of the most recent 5-year ACS release before or during survey data collection, and pums_year to the same year as acs_year. This gives the greatest possible demographic consistency between survey data and control totals.

Users will also need to map ACS table codes to logical names in acs_tables and specify count variables in acs_count_vars. These mappings may change when switching ACS vintages. Check https://censusreporter.org/ for valid table codes when developing new target definitions and methods.

Users can specify additional PUMS fields to fetch with nontarget_vars, which can be useful for diagnostics or stratified reporting. The max_income_bin key sets the top-code for income bins ($200,000 in 2025) while force_balance_hh_weights toggles reconciliation of PUMS household and person totals (recommended).

Keys at a Glance

Key	Purpose
`acs_year`, `acs_dataset`	ACS vintage & survey span
`acs_tables`, `acs_count_vars`	Logical names → ACS codes
`pums_year`, `pums_dataset`	PUMS vintage for targets
`nontarget_vars`	Extra PUMS fields to fetch
`max_income_bin`	Census’ current income top-code
`force_balance_hh_weights`	Reconcile PUMS HH/person totals

Example (This Project)

#   ACS DATA YEAR / DATASET
acs_year: 2023
acs_dataset: acs5

#   ACS TABLE CODES MAPPING
acs_tables:
  - sex_by_age: B01001
  - hhtype: B11002
  - occ_hh: B25002_002
  - hh_income: B19001
  - race: B02001

#   ACS COUNT VARIABLE CODES
acs_count_vars:
  - household: B25003_001
  - person: B11002_001

#   PUMS DATA YEAR / DATASET
pums_year: 2023
pums_dataset: acs1

#   ADDITIONAL PUMS VARIABLES (none specified)
nontarget_vars: []

#   MAXIMUM INCOME BIN (top-code)
max_income_bin: 200000

#   FORCE BALANCED HOUSEHOLD WEIGHTS
force_balance_hh_weights: true
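Note that acs_tables and acs_count_vars are written as YAML lists of single-key mappings, so a parser returns a list of one-item dictionaries rather than one dictionary. The following Python sketch (not the pipeline's R code; the function name flatten is hypothetical) shows one way to turn that shape into a simple lookup while catching duplicate logical names.

```python
# acs_tables as it would look after YAML parsing: a list of one-key dicts.
acs_tables = [
    {"sex_by_age": "B01001"},
    {"hhtype": "B11002"},
    {"occ_hh": "B25002_002"},
    {"hh_income": "B19001"},
    {"race": "B02001"},
]


def flatten(entries):
    """Merge a list of single-key dicts into one dict, rejecting duplicate names."""
    merged = {}
    for entry in entries:
        for name, code in entry.items():
            if name in merged:
                raise ValueError(f"duplicate logical name: {name}")
            merged[name] = code
    return merged


acs_lookup = flatten(acs_tables)
print(acs_lookup["hh_income"])  # B19001
```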

3.3 Geography & Zones

This section covers geographic settings for weighting zones.

The zone_type key selects between using Census PUMAs or custom client_zones for weighting. Use zone_groups to define named aggregations of zones for reporting (e.g., “Seattle” vs “Rest of King County”).

The remaining keys control how survey geographies interact with weighting zones. The xwalk_sliver_threshold key sets a minimum area threshold below which tiny overlaps are dropped when intersecting survey geographies with weighting zones. The puma_buffer key adds a buffer (in meters) when assigning households to PUMAs, which can help with edge cases near PUMA boundaries. The largest_bg_to_client_zone key forces each block group to be assigned to the client zone with the largest area overlap, which is useful when block groups nest cleanly within client zones (e.g., client zones are counties or other aggregations of block groups). Finally, the use_reported_home key toggles whether to use the respondent-reported home location for segment assignment rather than the home location where they received their mailed invitation (usually true).
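To make the crosswalk settings concrete, here is a small Python sketch. It assumes the sliver threshold is expressed as a share of a block group's area and that remaining shares are renormalized after slivers are dropped; both are assumptions for illustration, and the function names are hypothetical rather than taken from the weighting code.

```python
def drop_slivers(shares, threshold):
    """Drop overlaps whose area share is below threshold, then renormalize to sum to 1."""
    kept = {zone: share for zone, share in shares.items() if share >= threshold}
    total = sum(kept.values())
    return {zone: share / total for zone, share in kept.items()} if total else {}


def largest_zone(shares):
    """Pick the single zone with the largest area share (largest_bg_to_client_zone style)."""
    return max(shares, key=shares.get)


# One block group that overlaps two weighting zones (made-up shares).
overlaps = {"zone_a": 0.97, "zone_b": 0.03}
print(drop_slivers(overlaps, threshold=0.05))  # {'zone_a': 1.0}
print(largest_zone(overlaps))  # zone_a
```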

Why this matters

Geography determines where you enforce representativeness. PUMAs are robust and simple; client zones align to reporting geographies.

Keys at a Glance

Key	Purpose
`zone_type`	`pumas` or `client_zones`
`zone_groups`	Named aggregations for weighting/reporting
`xwalk_sliver_threshold`	Drop tiny overlaps in crosswalk
`puma_buffer`	Buffer for PUMA selection (meters)
`largest_bg_to_client_zone`	Force single‑zone BG assignment
`use_reported_home`	Segment assignment rule

Example (This Project)

#   WEIGHTING GEOGRAPHIC ZONE TYPE
zone_type: pumas  # options: pumas | client_zones

#   GROUPS OF ZONES FOR AGGREGATION
zone_groups:
  - "King County - Seattle": ["5323313", "5323318", "5323314", "5323315", "5323317", "5323312", "5323316"]
  - "King County - Other": ["5323310", "5323309", "5323302", "5323303", "5323306", "5323307", "5323301", "5323304", "5323305", "5323311"]
  - "Kitsap County - Expanded": ["5323502", "5323501", "5323308", "5325308"]
  - "Pierce County": ["5325303", "5325302", "5325307", "5325304", "5325305", "5325306", "5325301"]
  - "Snohomish County": ["5326103", "5326102", "5326101", "5326104", "5326105", "5326106"]

#   CROSSWALK GEOMETRY SLIVER THRESHOLD
xwalk_sliver_threshold: 0.05

#   PUMA BUFFER DISTANCE (METERS)
puma_buffer: 100

#   LARGEST BLOCK GROUP -> CLIENT ZONE (only if BGs nest cleanly)
largest_bg_to_client_zone: false

#   USE REPORTED HOME LOCATION FOR SEGMENT ASSIGNMENT
use_reported_home: true
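Because zone_groups lists PUMAs by hand, it can be worth checking that no PUMA appears in more than one group. The Python sketch below inverts the mapping to a PUMA-to-group lookup and raises on duplicates; it uses two of the groups above, the helper name puma_to_group is hypothetical, and the no-overlap rule is an assumption rather than documented pipeline behavior.

```python
# Two of the zone groups above, as parsed from YAML (list of one-key dicts).
zone_groups = [
    {"Kitsap County - Expanded": ["5323502", "5323501", "5323308", "5325308"]},
    {"Snohomish County": ["5326103", "5326102", "5326101", "5326104", "5326105", "5326106"]},
]


def puma_to_group(groups):
    """Invert zone_groups into {puma: group}, rejecting PUMAs listed twice."""
    lookup = {}
    for entry in groups:
        for group, pumas in entry.items():
            for puma in pumas:
                if puma in lookup:
                    raise ValueError(f"PUMA {puma} is in both {lookup[puma]!r} and {group!r}")
                lookup[puma] = group
    return lookup


lookup = puma_to_group(zone_groups)
print(lookup["5326101"])  # Snohomish County
```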

3.4 Reproducibility & Study Frame

This section covers settings for reproducibility and study frame definition. The rng_seed key sets the random number generator seed for reproducibility across runs. The study_unit key defines whether the weighting is at the household or person level, which affects target definitions and weight application. The weight_dow_groups key allows users to define custom day-of-week groupings for weighting (e.g., average weekday vs weekend).

Many of RSG’s recent HTSs do not survey persons unrelated to the primary respondent (i.e., roommates). This has necessitated adaptations in the weighting code, because the target (Census) data defines households as all those living together, inclusive of nonrelatives. The unrelated_adjustment key specifies how to handle unrelated persons in household weights, with four options: none (use only if the HTS did sample nonrelatives), person (default), day (experimental), and restructure (danger zone: kept for backwards compatibility and experimentation).

Finally, the impute_unrelated_income key toggles whether to impute income for unrelated persons, which can be important for income-based targets. This should be true only if the HTS questionnaire specifically excluded nonrelatives from the income question.

Keys at a Glance

Key	Purpose
`rng_seed`	Reproducibility
`study_unit`	`household` or `person`
`weight_dow_groups`	Day groupings (Mon=1 … Sun=7)
`unrelated_adjustment`	How to treat unrelated persons
`impute_unrelated_income`	Income imputation toggle

Example (This Project)

#   RANDOM NUMBER SEED
rng_seed: 4119

#   STUDY UNIT FOR WEIGHTING
study_unit: household

#   DAY-OF-WEEK WEIGHT GROUPS (Mon=1 ... Sun=7)
weight_dow_groups:
  - avg_weekday: [1, 2, 3, 4]

#   UNRELATED HOUSEHOLDER ADJUSTMENT METHOD
#   options: none | person | day | restructure
unrelated_adjustment: "none"

#   UNRELATED HOUSEHOLDER INCOME IMPUTATION
impute_unrelated_income: false
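The Mon=1 ... Sun=7 convention used by weight_dow_groups matches ISO weekday numbering. As a minimal Python illustration (not the pipeline's R code; the function dow_group is hypothetical), a travel date can be mapped to its group like this, with dates outside every group returning None:

```python
from datetime import date

# weight_dow_groups from the example above, flattened to a plain dict.
weight_dow_groups = {"avg_weekday": [1, 2, 3, 4]}


def dow_group(travel_date, groups):
    """Return the name of the day-of-week group containing travel_date, or None."""
    dow = travel_date.isoweekday()  # Monday=1 ... Sunday=7
    for name, days in groups.items():
        if dow in days:
            return name
    return None


print(dow_group(date(2025, 4, 15), weight_dow_groups))  # a Tuesday: avg_weekday
print(dow_group(date(2025, 4, 18), weight_dow_groups))  # a Friday: None
```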

3.5 Targets & PopulationSim (Round 1)

This section covers target definitions and PopulationSim settings for Round 1 demographic calibration. Calibration aligns survey weights to independent control totals across geographies and demographics. The targets are specified as a list of named entries under the targets key, each defining the method to prepare the target (see: R/prep_targets_methods.R), survey and PUMS input variables, labels, levels, and geography.

Example (This Project - Sample of Three Targets)

targets:
  - h_size:
      - method: "prep_target_h_size"
      - survey_input: num_people
      - pums_input: NP
      - label_var: "Household size"
      - levels: [1, 2, 3]
      - label_levels: ["1", "2", "3+"]
      - geography: zone_group
  - h_income:
      - method: "prep_target_income"
      - survey_input: income_imputed_value
      - pums_input: HINCP
      - label_var: "Household income"
      - levels: [24999, 49999, 74999, 99999, 199999]
      - label_levels: ["$0-$24,999", "$25,000-$49,999", "$50,000-$74,999", "$75,000-$99,999", "$100,000-$199,999", "$200,000+"]
      - geography: zone_group
  - p_commutemode:
      - method: "prep_target_commutemode"
      - survey_input: [work_mode, job_type, work_from_home]
      - pums_input: JWTRNS
      - label_var: "Commute mode"
      - levels: [home, transit, walk, bike, other, none]
      - label_levels: ["Work from home", "Transit", "Walk", "Bike", "Other (includes auto)", "None"]
      - geography: zone_group

(Only three shown for brevity.)
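In the h_income target, five levels produce six label_levels, which suggests the levels act as upper bin edges with an open-ended top bin. The Python sketch below assumes each level is an inclusive upper bound; that reading is an assumption, and the actual binning lives in the R target-prep methods.

```python
from bisect import bisect_left

# levels and label_levels copied from the h_income target above.
levels = [24999, 49999, 74999, 99999, 199999]
label_levels = [
    "$0-$24,999",
    "$25,000-$49,999",
    "$50,000-$74,999",
    "$75,000-$99,999",
    "$100,000-$199,999",
    "$200,000+",
]


def income_label(income):
    """Map a dollar income to its label, treating each level as an inclusive upper bound."""
    return label_levels[bisect_left(levels, income)]


print(income_label(24999))   # $0-$24,999
print(income_label(25000))   # $25,000-$49,999
print(income_label(250000))  # $200,000+
```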

Target Overrides

Many projects require customizations to targets beyond the basic definitions above. The target_updates key allows users to override or extend target definitions without modifying the core targets list. This is useful when some, but not all, weighting zone groups have low sample size in either survey or PUMS data.

Example (This Project)

target_updates:
  - p_commutemode:
      - levels:
          - bike_transit_walk: [bike, transit, walk]
      - label_levels:
          - bike_transit_walk: "Bike/transit/walk"
      - groups: ["Kitsap County - Expanded", "Snohomish County", "Pierce County", "King County - Other"]
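The effect of this override can be pictured as a conditional recode: within the listed zone groups, bike, transit, and walk collapse into one level, while other groups keep the original levels. A hedged Python sketch follows; the function collapsed_level and the flattened dictionary shape are hypothetical, not the pipeline's implementation.

```python
# The p_commutemode override above, flattened to plain dicts for illustration.
update = {
    "levels": {"bike_transit_walk": ["bike", "transit", "walk"]},
    "groups": [
        "Kitsap County - Expanded",
        "Snohomish County",
        "Pierce County",
        "King County - Other",
    ],
}


def collapsed_level(level, zone_group, update):
    """Return the collapsed level if the override applies to zone_group, else the original."""
    if zone_group in update["groups"]:
        for new_level, members in update["levels"].items():
            if level in members:
                return new_level
    return level


print(collapsed_level("bike", "Pierce County", update))          # bike_transit_walk
print(collapsed_level("bike", "King County - Seattle", update))  # bike
```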

PopulationSim Settings

Here, we define key PopulationSim settings for Round 1 calibration. The popsim_search_max_exp and popsim_search_bounds keys control the search for optimal expansion factors, while popsim_initial_label and popsim_daypat_label set the run labels for Round 1 and for the Round 2 day-pattern run, respectively. The popsim_setting_updates block allows users to set the final PopulationSim parameters, such as minimum/maximum expansion factors and absolute upper bounds.

Example (This Project)

#   POPULATIONSIM SEARCH SETTINGS
popsim_search_max_exp: [4, 5, 6, 7, 8]
popsim_search_bounds: [4000, 6000, 8000, 10000, 20000]

#   POPULATIONSIM RUN LABELS
popsim_initial_label: run1_initial # recent changes to code might not allow changing this
popsim_daypat_label: run2_daypat   # recent changes to code might not allow changing this

#   POPULATIONSIM SETTINGS OVERRIDES
popsim_setting_updates:
  min_expansion_factor: 0.143
  max_expansion_factor: 7
  absolute_upper_bound: 20000
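In this example min_expansion_factor is approximately 1/7, so weights may shrink or grow by the same factor of 7. The Python sketch below shows how these three settings could bound a single weight, assuming the expansion factors multiply the household's initial weight and the absolute bound caps the result; consult the PopulationSim documentation for the exact semantics. The function weight_bounds is hypothetical.

```python
import math

# popsim_setting_updates from the example above.
settings = {
    "min_expansion_factor": 0.143,
    "max_expansion_factor": 7,
    "absolute_upper_bound": 20000,
}


def weight_bounds(initial_weight, s):
    """Return the (lower, upper) range allowed for a final weight under the assumptions above."""
    lower = initial_weight * s["min_expansion_factor"]
    upper = min(initial_weight * s["max_expansion_factor"], s["absolute_upper_bound"])
    return lower, upper


low, high = weight_bounds(100, settings)
print(round(low, 1), high)  # 14.3 700

# For a large initial weight the absolute bound binds before the factor does.
print(weight_bounds(5000, settings)[1])  # 20000
```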

Importance Weights

Area of active investigation

PopulationSim’s importance weighting is an area of active research. RSG recommends using the default importance weights as a starting point and adjusting them with the understanding that this is an evolving area.

The popsim_importance block defines relative importance weights for each population/person/household control used by PopulationSim. A higher number means that aligning the synthetic population to that control is more important, potentially at the expense of matching other controls with lower weights.

These weights are relative: only the ratios between them matter in deciding trade-offs.
Use higher values for controls you care most about, and lower values for those where you are willing to accept a less-perfect match.
It is common practice to spread weights across an order of magnitude or more (e.g., 100, 200, 500, 1000) to separate tiers of priority.
After running the model, consult the output validation (mismatches, residuals) and adjust weights as needed: if a key control is under-fitted, increase its weight; if a less-critical control is dominating the solution, reduce its weight.
The weights are applied at each run label (e.g., run1_initial, run2_daypat), so you can tailor priorities for each round of the model.

See "What are importance weights?" in the PopulationSim documentation for more details.

Example (This Project)

#   POPULATIONSIM IMPORTANCE WEIGHTS
popsim_importance:
  run1_initial:
    p_total: 100
    h_total: 1000
    p_commutemode_walk: 200
  run2_daypat:
    p_total: 100
    h_total: 1000
    p_made_none: 500
    p_made_mandatory: 200
    p_made_nonmand: 200
    p_made_na: 200
    p_commutemode_walk: 200
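Since only the ratios between importance weights matter, it can help to view a block relative to its smallest entry. The short Python sketch below does this for run1_initial and shows that multiplying every weight by 10 leaves the ratios unchanged; the helper relative_importance is hypothetical and purely illustrative.

```python
# Importance weights for run1_initial from the example above.
run1_initial = {"p_total": 100, "h_total": 1000, "p_commutemode_walk": 200}


def relative_importance(weights):
    """Express each weight as a multiple of the smallest weight in the block."""
    base = min(weights.values())
    return {name: value / base for name, value in weights.items()}


ratios = relative_importance(run1_initial)
scaled = relative_importance({name: value * 10 for name, value in run1_initial.items()})

print(ratios)            # {'p_total': 1.0, 'h_total': 10.0, 'p_commutemode_walk': 2.0}
print(ratios == scaled)  # True
```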

3.6 Round 2: Day‑Pattern Adjustment

This section covers settings for Round 2 day‑pattern adjustment. Day‑pattern adjustment corrects for platform and reporting biases by modeling travel incidence on a daily level. The daypat_weighting key toggles whether to run day‑pattern weighting. The daypat_adjustments_table key specifies which table to use for day‑pattern modeling (trip or linked_trip).

Finally, the daypat_formula_vars key lists the predictor variables to include in the day‑pattern model.

In Development

The day-pattern model is in active development. RSG recommends starting with this formula and adjusting inline in the code if desired. Please reconcile your changes and bring updates back to the main branch of hts_weighting in pull requests as appropriate.

Example (This Project)

daypat_weighting: true
daypat_adjustments_table: linked_trip
daypat_formula_vars:
  - "zero_vehicle"
  - "as.character(income_imputed_label)"
  - "age_under_35"
  - "age_over_65"
  - "is_employed"
  - "is_student"
  - "diary_online"
  - "diary_call"
  - "age_under_35 * diary_online"
  - "age_over_65 * diary_call"
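The entries in daypat_formula_vars read like right-hand-side terms of an R model formula, including interaction terms such as age_under_35 * diary_online. As a rough illustration only, the Python sketch below joins a few of them into an R-style formula string. The response name day_pattern and the helper build_formula are hypothetical; how the pipeline actually assembles its formula is not shown in this chapter.

```python
# A subset of daypat_formula_vars from the example above.
daypat_formula_vars = [
    "zero_vehicle",
    "as.character(income_imputed_label)",
    "diary_online",
    "age_under_35 * diary_online",
]


def build_formula(lhs, rhs_terms):
    """Join predictor terms into an R-style formula string: lhs ~ term1 + term2 + ..."""
    return f"{lhs} ~ " + " + ".join(rhs_terms)


# "day_pattern" is a placeholder response variable, not a name from the pipeline.
formula = build_formula("day_pattern", daypat_formula_vars)
print(formula)
```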

3.7 Round 3: Trip‑Rate Adjustment

This section covers settings for Round 3 trip‑rate adjustment. Trip‑rate adjustment corrects for residual biases in trip rates by trip type after day‑pattern adjustment. The trip_rate_factor_rescaling key toggles whether to rescale trip‑rate factors to have a mean of 1, which helps preserve overall trip totals. The trip_rate_factor_cap key sets a maximum cap on trip‑rate factors to prevent extreme adjustments (default: 2; i.e., no more than doubling Round 2 trip weights).

The trip_rate_model_lhs key defines the left-hand side (LHS) variables for the trip‑rate models, specifying which trip types to model and their corresponding count variables. The trip_rate_model_vars key lists the predictor variables to include in each trip‑type model.

Finally, the weight_synthetic_trips key toggles whether to apply trip‑rate weighting to synthetic trips. Many users choose to leave this off, especially those who primarily rely on linked trips for analysis.

Example (This Project)

trip_rate_factor_rescaling: true
trip_rate_factor_cap: 2

trip_rate_model_lhs:
  - work: [num_work]
  - school: [num_school]
  - other: [num_other]
  - loop: [num_loop] # Placeholder, is not adjusted

trip_rate_model_vars:
  - num_work: 
    - "diary_binary"
    - "hh_size"
    - "income_cat"
    - "n_kids"
    - "employment_status"
    - "wfh"
    - "age_cat_adults"
    - "is_student"
    - "work_loc_varies"
    - "education_cat"
  - num_school:
    - "diary_binary"
    - "hh_size"
    - "income_cat"
    - "school_cat"
    - "p_employment"
    - "p_wfh"
    - "p_gender"
  - num_other:
    - "diary_binary"
    - "hh_size"
    - "income_cat"
    - "n_kids"
    - "employment_status"
    - "wfh"
    - "age_cat_adults"
    - "is_student"
    - "education_cat"
    
#   TOGGLE SYNTHETIC TRIP WEIGHTING
weight_synthetic_trips: false
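The interaction between trip_rate_factor_cap and trip_rate_factor_rescaling can be sketched in a few lines of Python. This version caps first and then rescales to an unweighted mean of 1; the order of operations and the use of an unweighted mean are assumptions, and the function adjust_factors is hypothetical rather than the pipeline's R implementation.

```python
def adjust_factors(factors, cap=2.0, rescale=True):
    """Cap trip-rate factors, then optionally rescale them so their mean is 1."""
    capped = [min(f, cap) for f in factors]
    if not rescale:
        return capped
    mean = sum(capped) / len(capped)
    return [f / mean for f in capped]


raw = [0.5, 1.0, 3.0]  # made-up factors for three trips
print(adjust_factors(raw, rescale=False))  # [0.5, 1.0, 2.0]

adjusted = adjust_factors(raw)
print(sum(adjusted) / len(adjusted))  # approximately 1.0
```

Under this ordering, dividing by the mean can move a capped factor away from the cap value, so a project that needs a hard ceiling on final factors would have to check the result after rescaling.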

3.8 Transit Controls (Optional)

This section covers settings for optional transit boarding controls. The transit_target_type key specifies whether to use linked_trips or boardings as the basis for transit targets. The transit_weekday_boardings key sets the total number of weekday boardings to target, while the transit_weekend_factor key defines the ratio of weekend to weekday boardings. The transit_boardings_per_trip key sets the average number of boardings per transit trip, which helps convert between trips and boardings. Finally, the transit_modes key lists the modes to include in transit targets (e.g., rail, public bus, other bus).

In Development

Transit controls are in active development. RSG recommends consulting with RSG staff before implementing transit targets, as this approach will likely evolve further.

Example (MassDOT 2024)

transit_target_type: "linked_trips"  # Options: linked_trips, boardings
transit_weekday_boardings: 887540
transit_weekend_factor: 0.52 
transit_boardings_per_trip: 1.17  # Average boardings per trip
transit_modes: ['transit'] # ["rail", "public bus", "other bus"]
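The arithmetic implied by these keys is simple: dividing boardings by boardings per trip gives linked trips, and multiplying weekday boardings by the weekend factor gives weekend boardings. The Python sketch below works through the MassDOT values above; the variable names are hypothetical, and how the pipeline actually applies these totals as controls is not shown here.

```python
import math

# Transit settings from the MassDOT 2024 example above.
transit = {
    "transit_weekday_boardings": 887540,
    "transit_weekend_factor": 0.52,
    "transit_boardings_per_trip": 1.17,
}

# Weekend boardings follow from the weekend-to-weekday ratio.
weekend_boardings = transit["transit_weekday_boardings"] * transit["transit_weekend_factor"]

# Linked trips follow from dividing boardings by average boardings per trip.
weekday_linked_trips = transit["transit_weekday_boardings"] / transit["transit_boardings_per_trip"]
weekend_linked_trips = weekend_boardings / transit["transit_boardings_per_trip"]

print(round(weekend_boardings, 1))  # 461520.8
print(round(weekday_linked_trips))
print(round(weekend_linked_trips))
```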