Appendix B — Configuration Schema Documentation

C Configuration Reference

C.1 Project & I/O

dbname | Database Name

Optional: Name of the database that holds the input/output tables. Used in DB connection scripts.

Called Directly by Functions:

  • write_to_db

Example:

dbname:
- psrc
- hts_weighting_testing

schema | Database Schema

Optional: Database schema that holds the input/output tables. Used in DB connection scripts.

Called Directly by Functions:

  • write_to_db

Example:

schema:
- hts_2025_y5
- nyc_cms_hts_2024

run_scripts | Scripts to Run

Optional (RSG specific): List of scripts to run in the pipeline. Each item must be the name of a script in the scripts/ directory.

Default:

run_scripts:
- 001_input_checker.R
- 005_create_crosswalk.R
- 020_control_data_cleaning.R
- 022_control_data_tabulation.R
- 023_control_data_sum_by_zones.R
- 030_survey_data_cleaning.R
- 031_survey_data_imputation.R
- 032_survey_data_tabulation.R
- 040_initial_expansion.R
- 050_initial_weighting.R
- 060_daypat_adjustments.R
- 070_daypat_weighting.R
- 080_person_day_trip_weights.R
- 085_trip_weight_adjustment.R
- 090_write_to_db.R
- 100_weight_checks.Rmd
- 105_weighting_memo.Rmd
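
The requirement that each run_scripts item name a file in scripts/ can be checked before a run. The following is a minimal sketch in Python, for illustration only (the pipeline itself is written in R); the helper name find_missing_scripts is hypothetical and not part of the package.

```python
from pathlib import Path


def find_missing_scripts(run_scripts, scripts_dir="scripts"):
    """Return the run_scripts entries that have no matching file in scripts_dir."""
    root = Path(scripts_dir)
    return [name for name in run_scripts if not (root / name).is_file()]
```

An empty return value means every listed script was found; anything else is a list of names to correct in the config.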

hts_table_map | Input Table Mapping

Required: Mapping of canonical table names to database tables. Used in all ETL scripts.

Called Directly by Functions:

  • get_db_table_name

Called Indirectly by Functions:

  • fetch_hts_table
  • fix_value_labels_on_load
  • get_income_broad
  • get_max_income_bin
  • get_max_survey_income_bin
  • get_user_specs
  • impute_income_pnta
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prepare_zone_groups
  • test_results
  • update_income_broad_labels

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

hts_table_map:
  household: toc_hh
  person: toc_person
  day: toc_day
  trip: toc_trip
  linked_trip: toc_linked_trip
  tour: toc_tour
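
As a rough illustration of how such a mapping is typically consumed, the Python sketch below looks up a canonical name and optionally prefixes the schema. The function get_table_name is a hypothetical stand-in (it is not the package's get_db_table_name), and the schema-qualified form is an assumption.

```python
def get_table_name(table_map, canonical, schema=None):
    """Look up the database table for a canonical name, optionally schema-qualified."""
    if canonical not in table_map:
        raise KeyError(f"No mapping for table '{canonical}' in hts_table_map")
    table = table_map[canonical]
    return f"{schema}.{table}" if schema else table


# Subset of the default mapping shown above.
hts_table_map = {"household": "toc_hh", "person": "toc_person", "trip": "toc_trip"}
```

Failing loudly on an unmapped name keeps a typo in the config from turning into a query against the wrong table.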

weight_output_map | Output Table Map

Required: Mapping of canonical output names to output tables. Used in output/DB writing scripts.

Called Directly by Functions:

  • test_results

Used in Scripts:

  • 10_round-3-weighting.qmd

Default:

weight_output_map:
  household: ex_weights_hh
  person: ex_weights_person
  day: ex_weights_day
  trip: ex_weights_trip
  linked_trip: ex_weights_linked_trip
  tour: ex_weights_tour

write_to_db | Write Weights to Database

Optional: If TRUE, writes weights to the database. Used in output scripts.

Called Directly by Functions:

  • check_write_to_db

Called Indirectly by Functions:

  • get_settings
  • get_test_settings

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

write_to_db: no

raw_data_root | Raw Data Root

Optional: Location of RSG-provided raw data; typically specified in the .Renviron file as RAW_DATA_PATH. Used as the base path for reading raw survey and HTS data files.
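
A minimal Python sketch of the lookup described above, for illustration only. It assumes that an explicit raw_data_root setting takes precedence over the RAW_DATA_PATH environment variable; that precedence, and the helper name resolve_raw_data_root, are assumptions rather than documented behavior.

```python
import os


def resolve_raw_data_root(settings, env=os.environ):
    """Prefer an explicit raw_data_root setting; otherwise fall back to RAW_DATA_PATH."""
    root = settings.get("raw_data_root") or env.get("RAW_DATA_PATH")
    if not root:
        raise ValueError("Set raw_data_root or the RAW_DATA_PATH environment variable")
    return root
```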

Called Directly by Functions:

  • fetch_hts_table
  • fetch_study_region
  • get_settings

Called Indirectly by Functions:

  • add_geometry_to_table
  • adjust_ref_counts_dataset
  • calc_sample_plan_counts
  • cluster_pumas
  • create_ie_adjustment_data
  • fetch_acs
  • fetch_pums
  • fix_value_labels_on_load
  • get_acs_bg_counts
  • get_acs_bg_counts_base
  • get_acs_ethnicity
  • get_acs_race
  • get_bg_geom
  • get_county_fips
  • get_income_broad
  • get_max_income_bin
  • get_max_survey_income_bin
  • get_puma_geom
  • get_puma_ids
  • get_pumas
  • get_state_fips
  • get_test_settings
  • get_tracts_puma_xwalk
  • get_user_specs
  • impute_ethnicity
  • impute_income_pnta
  • impute_race
  • load_sf_obj
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prep_zones_sf
  • prepare_acs_income
  • prepare_zone_groups
  • sampled_latlon_to_bg
  • test_results
  • update_income_broad_labels
  • zone_group_plots

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

C.2 Reference Data (ACS/PUMS & extras)

acs_year | ACS Data Year

Required: ACS year for income imputation and block group matching. Used in crosswalk and cleaning scripts.

Called Directly by Functions:

  • create_ie_adjustment_data
  • fetch_acs
  • get_acs_ethnicity
  • get_acs_race
  • get_bg_geom
  • get_county_fips
  • get_max_acs_income_bin
  • get_state_fips
  • impute_ethnicity
  • impute_gender
  • impute_race
  • prepare_acs_income

Called Indirectly by Functions:

  • add_geometry_to_table
  • adjust_ref_counts_dataset
  • calc_sample_plan_counts
  • cluster_pumas
  • fetch_hts_table
  • fetch_pums
  • fix_value_labels_on_load
  • get_acs_bg_counts
  • get_acs_bg_counts_base
  • get_income_broad
  • get_max_income_bin
  • get_max_survey_income_bin
  • get_puma_geom
  • get_puma_ids
  • get_pumas
  • get_tracts_puma_xwalk
  • get_user_specs
  • impute_income_pnta
  • load_sf_obj
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prep_zones_sf
  • prepare_zone_groups
  • sampled_latlon_to_bg
  • test_results
  • update_income_broad_labels

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

acs_year: 2023

acs_dataset | ACS Dataset Type

Required: ACS survey time span, matching the ‘survey’ argument in tidycensus::get_acs().

Called Directly by Functions:

  • adjust_ref_counts_dataset
  • calc_sample_plan_counts
  • create_ie_adjustment_data
  • fetch_acs
  • get_acs_bg_counts_base
  • get_max_acs_income_bin

Called Indirectly by Functions:

  • fetch_hts_table
  • fix_value_labels_on_load
  • get_acs_bg_counts
  • get_acs_ethnicity
  • get_acs_race
  • get_income_broad
  • get_max_income_bin
  • get_max_survey_income_bin
  • get_user_specs
  • impute_ethnicity
  • impute_income_pnta
  • impute_race
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prepare_acs_income
  • prepare_zone_groups
  • test_results
  • update_income_broad_labels

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Allowed Values: ‘acs1’, ‘acs5’

Default:

acs_dataset: acs5

acs_tables | ACS Table Codes Mapping

Required: List of ACS table codes by logical name. Each entry is a mapping of one logical name to one table code.

Called Directly by Functions:

  • get_acs_race
  • get_max_acs_income_bin
  • prepare_acs_income

Called Indirectly by Functions:

  • fetch_hts_table
  • fix_value_labels_on_load
  • get_income_broad
  • get_max_income_bin
  • get_max_survey_income_bin
  • get_user_specs
  • impute_income_pnta
  • impute_race
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prepare_zone_groups
  • test_results
  • update_income_broad_labels

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

acs_tables:
- sex_by_age: B01001
- hhtype: B11002
- tenure_hh: B25003_001
- occ_hh: B25002_002
- hh_income: B19001
- race: B02001
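
Because each list entry is a one-key mapping, consumers usually flatten the list into a single lookup first. A minimal Python sketch of that step, for illustration only; flatten_singletons is a hypothetical helper and not a function in the package.

```python
def flatten_singletons(entries):
    """Merge a list of single-key mappings into one dict, rejecting malformed entries."""
    merged = {}
    for entry in entries:
        if len(entry) != 1:
            raise ValueError(f"Expected exactly one key per entry, got {entry!r}")
        (name, code), = entry.items()
        if name in merged:
            raise ValueError(f"Duplicate logical name: {name}")
        merged[name] = code
    return merged


# Two of the default entries shown above, as they would parse from YAML.
acs_tables = [{"hh_income": "B19001"}, {"race": "B02001"}]
```

The same shape is used by acs_count_vars, so one helper of this kind would cover both settings.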

acs_count_vars | ACS Count Variable Codes

Required: Mappings of ACS count variable names to codes. Each entry is a single-key object.

Called Directly by Functions:

  • adjust_ref_counts_dataset
  • get_acs_bg_counts_base

Called Indirectly by Functions:

  • calc_sample_plan_counts
  • get_acs_bg_counts

Used in Scripts:

  • 05_setup-data-geographies.qmd

Default:

acs_count_vars:
- household: B25003_001
- person: B11002_001

state_fips | State FIPS Code

Optional: Federal Information Processing Standard (FIPS) code for the state in the study region. Used to select and filter ACS and PUMS data for geographic subsetting, clustering, and reporting. This is typically blank and discovered automatically from existing data in fetch_acs.R and fetch_pums.R, but can be set here to override that behavior.

Called Directly by Functions:

  • cluster_pumas
  • fetch_acs
  • fetch_pums

Called Indirectly by Functions:

  • calc_sample_plan_counts
  • create_ie_adjustment_data
  • get_acs_bg_counts
  • get_acs_bg_counts_base
  • get_acs_ethnicity
  • get_acs_race
  • impute_ethnicity
  • impute_income_pnta
  • impute_race
  • prepare_acs_income
  • prepare_zone_groups

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd

county_fips | County FIPS Code

Optional: Federal Information Processing Standard (FIPS) code(s) for counties in the study region. Used to select and filter ACS and PUMS data for geographic subsetting. This is typically blank and discovered automatically from existing data in fetch_acs.R and fetch_pums.R, but can be set here to override that behavior.

Called Directly by Functions:

  • fetch_acs

Called Indirectly by Functions:

  • calc_sample_plan_counts
  • get_acs_bg_counts
  • get_acs_bg_counts_base
  • get_acs_ethnicity
  • get_acs_race
  • impute_ethnicity
  • impute_income_pnta
  • impute_race
  • prepare_acs_income

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 07_survey-data-preparation.qmd

pums_year | PUMS Data Year

Required: PUMS year for target data. Used in crosswalk and cleaning scripts.

Called Directly by Functions:

  • adjust_ref_counts_dataset
  • calc_sample_plan_counts
  • cluster_pumas
  • create_ie_adjustment_data
  • fetch_pums
  • get_acs_bg_counts_base
  • get_puma_geom
  • get_puma_ids
  • get_pumas
  • get_tracts_puma_xwalk
  • read_pums_codebook
  • sampled_latlon_to_bg

Called Indirectly by Functions:

  • add_geometry_to_table
  • append_var_lab
  • get_acs_bg_counts
  • get_settings
  • get_test_settings
  • impute_income_nonrelatives
  • load_sf_obj
  • prep_zones_sf
  • prepare_income_fit_dt
  • prepare_zone_groups
  • update_settings_pums_vars

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

pums_year: 2023

pums_dataset | PUMS Dataset Type

Required: PUMS dataset type (1-year or 5-year ACS PUMS). Used in crosswalk and cleaning scripts.

Called Directly by Functions:

  • adjust_ref_counts_dataset
  • fetch_pums
  • read_pums_codebook

Called Indirectly by Functions:

  • append_var_lab
  • calc_sample_plan_counts
  • create_ie_adjustment_data
  • get_settings
  • get_test_settings
  • impute_income_nonrelatives
  • prepare_income_fit_dt
  • update_settings_pums_vars

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Allowed Values: ‘acs1’, ‘acs5’

Default:

pums_dataset: acs1

nontarget_vars | Additional PUMS Variables

Required: Additional PUMS columns to fetch. Used in fetch_pums scripts.

Called Directly by Functions:

  • update_settings_pums_vars

Called Indirectly by Functions:

  • get_settings
  • get_test_settings

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Example:

nontarget_vars:
- - PINCP
  - RELSHIPP

max_income_bin | Maximum Income Bin

Required: Maximum income bin for top-coding. Used in income alignment scripts.

Called Directly by Functions:

  • get_max_income_bin

Called Indirectly by Functions:

  • fetch_hts_table
  • fix_value_labels_on_load
  • get_income_broad
  • get_max_survey_income_bin
  • get_user_specs
  • impute_income_pnta
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prepare_zone_groups
  • test_results
  • update_income_broad_labels

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

max_income_bin: 200000
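
Top-coding caps values at a ceiling so that the open-ended top bin starts at the same point in every data source. A minimal Python sketch, for illustration only, assuming the cap is applied as a simple minimum; top_code is a hypothetical helper and the package's own logic may differ.

```python
def top_code(incomes, max_income_bin=200000):
    """Cap each income at max_income_bin; values at or below the cap pass through."""
    return [min(value, max_income_bin) for value in incomes]
```

With the default shown above, an income of 250000 would be treated as 200000, which lines up with the $200,000+ label in the default h_income target.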

force_balance_hh_weights | Force Balanced Household Weights

Required: If TRUE, PUMS household weights are recalculated from person weights. Corrects for the discrepancy between sum(PWGTP) and sum(WGTP) in PUMS data.

Called Directly by Functions:

  • adjust_target_to_study_zones
  • summarize_pums

Called Indirectly by Functions:

  • calc_target_ci

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 08_round-1-weighting.qmd

Default:

force_balance_hh_weights: yes

age_employable | Employable Age Threshold

Optional: Minimum age considered employable for income imputation and workforce analysis. Used in imputation and preparation scripts to filter or categorize persons.

Called Directly by Functions:

  • impute_income_nonrelatives
  • prepare_income_fit_dt
  • prepare_persons_dt

Used in Scripts:

  • 07_survey-data-preparation.qmd

Default:

age_employable: 16

C.3 Geography & Zones

zone_type | Weighting Geographic Zone Type

Required: Type of geographic zones used in the analysis. Used in crosswalk and tabulation scripts.

Called Directly by Functions:

  • add_geometry_to_table
  • adjust_target_to_study_zones
  • get_user_specs
  • group_from_defined_list

Called Indirectly by Functions:

  • prepare_zone_groups

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd

Allowed Values: ‘client_zones’, ‘pumas’

Default:

zone_type: pumas

zone_groups | Groups of zones for aggregation

Required: Named groups of zones for aggregation. Each entry maps a group label to the list of zone IDs that belong to it. Used in crosswalk and tabulation scripts.

Called Directly by Functions:

  • get_user_specs

Called Indirectly by Functions:

  • prepare_zone_groups

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Example:

zone_groups:
- North:
  - '5321101'
- South:
  - '5321104'
  - '5321102'
  - '5321103'
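
For tabulation it is often convenient to invert this structure into a zone-to-group lookup. A minimal Python sketch, for illustration only; zone_to_group is a hypothetical helper, and rejecting a zone listed under two groups is an assumption about what a valid config looks like.

```python
def zone_to_group(zone_groups):
    """Invert a list of {group: [zone, ...]} entries into a {zone: group} lookup."""
    lookup = {}
    for entry in zone_groups:
        for group, zones in entry.items():
            for zone in zones:
                if zone in lookup:
                    raise ValueError(f"Zone {zone} appears in more than one group")
                lookup[zone] = group
    return lookup


# The example above, as it would parse from YAML.
zone_groups = [
    {"North": ["5321101"]},
    {"South": ["5321104", "5321102", "5321103"]},
]
```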

xwalk_sliver_threshold | Crosswalk Geometry Sliver Threshold

Required: Threshold for geometry slivers in crosswalk creation. Used in crosswalk scripts.

Used in Scripts:

  • 05_setup-data-geographies.qmd

Default:

xwalk_sliver_threshold: 0.05

puma_buffer | PUMA Buffer Distance (meters)

Required: Buffer distance (meters) for PUMA region selection. Used in crosswalk scripts.

Called Directly by Functions:

  • get_puma_geom
  • get_puma_ids

Called Indirectly by Functions:

  • add_geometry_to_table
  • cluster_pumas
  • create_ie_adjustment_data
  • fetch_pums
  • get_tracts_puma_xwalk
  • load_sf_obj
  • prep_zones_sf
  • prepare_zone_groups

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Default:

puma_buffer: 100

puma_clustering | PUMA Clustering Method

Optional: Specifies the method used to cluster PUMAs into zone groups for weighting and reporting. Typical values include ‘kmeans’, ‘spectral’, and ‘entropy’. The clustering method controls how PUMAs are grouped for assignment to client zones and determines the approach for spatial aggregation in the weighting process. Used in entropy_zone_groups, spectral_zone_groups, and related clustering functions.

Called Directly by Functions:

  • entropy_zone_groups
  • get_user_specs
  • spectral_zone_groups

Called Indirectly by Functions:

  • cluster_pumas
  • prepare_zone_groups

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 08_round-1-weighting.qmd

Allowed Values: ‘kmeans’, ‘spectral’, ‘entropy’

Default:

puma_clustering: kmeans

geographies | Geography Definitions

Optional: Custom definitions for geographic groupings or crosswalks, such as aggregations of zones, PUMAs, or client areas. Used to configure zone-level weighting and reporting. If set here, overrides the definitions in inst/populationsim/configs/settings.yaml.

Called Directly by Functions:

  • popsim_settings_updates

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Default:

geographies:
- region
- zone_group

largest_bg_to_client_zone | Largest Block Group to Client Zone Mapping

Optional: If TRUE, assigns each Census block group to the client zone with which it has the largest geographic overlap (rather than splitting proportions). Used in crosswalk generation to simplify mapping and ensure each block group is assigned to a single client zone. Typically used in 005_create_crosswalk.R.

Used in Scripts:

  • 05_setup-data-geographies.qmd

Default:

largest_bg_to_client_zone: no
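
The difference between the two behaviors can be sketched as follows. This is an illustrative Python sketch only: it assumes overlap areas have already been computed, and build_crosswalk is a hypothetical helper rather than the package's crosswalk code.

```python
from collections import defaultdict


def build_crosswalk(overlaps, largest_only=False):
    """Build {block_group: {client_zone: share}} from (bg, zone, overlap_area) rows.

    largest_only=True assigns each block group wholly to its largest-overlap zone;
    otherwise shares are proportional to overlap area.
    """
    by_bg = defaultdict(dict)
    for bg, zone, area in overlaps:
        by_bg[bg][zone] = by_bg[bg].get(zone, 0.0) + area

    xwalk = {}
    for bg, zones in by_bg.items():
        if largest_only:
            best = max(zones, key=zones.get)
            xwalk[bg] = {best: 1.0}
        else:
            total = sum(zones.values())
            xwalk[bg] = {zone: area / total for zone, area in zones.items()}
    return xwalk
```

With largest_only=True every block group maps to exactly one zone with a share of 1.0, which is the simplification the setting describes.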

use_reported_home | Use Reported Home Location for Segment Assignment

Required: If TRUE, uses reported home location for segment assignment. Used in weighting scripts.

Used in Scripts:

  • 07_survey-data-preparation.qmd

Default:

use_reported_home: yes

C.4 Reproducibility & Study Frame

rng_seed | Random Number Seed

Required: Random number seed for reproducibility. Used in all scripts.

Called Directly by Functions:

  • adjust_unrelated_pums
  • cluster_pumas
  • get_user_specs
  • impute_ethnicity
  • impute_gender
  • impute_race

Called Indirectly by Functions:

  • prepare_zone_groups

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd

Default:

rng_seed: 4119

study_unit | Study Unit for Weighting

Required: Unit of analysis for weighting. Used in all weighting scripts.

Called Directly by Functions:

  • adjust_pums_to_reference
  • adjust_ref_counts_dataset
  • adjust_reference_to_target
  • adjust_target_to_study_zones
  • calc_initial_weights
  • calc_person_weights
  • calc_sample_plan_counts
  • check_diff
  • check_initial_weights
  • create_ie_adjustment_data
  • get_acs_bg_counts_base
  • get_settings
  • prep_initial_expansion_data
  • summarize_pums

Called Indirectly by Functions:

  • calc_target_ci
  • get_acs_bg_counts
  • get_test_settings

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Allowed Values: ‘household’, ‘person’

Default:

study_unit: household

weight_dow_groups | Day-of-Week Weight Groups

Required: Defines how days of the week are grouped for survey weighting and reporting. Each object maps a group label to an array of day indices (Monday=1, …, Sunday=7). Used to aggregate and apply weights to specified day groups.

Called Directly by Functions:

  • adjust_reference_to_target
  • calc_day_weights
  • calc_target_ci
  • check_initial_weights
  • create_importance_list
  • get_day_groups
  • label_targets
  • plot_weight_fit
  • popsim_make_control_config
  • prepare_targets

Called Indirectly by Functions:

  • calc_complete_hhdays
  • calc_weight_fit
  • impute_income_nonrelatives
  • prep_transit_target
  • prepare_impute_targets
  • prepare_income_fit_dt
  • prepare_persons_dt

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

weight_dow_groups:
  avg_weekday:
  - 2
  - 3
  - 4
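
A minimal Python sketch of how a day index could be resolved to its group label, for illustration only; day_group is a hypothetical helper (not the package's get_day_groups), and returning None for ungrouped days is an assumption.

```python
def day_group(dow, weight_dow_groups):
    """Return the group label for a day index (Monday=1 ... Sunday=7), or None."""
    for label, days in weight_dow_groups.items():
        if dow in days:
            return label
    return None


# The default shown above: with Monday=1, this covers Tuesday through Thursday.
weight_dow_groups = {"avg_weekday": [2, 3, 4]}
```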

unrelated_adjustment | Unrelated Householder Adjustment Method

Required: Method for handling unrelated householders (roommates, nonrelatives) in survey and PUMS data. Unrelated householders may not be included in trip diaries, and may require special treatment for weights and household structure.

Options:

  • none: No adjustment; unrelated householders are treated as-is in survey and PUMS data.
  • person: Unrelated persons are assigned zero person weights, and weights for the remaining persons are adjusted upwards to preserve household totals. Household weights remain unchanged. Used for person-level weighting.
  • day: Like person, but the adjustment applies only to person-day and trip-level weights; person weights themselves are not modified. Preserves core demographic distributions while adjusting trip-level results.
  • restructure: Most rigorous. Unrelated persons are separated into their own single-person households in PUMS and survey data. Original household size, vehicle counts, and income are reduced accordingly. Used when household structure must match the survey approach exactly (e.g., trip diary only includes related persons).
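
The person option can be sketched roughly as below. This Python sketch is illustrative only: it assumes "household totals" means the sum of person weights within each household, and zero_unrelated_weights is a hypothetical helper, not the package's adjust_unrelated_survey.

```python
def zero_unrelated_weights(persons):
    """Zero out unrelated persons and scale up the rest within each household.

    persons: list of dicts with keys hh_id, weight, unrelated (bool).
    Returns the adjusted weights in the same order as the input.
    """
    totals, kept = {}, {}
    for p in persons:
        totals[p["hh_id"]] = totals.get(p["hh_id"], 0.0) + p["weight"]
        if not p["unrelated"]:
            kept[p["hh_id"]] = kept.get(p["hh_id"], 0.0) + p["weight"]

    adjusted = []
    for p in persons:
        kept_sum = kept.get(p["hh_id"], 0.0)
        if p["unrelated"] or kept_sum == 0.0:
            # Unrelated persons, or households with no related weight to scale.
            adjusted.append(0.0)
        else:
            adjusted.append(p["weight"] * totals[p["hh_id"]] / kept_sum)
    return adjusted
```

For a household of three persons with weight 10 each, one of them unrelated, the two remaining persons each end up with weight 15, so the household's person-weight sum stays at 30.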

Called Directly by Functions:

  • adjust_unrelated_pums
  • adjust_unrelated_survey
  • calc_day_weights
  • calc_initial_weights
  • calc_person_weights
  • get_settings

Called Indirectly by Functions:

  • get_test_settings

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Allowed Values: ‘none’, ‘person’, ‘day’, ‘restructure’

Default:

unrelated_adjustment: person

impute_unrelated_income | Unrelated Householder Income Imputation

Required: If TRUE, imputes income for unrelated persons. Used in weighting scripts.

Called Directly by Functions:

  • impute_income_nonrelatives

Used in Scripts:

  • 07_survey-data-preparation.qmd

Default:

impute_unrelated_income: yes

nps_segments | Non-Probability Sample Segments

Optional: Specifies the list of non-probability sample (NPS) segments included in the survey. These are supplemental or convenience sample groups that do not follow the main ABS (address-based sample) design. When defined, these segments are matched by label to the sample_segment variable in the survey data, and are assigned an ‘NPS’ invitation type in downstream cleaning scripts. All households in these segments can be lumped together as a single supplemental group for weighting and reporting, or further blended with ABS samples using the nps_blending_factor. If omitted or empty, all invitations default to ‘ABS’.

Note: If NPS segments are used, additional blending logic may be required in weighting and reporting scripts.

Used in Scripts:

  • 07_survey-data-preparation.qmd

Example:

nps_segments:
- Supplemental
- University
- Outreach

nps_blending_factor | Non-Probability Sample Blending Factor

Optional: Sets the relative weight or adjustment factor for households in non-probability sample (NPS) segments when blending with the main ABS (address-based sample) in survey weighting and reporting.

If specified, this factor determines how NPS households are upweighted or downweighted relative to ABS households, ensuring that the combined sample reflects the intended proportions. If not defined, the blending factor is automatically calculated based on the observed share of NPS households in the sample.

Use this field to override the default blending and explicitly control the contribution of NPS segments in expansion and weighting. Typical values are numeric (e.g., 0.5 to give NPS segments half the weight of ABS segments, or 1.0 for equal weighting).

Note: When NPS segments are present, careful adjustment is required to avoid biasing regional estimates. See weighting scripts for details on how this blending is applied.
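
A minimal Python sketch of the override behavior, for illustration only. It assumes, purely for the sake of the example, that the automatic value equals the observed NPS share of households; the package's calc_alpha may compute it differently, and resolve_blending_factor is a hypothetical name.

```python
def resolve_blending_factor(invite_types, configured=None):
    """Use the configured factor if given; otherwise fall back to the NPS share.

    invite_types: one 'ABS' or 'NPS' label per household.
    The share-based fallback is an assumption made for this sketch.
    """
    if configured is not None:
        return float(configured)
    if not invite_types:
        raise ValueError("Cannot derive a blending factor from an empty sample")
    return sum(t == "NPS" for t in invite_types) / len(invite_types)
```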

Called Directly by Functions:

  • calc_alpha

Called Indirectly by Functions:

  • calc_initial_weights
  • prep_initial_expansion_data

Used in Scripts:

  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd

C.5 Targets & PopulationSim

input_table_list | Input Table List

Optional: List of input tables to load for PopulationSim and weighting steps. Each item should specify the canonical table name, source filename, and optionally the index column. If set here, overrides the default list in inst/populationsim/configs/settings.yaml.

Called Directly by Functions:

  • popsim_settings_updates

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Default:

input_table_list:
- tablename: households
  filename: seed_households.csv
  index_col: hh_id
- tablename: persons
  filename: seed_persons.csv
- tablename: geo_cross_walk
  filename: geo_cross_walk.csv
- tablename: zone_group_control_data
  filename: control_totals_zone_group.csv
- tablename: region_control_data
  filename: control_totals_region.csv

targets | Target Variables and Definitions

Required: Each array item is a singleton object mapping the target name (e.g., p_gender, h_size) to an array of singleton key/value objects. This matches the YAML form:

- h_size:
  - method: prep_target_h_size
  - survey_input: num_people
  - pums_input: NP
  - label_var: Household size
  - levels: [1, 2, 3, 4, 5]
  - label_levels: ["1", "2", "3", "4", "5+"]
  - geography: zone_group

Called Directly by Functions:

  • calc_survey_ci
  • calc_target_ci
  • check_group_sums
  • create_target_update_table
  • entropy_zone_groups
  • get_acs_race
  • get_target_map
  • impute_ethnicity
  • impute_race
  • label_targets
  • plot_weight_fit
  • popsim_make_control_config
  • popsim_settings_updates
  • prep_control_tables
  • prep_target_adults
  • prep_target_age
  • prep_target_commutemode
  • prep_target_cross
  • prep_target_edulevel
  • prep_target_employment
  • prep_target_ethnicity
  • prep_target_gender
  • prep_target_h_size
  • prep_target_income
  • prep_target_kids
  • prep_target_race
  • prep_target_univstudent
  • prep_target_vehicles
  • prep_target_workers
  • prep_transit_target
  • prepare_targets
  • update_settings_pums_vars

Called Indirectly by Functions:

  • cluster_pumas
  • get_settings
  • get_test_settings
  • impute_income_nonrelatives
  • popsim_make_input_data
  • prepare_impute_targets
  • prepare_income_fit_dt
  • prepare_persons_dt
  • prepare_zone_groups
  • summarize_pums
  • summarize_survey
  • update_targets

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

targets:
- h_size:
  - description: Household size (persons).
  - method: prep_target_h_size
  - survey_input: num_people
  - pums_input: NP
  - label_var: Household size
  - levels:
    - 1
    - 2
    - 3
    - 4
    - 5
  - label_levels:
    - '1'
    - '2'
    - '3'
    - '4'
    - 5+
  - geography: zone_group
- h_income:
  - description: Total household income (bins are upper bounds).
  - method: prep_target_income
  - survey_input: income_imputed_value
  - pums_input: HINCP
  - label_var: Household income
  - levels:
    - 24999
    - 49999
    - 74999
    - 99999
    - 199999
  - label_levels:
    - $0-$24,999
    - $25,000-$49,999
    - $50,000-$74,999
    - $75,000-$99,999
    - $100,000-$199,999
    - $200,000+
  - geography: zone_group
- h_workers:
  - description: Number of employed adults in household.
  - method: prep_target_workers
  - survey_input:
    - age
    - employment
  - pums_input: ESR
  - label_var: Number of workers
  - levels:
    - 0
    - 1
    - 2
  - label_levels:
    - '0'
    - '1'
    - 2+
  - geography: zone_group
- h_vehicles:
  - description: Vehicle sufficiency (none/insuff/suff).
  - method: prep_target_vehicles
  - survey_input:
    - can_drive
    - num_vehicles
  - pums_input:
    - VEH
    - AGEP
  - label_var: Vehicle sufficiency
  - levels:
    - none
    - insuff
    - suff
  - label_levels:
    - None
    - Insufficient
    - Sufficient
  - geography: zone_group
- h_kids:
  - description: Households with children.
  - method: prep_target_kids
  - survey_input: num_kids
  - pums_input: AGEP
  - label_var: Children in household
  - levels:
    - 0
    - 1
  - label_levels:
    - '0'
    - 1+
  - geography: zone_group
- h_puma:
  - description: Households by PUMA (tabulate only).
  - method: tabulate
  - survey_input: puma_id
  - pums_input: PUMA
  - label_var: PUMA
  - levels: {}
  - geography: zone_group
- h_adults:
  - description: Number of adults (18+) in household.
  - method: prep_target_adults
  - survey_input: age
  - pums_input: AGEP
  - label_var: Adults in household
  - levels:
    - 0
    - 1
  - label_levels:
    - '0'
    - 1+
  - geography: zone_group
- h_transit_trips:
  - description: Household-level transit trip totals (tabulate survey only).
  - method: tabulate
  - survey_input: transit_trips
  - pums_input: {}
  - label_var: Transit trips (HH)
  - levels: {}
  - geography: zone_group
- p_gender:
  - description: Person gender.
  - method: prep_target_gender
  - survey_input: gender_imputed
  - pums_input: SEX
  - label_var: Gender
  - levels:
    - male
    - female
  - label_levels:
    - Male
    - Female
  - geography: zone_group
- p_age:
  - description: Person age (bins are upper bounds).
  - method: prep_target_age
  - survey_input: age
  - pums_input: AGEP
  - label_var: Age
  - levels:
    - 4
    - 15
    - 17
    - 24
    - 44
    - 64
  - label_levels:
    - 0–4
    - 5–15
    - 16–17
    - 18–24
    - 25–44
    - 45–64
    - 65+
  - geography: zone_group
- p_employment:
  - description: Employment status.
  - method: prep_target_employment
  - survey_input: employment
  - pums_input:
    - ESR
    - WKHP
  - label_var: Employment
  - levels:
    - nonworker
    - part_time
    - full_time
  - label_levels:
    - Non-worker
    - Part-time
    - Full-time
  - geography: zone_group
- p_commutemode:
  - description: Commute mode (primary).
  - method: prep_target_commutemode
  - survey_input:
    - work_mode
    - job_type
  - pums_input: JWTRNS
  - label_var: Commute mode
  - levels:
    - home
    - transit
    - walk
    - bike
    - other
    - none
  - label_levels:
    - Work from home
    - Transit
    - Walk
    - Bike
    - Other
    - None
  - geography: zone_group
- p_univstudent:
  - description: University student status.
  - method: prep_target_univstudent
  - survey_input:
    - student
    - school_type
  - pums_input: SCHG
  - label_var: University student
  - levels:
    - 'no'
    - 'yes'
  - label_levels:
    - 'No'
    - 'Yes'
  - geography: zone_group
- p_edulevel:
  - description: Education attainment (coarse).
  - method: prep_target_edulevel
  - survey_input: education
  - pums_input: SCHL
  - label_var: Education
  - levels:
    - no_college
    - some_college
  - label_levels:
    - No college
    - Some college
  - geography: zone_group
- p_race:
  - description: Race (collapsed categories).
  - method: prep_target_race
  - survey_input: race_imputed
  - pums_input: RAC1P
  - label_var: Race
  - levels:
    - white
    - afam
    - asian_pacific
    - other
  - label_levels:
    - White
    - Black or African American
    - Asian or Pacific Islander
    - Other
  - geography: zone_group
- p_ethnicity:
  - description: Ethnicity (Hispanic / Not Hispanic).
  - method: prep_target_ethnicity
  - survey_input: ethnicity_imputed
  - pums_input: HISP
  - label_var: Ethnicity
  - levels:
    - not_hispanic
    - hispanic
  - label_levels:
    - Not Hispanic
    - Hispanic
  - geography: zone_group
- p_total:
  - description: Total persons (pass-through).
  - method: pass_through
  - survey_input: p_total
  - pums_input: {}
  - label_var: Total persons
  - levels: {}
  - geography: zone_group
- h_total:
  - description: Total households (pass-through).
  - method: pass_through
  - survey_input: h_total
  - pums_input: {}
  - label_var: Total households
  - levels: {}
  - geography: zone_group
- p_gender-ethnicity:
  - description: 'Crosstab: gender by ethnicity.'
  - method: crosstab
  - targets:
    - p_gender
    - p_ethnicity
  - label_var: Gender × Ethnicity
  - geography: zone_group
- p_gender-race:
  - description: 'Crosstab: gender by race.'
  - method: crosstab
  - targets:
    - p_gender
    - p_race
  - label_var: Gender × Race
  - geography: zone_group

target_updates | Target Update Rules

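Optional Rules for collapsing target levels within selected zone groups. Each entry names a target, maps each combined level (under levels) to the original levels it replaces, and lists the zone groups (under groups) where the collapse applies. Used in weighting scripts.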

Called Directly by Functions:

  • build_update_map
  • create_target_update_table
  • plot_weight_fit
  • popsim_make_control_config
  • update_targets

Called Indirectly by Functions:

  • calc_survey_ci
  • calc_target_ci
  • label_targets
  • popsim_make_input_data
  • summarize_pums
  • summarize_survey

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Example:

target_updates:
- p_commutemode:
    levels:
      walk_bike:
      - walk
      - bike
      other_transit:
      - other
      - transit
    groups:
    - North
- h_vehicles:
    levels:
      insuff:
      - none
      - insuff
    groups:
    - North
- p_ethnicity:
    levels:
      other_hispanic:
      - mexican
      - other_hispanic
    groups:
    - Middle Queens
- h_income:
    levels:
      100000_plus:
      - 100000_199999
      - 200000_plus
    groups:
    - Southern Bronx
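
As a rough illustration of what a rule like the p_commutemode entry above describes, the Python sketch below sums the counts of the original levels into their combined level, but only for zone groups listed under groups. The function name, the input layout, and the counts are hypothetical; this is not the package's update_targets implementation.

```python
def apply_target_update(level_counts, zone_group, rule):
    """Collapse target levels for one zone group according to one rule.

    level_counts: dict of level name -> count for a single target and zone group.
    rule: dict with 'levels' (combined level -> original levels) and 'groups'.
    """
    # Zone groups not named in the rule keep their original levels.
    if zone_group not in rule["groups"]:
        return dict(level_counts)

    # Invert the rule so each original level points at its combined level.
    combined_for = {
        original: combined
        for combined, originals in rule["levels"].items()
        for original in originals
    }

    collapsed = {}
    for level, count in level_counts.items():
        new_level = combined_for.get(level, level)  # untouched levels pass through
        collapsed[new_level] = collapsed.get(new_level, 0) + count
    return collapsed


# Mirrors the p_commutemode entry in the example above; the counts are made up.
rule = {
    "levels": {"walk_bike": ["walk", "bike"], "other_transit": ["other", "transit"]},
    "groups": ["North"],
}
counts = {"wfh": 50, "transit": 30, "walk": 10, "bike": 5, "other": 20, "none": 100}

north = apply_target_update(counts, "North", rule)  # levels are collapsed
south = apply_target_update(counts, "South", rule)  # levels are left as-is
```

Restricting a rule to specific groups keeps full detail wherever the sample supports it and only merges sparse levels where needed.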

popsim_setting_updates | PopulationSim Settings Updates

Optional Settings to update in PopulationSim. Typically blank; if set, overrides defaults in inst/populationsim/configs/settings.yaml. Used in popsim scripts.

Called Directly by Functions:

  • popsim_settings_updates

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Example:

popsim_setting_updates:
- min_expansion_factor: 0.143
  max_expansion_factor: 7
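
The override behavior described above amounts to a key-by-key merge: any setting named in popsim_setting_updates replaces the default, and everything else is left alone. The Python sketch below shows that idea with plain dictionaries; the function name is hypothetical and the real popsim_settings_updates function may work differently.

```python
def apply_setting_updates(defaults, updates):
    """Return a copy of defaults with every key in updates overridden."""
    merged = dict(defaults)  # copy so the defaults stay untouched
    merged.update(updates)
    return merged


# Default values as documented in this reference; updates mirror the example above.
defaults = {
    "min_expansion_factor": 0.125,
    "max_expansion_factor": 5,
    "absolute_upper_bound": 10000,
}
updates = {"min_expansion_factor": 0.143, "max_expansion_factor": 7}

settings = apply_setting_updates(defaults, updates)
```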

popsim_importance | PopulationSim Importance Weights

Optional Importance weights for popsim controls. Used in popsim scripts.

Called Directly by Functions:

  • create_importance_list

Called Indirectly by Functions:

  • popsim_make_control_config

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Example:

popsim_importance:
- run1_initial:
    p_total: 100

popsim_calculate_importance | Calculate Target Importance

Optional If TRUE, automatically calculates the relative importance of each control field for PopulationSim weighting from the confidence interval of its target value. Controls with wider confidence intervals (for example, poorly measured PUMS estimates) receive less importance so that noisy targets are not overweighted. If FALSE or not set, the default importance values specified in settings.yaml are used. The popsim_calculate_importance function uses this setting to generate the importance_list passed to PopulationSim during initial and day-pattern weighting.

Called Directly by Functions:

  • popsim_calculate_importance

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Default:

popsim_calculate_importance: no
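
The exact formula is not given here, so the Python sketch below shows only one plausible way to turn confidence intervals into importance values: scale importance inversely with the relative CI width, so the best-measured control gets the largest value. The function name, the max_importance value of 1000, and the sample targets are all assumptions, not the package's method.

```python
def importance_from_ci(targets, max_importance=1000):
    """Assign importance inversely proportional to relative CI width.

    targets: dict of control name -> (estimate, ci_lower, ci_upper).
    Assumes positive estimates and non-zero interval widths.
    """
    # Relative width makes large and small targets comparable.
    rel_width = {
        name: (upper - lower) / estimate
        for name, (estimate, lower, upper) in targets.items()
    }
    narrowest = min(rel_width.values())
    # The best-measured control gets max_importance; the rest scale down.
    return {
        name: round(max_importance * narrowest / width)
        for name, width in rel_width.items()
    }


# Hypothetical targets: a tight total and a noisy commute-mode level.
targets = {
    "p_total": (1_000_000, 990_000, 1_010_000),
    "p_commutemode_bike": (20_000, 16_000, 24_000),
}
importance = importance_from_ci(targets)
```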

popsim_search_max_exp | PopulationSim Max Expansion Factor Search Range

Optional Candidate maximum expansion factors to try in the PopulationSim search. Used in PopulationSim scripts.

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Example:

popsim_search_max_exp:
- - 4
  - 5
  - 6
  - 7
  - 8

popsim_search_bounds | PopulationSim Maximum Weight Search Bounds

Optional Candidate maximum household weights (absolute upper bounds) to try in the PopulationSim search. Used in popsim scripts.

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Example:

popsim_search_bounds:
- - 4000
  - 6000
  - 8000
  - 10000
  - 20000

popsim_initial_label | PopulationSim Initial Run Label

Optional Label for initial popsim run. Used in popsim scripts.

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

popsim_initial_label: round1

max_expansion_factor | Maximum Expansion Factor

Optional Maximum allowed ratio of PopulationSim weight to base weight. If set in the project configuration YAML, this overrides the default value of 5 (see inst/populationsim/configs/settings.yaml).

Called Directly by Functions:

  • popsim_search

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Default:

max_expansion_factor: 5

min_expansion_factor | Minimum Expansion Factor

Optional Minimum allowed ratio of PopulationSim weight to base weight. If set in the project configuration YAML, this overrides the default value of 0.125 (see inst/populationsim/configs/settings.yaml).

Used in Scripts:

  • 08_round-1-weighting.qmd

Default:

min_expansion_factor: 0.125

absolute_upper_bound | Absolute Upper Bound

Optional Maximum allowed value for household weights in PopulationSim weighting runs. If set in the project configuration yaml, this will override the default value of 10000 (see inst/populationsim/configs/settings.yaml).

Called Directly by Functions:

  • popsim_search

Used in Scripts:

  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd

Default:

absolute_upper_bound: 10000

absolute_lower_bound | Absolute Lower Bound

Optional Minimum allowed value for household weights in PopulationSim weighting runs. If set in the project configuration yaml, this will override the default value of 0 (see inst/populationsim/configs/settings.yaml).

Used in Scripts:

  • 08_round-1-weighting.qmd

Default:

absolute_lower_bound: 0
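
To see how the four bound settings above might interact, the Python sketch below computes the allowed weight range for one household. It assumes the expansion factor is the ratio of the PopulationSim weight to the base weight and that the absolute bounds clip the result; the function name and base weights are hypothetical, and PopulationSim's own logic may differ.

```python
def weight_bounds(
    base_weight,
    min_expansion_factor=0.125,
    max_expansion_factor=5,
    absolute_lower_bound=0,
    absolute_upper_bound=10000,
):
    """Return the (lower, upper) range allowed for a household's final weight."""
    # Relative bounds scale with the base weight; absolute bounds clip them.
    lower = max(base_weight * min_expansion_factor, absolute_lower_bound)
    upper = min(base_weight * max_expansion_factor, absolute_upper_bound)
    return lower, upper


small = weight_bounds(100)   # relative bounds apply on both sides
large = weight_bounds(4000)  # upper side is clipped by absolute_upper_bound
```

With the defaults, a base weight of 100 may end anywhere from 12.5 to 500, while a base weight of 4000 is limited to 10000 rather than 20000.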

C.6 Round 2: Day-Pattern Adjustment

daypat_weighting | Day Pattern Weighting Toggle (in development)

Optional If TRUE, performs day-pattern weighting adjustments.

Used in Scripts:

  • 10_round-3-weighting.qmd

Default:

daypat_weighting: yes

daypat_adjustments_table | Day Pattern Adjustments Table

Required Table to use for day pattern adjustments. Used in daypat scripts.

Used in Scripts:

  • 09_round-2-weighting.qmd

Allowed Values: ‘tour’, ‘linked_trip’, ‘trip’

Default:

daypat_adjustments_table: trip

daypat_formula_vars | Day Pattern Formula Variables

Required Variables for day pattern formula. Used in daypat scripts.

Used in Scripts:

  • 09_round-2-weighting.qmd

Default:

daypat_formula_vars:
- zero_vehicle
- as.character(income_imputed_label)
- age_under_35
- age_over_65
- is_employed
- is_student
- diary_online
- diary_call
- age_under_35 * diary_online
- age_over_65 * diary_call
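
Entries such as age_under_35 * diary_online suggest that each item is one term of an R-style model formula. The Python sketch below shows how such a list could be joined into a formula string. The response name day_pattern and the function name are hypothetical, and the day-pattern scripts may assemble the formula differently.

```python
def build_formula(response, terms):
    """Join predictor terms into an R-style formula string."""
    return f"{response} ~ " + " + ".join(terms)


# A subset of the default daypat_formula_vars, including one interaction term.
terms = [
    "zero_vehicle",
    "age_under_35",
    "diary_online",
    "age_under_35 * diary_online",
]
formula = build_formula("day_pattern", terms)
```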

popsim_daypat_label | PopulationSim Day Pattern Run Label

Optional Label for day pattern popsim run. Used in popsim scripts.

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

Default:

popsim_daypat_label: round2

C.7 Round 3: Trip-Rate Adjustment

trip_rate_factor_rescaling | Trip Rate Factor Rescaling Toggle

Required If TRUE, allows trip rate factors <1 to be rescaled to 1. Used in trip rate scripts.

Called Directly by Functions:

  • rescale_trip_rate_factors

Default:

trip_rate_factor_rescaling: yes

trip_rate_factor_cap | Trip Rate Factor Cap

Required Maximum allowed trip rate factor; larger factors are capped at this value. Used in trip rate scripts.

Used in Scripts:

  • 10_round-3-weighting.qmd

Default:

trip_rate_factor_cap: 2
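
Read together, trip_rate_factor_rescaling and trip_rate_factor_cap bound each trip rate factor from below and above. The Python sketch below shows one reading of that behavior: factors under 1 are raised to 1 when rescaling is on, and every factor is then limited to the cap. The function name and sample factors are hypothetical, and rescale_trip_rate_factors may implement rescaling differently.

```python
def adjust_trip_rate_factor(factor, rescaling=True, cap=2):
    """Apply the optional floor of 1 and the upper cap to one factor."""
    if rescaling and factor < 1:
        factor = 1  # never scale trips down when rescaling is enabled
    return min(factor, cap)


adjusted = [adjust_trip_rate_factor(f) for f in (0.8, 1.5, 3.2)]
unfloored = adjust_trip_rate_factor(0.8, rescaling=False)
```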

trip_rate_model_lhs | Left-Hand Side (Dependent Variables) of Trip Rate Model

Required Defines trip type groupings for trip rate model. Each object maps a group name to an array of trip types (e.g., hbw, nhbw, hbs, nhbs, hbo, nhbo, loop_trip).

Used in Scripts:

  • 10_round-3-weighting.qmd

Example:

trip_rate_model_lhs:
- - work:
    - hbw
    - nhbw
  - school:
    - hbs
    - nhbs
  - other:
    - hbo
    - nhbo
- - hbm:
    - hbw
    - hbs
  - hbo: []
  - nhbm:
    - nhbw
    - nhbs
  - nhbo: []
- - work:
    - hbw
    - nhbw
  - school:
    - hbs
    - nhbs
  - mandatory:
    - hbw
    - nhbw
    - hbs
    - nhbs
  - other:
    - hbo
    - nhbo
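
The first grouping in the example above can be read as a recipe for building dependent variables such as num_work, num_school, and num_other. The Python sketch below counts one person-day's trips by group. The function name and the trip list are hypothetical; it does not handle the empty-array groups (hbo: []) in the second grouping, whose meaning is not stated here.

```python
from collections import Counter


def count_trips_by_group(trip_types, grouping):
    """Count trips per group, returning one num_<group> value per group."""
    counts = Counter(trip_types)  # missing trip types count as 0
    return {
        f"num_{group}": sum(counts[trip_type] for trip_type in types)
        for group, types in grouping.items()
    }


# Mirrors the first grouping in the example above.
grouping = {
    "work": ["hbw", "nhbw"],
    "school": ["hbs", "nhbs"],
    "other": ["hbo", "nhbo"],
}
# Hypothetical trips for one person-day.
trips = ["hbw", "nhbw", "hbo", "hbo", "hbw"]

trip_counts = count_trips_by_group(trips, grouping)
```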

trip_rate_model_vars | Trip Rate Model Variables

Required Defines predictor variables used in trip rate models by trip type. Each item is a singleton object mapping the trip rate dependent variable (e.g., num_work, num_school, num_other) to an array of predictor variable names.

Used in Scripts:

  • 10_round-3-weighting.qmd

Default:

trip_rate_model_vars:
- num_work:
  - diary_binary
  - hh_size
  - income_cat
  - n_kids
  - employment_status
  - wfh
  - age_cat_adults
  - is_student
  - work_loc_varies
  - education_cat
- num_school:
  - diary_binary
  - hh_size
  - income_cat
  - school_cat
  - p_employment
  - p_wfh
  - p_gender
- num_other:
  - diary_binary
  - hh_size
  - income_cat
  - n_kids
  - employment_status
  - wfh
  - age_cat_adults
  - is_student
  - education_cat

weight_synthetic_trips | Toggle Synthetic Trip Weighting

Required If TRUE, weights synthetic trips. Used in trip scripts.

Called Directly by Functions:

  • calc_trip_weights

Used in Scripts:

  • 10_round-3-weighting.qmd

Default:

weight_synthetic_trips: no

C.8 Transit Controls

transit_modes | Transit Modes Included

Optional (required when using a transit target). List of transit modes included in survey tabulation and expansion. Typical values: [‘bus’, ‘rail’, ‘ferry’, ‘commuter_rail’, ‘light_rail’]. Used to filter and aggregate transit trips by mode.

Used in Scripts:

  • 07_survey-data-preparation.qmd

Example:

transit_modes:
- bus
- rail

transit_target_type | Transit Target Trip Type

Optional Type of transit trip target for expansion. ‘linked_trips’ (default) uses linked trips as the main target; ‘boardings’ uses unlinked trips. Determines how transit boardings and trips are interpreted in weighting and reporting.

Called Directly by Functions:

  • prep_transit_target

Called Indirectly by Functions:

  • calc_target_ci

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd

Allowed Values: ‘linked_trips’, ‘boardings’


transit_target_moe | Transit Target Margin of Error

Optional Margin of error (MOE) for transit target boardings or trips, used in confidence interval calculations for weighting and QA/QC. Typically set as a proportion of total boardings (e.g., 0.05 for 5%). Used in calc_target_ci.

Called Directly by Functions:

  • calc_target_ci

Used in Scripts:

  • 08_round-1-weighting.qmd

Default:

transit_target_moe: 0.05
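
As a worked illustration of the arithmetic, the Python sketch below turns a transit target and its MOE into a confidence interval. It assumes the MOE is a proportion applied symmetrically around the target; the function name and target value are hypothetical, and calc_target_ci may compute the interval differently.

```python
def transit_target_ci(target, moe=0.05):
    """Return (lower, upper) for a target with a proportional margin of error."""
    margin = target * moe
    return target - margin, target + margin


# Hypothetical target of 100,000 weekday linked transit trips.
lower, upper = transit_target_ci(100_000, moe=0.05)
```

Under these assumptions, a 5% MOE on a target of 100,000 gives an interval of about 95,000 to 105,000.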

transit_weekday_boardings | Transit Weekday Boardings

Optional Observed or targeted total transit boardings for a typical weekday across the study region. Used as a control total in expansion and weighting. Must be set from external agency data or survey sources.

Called Directly by Functions:

  • prep_transit_target

Called Indirectly by Functions:

  • calc_target_ci

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 08_round-1-weighting.qmd

transit_weekend_factor | Transit Weekend Adjustment Factor

Optional Factor applied to weekday boardings to estimate weekend ridership (e.g., 0.7 if weekend boardings are 70% of weekday boardings). Used to scale control totals for weekend expansion.

Called Directly by Functions:

  • prep_transit_target

Called Indirectly by Functions:

  • calc_target_ci

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 08_round-1-weighting.qmd

transit_boardings_per_trip | Transit Boardings per Trip

Optional Average number of boardings per transit trip, used to convert between linked and unlinked trip targets. Set from external survey or agency data (e.g., 1.2 boardings per linked trip).

Called Directly by Functions:

  • prep_transit_target

Called Indirectly by Functions:

  • calc_target_ci

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
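
The transit settings in this section combine through simple arithmetic. The Python sketch below shows one way they could fit together: weekday boardings are optionally scaled to a weekend estimate, then divided by boardings per trip when the target type is linked_trips. The function name and input values are hypothetical, and prep_transit_target may apply these settings differently.

```python
def transit_target(
    weekday_boardings,
    boardings_per_trip,
    target_type="linked_trips",
    weekend_factor=None,
):
    """Convert weekday boardings into a control total of the requested type."""
    boardings = weekday_boardings
    if weekend_factor is not None:
        boardings = boardings * weekend_factor  # weekend estimate from weekday total
    if target_type == "boardings":
        return boardings  # unlinked trips
    if target_type == "linked_trips":
        return boardings / boardings_per_trip
    raise ValueError(f"unknown transit_target_type: {target_type}")


# Hypothetical region with 120,000 weekday boardings and 1.2 boardings per linked trip.
weekday_linked = transit_target(120_000, 1.2)
weekend_boardings = transit_target(120_000, 1.2, target_type="boardings", weekend_factor=0.7)
```

With these inputs the weekday target is about 100,000 linked trips, and the weekend estimate is about 84,000 boardings.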

C.9 QA/QC, Reports & Dashboards

plots | Enable Diagnostic Plots

Optional If TRUE, generates diagnostic plots as part of sample plan QA/QC and zone group clustering. Plots are saved to the ‘report/plots/’ directory and include maps of study region block groups, PUMA-to-client zone intersections, and clustering/zone group results. Used in calc_sample_plan_counts and zone_group_plots. Deprecated in favor of more granular controls, but still useful for quick checks.

Called Directly by Functions:

  • calc_sample_plan_counts
  • entropy_zone_groups

Called Indirectly by Functions:

  • cluster_pumas
  • prepare_zone_groups

Used in Scripts:

  • 06_target-data-preparation.qmd
  • 08_round-1-weighting.qmd

Default:

plots: no

C.10 Legacy / Deprecated (kept for backwards compatibility)

hts_rmove_version | HTS RMove Data Version Suffix (Deprecated)

Optional Suffix to append to HTS table names for versioning when constructing database table references. This setting is DEPRECATED and replaced by ‘hts_table_map’, which directly maps canonical names to database tables. Only used in legacy workflows with ‘hts_tables_prefix’; not required for current workflows.

Called Directly by Functions:

  • get_db_table_name

Called Indirectly by Functions:

  • fetch_hts_table
  • fix_value_labels_on_load
  • get_income_broad
  • get_max_income_bin
  • get_max_survey_income_bin
  • get_user_specs
  • impute_income_pnta
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prepare_zone_groups
  • test_results
  • update_income_broad_labels

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd

hts_tables_prefix | HTS Table Name Prefix (Deprecated)

Optional Prefix to prepend to HTS table names when constructing database table references. This setting is DEPRECATED and replaced by ‘hts_table_map’, which provides direct mapping of canonical table names to specific database tables. Only used for legacy workflows; if set, will be combined with ‘hts_rmove_version’ (if present) to create table names. Not required for current workflows.

Called Directly by Functions:

  • get_db_table_name

Called Indirectly by Functions:

  • fetch_hts_table
  • fix_value_labels_on_load
  • get_income_broad
  • get_max_income_bin
  • get_max_survey_income_bin
  • get_user_specs
  • impute_income_pnta
  • prep_hhs_for_income_imputation
  • prep_initial_expansion_data
  • prepare_zone_groups
  • test_results
  • update_income_broad_labels

Used in Scripts:

  • 05_setup-data-geographies.qmd
  • 06_target-data-preparation.qmd
  • 07_survey-data-preparation.qmd
  • 08_round-1-weighting.qmd
  • 09_round-2-weighting.qmd
  • 10_round-3-weighting.qmd