Appendix B — Configuration Schema Documentation
C Configuration Reference
C.1 Project & I/O
dbname | Database Name
Optional Database name for input/output tables. Used in DB connection scripts.
Called Directly by Functions:
- write_to_db
Example:
dbname:
- psrc
- hts_weighting_testing
schema | Database Schema
Optional Database schema for input/output tables. Used in DB connection scripts.
Called Directly by Functions:
- write_to_db
Example:
schema:
- hts_2025_y5
- nyc_cms_hts_2024
run_scripts | Scripts to Run
Optional (RSG specific): List of scripts to run in the pipeline. Each item must be the name of a script in the scripts/ directory.
Default:
run_scripts:
- 001_input_checker.R
- 005_create_crosswalk.R
- 020_control_data_cleaning.R
- 022_control_data_tabulation.R
- 023_control_data_sum_by_zones.R
- 030_survey_data_cleaning.R
- 031_survey_data_imputation.R
- 032_survey_data_tabulation.R
- 040_initial_expansion.R
- 050_initial_weighting.R
- 060_daypat_adjustments.R
- 070_daypat_weighting.R
- 080_person_day_trip_weights.R
- 085_trip_weight_adjustment.R
- 090_write_to_db.R
- 100_weight_checks.Rmd
- 105_weighting_memo.Rmd
hts_table_map | Input Table Mapping
Required Mapping of canonical table names to database tables. Used in all ETL scripts.
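As a rough illustration of the lookup this mapping supports, the Python sketch below resolves a canonical table name to its database table. It is not the pipeline's R implementation of get_db_table_name; the function name db_table_name and the error handling are assumptions, and the mapping values are copied from the default listed for this field.

```python
# Illustrative stand-in only; the real helper (get_db_table_name) lives in the R pipeline.
HTS_TABLE_MAP = {
    "household": "toc_hh",
    "person": "toc_person",
    "day": "toc_day",
    "trip": "toc_trip",
    "linked_trip": "toc_linked_trip",
    "tour": "toc_tour",
}


def db_table_name(canonical_name, table_map=HTS_TABLE_MAP):
    """Return the database table mapped to a canonical table name."""
    try:
        return table_map[canonical_name]
    except KeyError:
        known = ", ".join(sorted(table_map))
        raise KeyError(
            f"Unknown table '{canonical_name}'; expected one of: {known}"
        ) from None


print(db_table_name("linked_trip"))
```

Failing loudly on an unknown canonical name is a design choice of this sketch; it surfaces a missing mapping entry before any query is issued.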
Called Directly by Functions:
- get_db_table_name
Called Indirectly by Functions:
- fetch_hts_table
- fix_value_labels_on_load
- get_income_broad
- get_max_income_bin
- get_max_survey_income_bin
- get_user_specs
- impute_income_pnta
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prepare_zone_groups
- test_results
- update_income_broad_labels
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
hts_table_map:
  household: toc_hh
  person: toc_person
  day: toc_day
  trip: toc_trip
  linked_trip: toc_linked_trip
  tour: toc_tour
weight_output_map | Output Table Map
Required Mapping of canonical output names to output tables. Used in output/DB writing scripts.
Called Directly by Functions:
- test_results
Used in Scripts:
- 10_round-3-weighting.qmd
Default:
weight_output_map:
  household: ex_weights_hh
  person: ex_weights_person
  day: ex_weights_day
  trip: ex_weights_trip
  linked_trip: ex_weights_linked_trip
  tour: ex_weights_tour
write_to_db | Write Weights to Database
Optional If TRUE, writes weights to the database. Used in output scripts.
Called Directly by Functions:
- check_write_to_db
Called Indirectly by Functions:
- get_settings
- get_test_settings
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
write_to_db: no
raw_data_root | raw_data_root
Optional Location of RSG-provided raw data; typically specified in the .Renviron file as RAW_DATA_PATH. Used as the base path for reading raw survey and HTS data files.
Called Directly by Functions:
- fetch_hts_table
- fetch_study_region
- get_settings
Called Indirectly by Functions:
- add_geometry_to_table
- adjust_ref_counts_dataset
- calc_sample_plan_counts
- cluster_pumas
- create_ie_adjustment_data
- fetch_acs
- fetch_pums
- fix_value_labels_on_load
- get_acs_bg_counts
- get_acs_bg_counts_base
- get_acs_ethnicity
- get_acs_race
- get_bg_geom
- get_county_fips
- get_income_broad
- get_max_income_bin
- get_max_survey_income_bin
- get_puma_geom
- get_puma_ids
- get_pumas
- get_state_fips
- get_test_settings
- get_tracts_puma_xwalk
- get_user_specs
- impute_ethnicity
- impute_income_pnta
- impute_race
- load_sf_obj
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prep_zones_sf
- prepare_acs_income
- prepare_zone_groups
- sampled_latlon_to_bg
- test_results
- update_income_broad_labels
- zone_group_plots
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
C.2 Reference Data (ACS/PUMS & extras)
acs_year | ACS Data Year
Required ACS year for income imputation and block group matching. Used in crosswalk and cleaning scripts.
Called Directly by Functions:
- create_ie_adjustment_data
- fetch_acs
- get_acs_ethnicity
- get_acs_race
- get_bg_geom
- get_county_fips
- get_max_acs_income_bin
- get_state_fips
- impute_ethnicity
- impute_gender
- impute_race
- prepare_acs_income
Called Indirectly by Functions:
- add_geometry_to_table
- adjust_ref_counts_dataset
- calc_sample_plan_counts
- cluster_pumas
- fetch_hts_table
- fetch_pums
- fix_value_labels_on_load
- get_acs_bg_counts
- get_acs_bg_counts_base
- get_income_broad
- get_max_income_bin
- get_max_survey_income_bin
- get_puma_geom
- get_puma_ids
- get_pumas
- get_tracts_puma_xwalk
- get_user_specs
- impute_income_pnta
- load_sf_obj
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prep_zones_sf
- prepare_zone_groups
- sampled_latlon_to_bg
- test_results
- update_income_broad_labels
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
acs_year: 2023
acs_dataset | ACS Dataset Type
Required ACS survey time span, matching the ‘survey’ argument in tidycensus::get_acs().
Called Directly by Functions:
- adjust_ref_counts_dataset
- calc_sample_plan_counts
- create_ie_adjustment_data
- fetch_acs
- get_acs_bg_counts_base
- get_max_acs_income_bin
Called Indirectly by Functions:
- fetch_hts_table
- fix_value_labels_on_load
- get_acs_bg_counts
- get_acs_ethnicity
- get_acs_race
- get_income_broad
- get_max_income_bin
- get_max_survey_income_bin
- get_user_specs
- impute_ethnicity
- impute_income_pnta
- impute_race
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prepare_acs_income
- prepare_zone_groups
- test_results
- update_income_broad_labels
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Allowed Values: ‘acs1’, ‘acs5’
Default:
acs_dataset: acs5
acs_tables | ACS Table Codes Mapping
Required List of ACS table codes by logical name. Each entry is a mapping of one logical name to one table code.
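Because each entry is a one-key mapping, code that consumes this field will usually want a single lookup table. The Python sketch below shows one way to merge such a list; the helper name merge_single_key_entries is hypothetical and the pipeline's own R code may handle this differently. The sample entries are a subset of the default listed for this field.

```python
def merge_single_key_entries(entries):
    """Merge a list of one-key mappings into a single dict, rejecting malformed input."""
    merged = {}
    for entry in entries:
        if len(entry) != 1:
            raise ValueError(f"Expected exactly one key per entry, got: {entry}")
        (name, code), = entry.items()
        if name in merged:
            raise ValueError(f"Duplicate logical name: {name}")
        merged[name] = code
    return merged


# Shape of the parsed YAML list (subset of the documented default).
acs_tables = [
    {"sex_by_age": "B01001"},
    {"hhtype": "B11002"},
    {"hh_income": "B19001"},
    {"race": "B02001"},
]

lookup = merge_single_key_entries(acs_tables)
print(lookup["hh_income"])
```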
Called Directly by Functions:
- get_acs_race
- get_max_acs_income_bin
- prepare_acs_income
Called Indirectly by Functions:
- fetch_hts_table
- fix_value_labels_on_load
- get_income_broad
- get_max_income_bin
- get_max_survey_income_bin
- get_user_specs
- impute_income_pnta
- impute_race
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prepare_zone_groups
- test_results
- update_income_broad_labels
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
acs_tables:
- sex_by_age: B01001
- hhtype: B11002
- tenure_hh: B25003_001
- occ_hh: B25002_002
- hh_income: B19001
- race: B02001
acs_count_vars | ACS Count Variable Codes
Required Mappings of ACS count variable names to codes. Each entry is a single-key object.
Called Directly by Functions:
- adjust_ref_counts_dataset
- get_acs_bg_counts_base
Called Indirectly by Functions:
- calc_sample_plan_counts
- get_acs_bg_counts
Used in Scripts:
- 05_setup-data-geographies.qmd
Default:
acs_count_vars:
- household: B25003_001
- person: B11002_001
state_fips | State FIPS Code
Optional Federal Information Processing Standard (FIPS) code for the state in the study region. Used to select and filter ACS and PUMS data for geographic subsetting, clustering, and reporting. This is typically blank and discovered automatically from existing data in fetch_acs.R and fetch_pums.R, but can be set here to override that behavior.
Called Directly by Functions:
- cluster_pumas
- fetch_acs
- fetch_pums
Called Indirectly by Functions:
- calc_sample_plan_counts
- create_ie_adjustment_data
- get_acs_bg_counts
- get_acs_bg_counts_base
- get_acs_ethnicity
- get_acs_race
- impute_ethnicity
- impute_income_pnta
- impute_race
- prepare_acs_income
- prepare_zone_groups
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
county_fips | County FIPS Code
Optional Federal Information Processing Standard (FIPS) code(s) for counties in the study region. Used to select and filter ACS and PUMS data for geographic subsetting. This is typically blank and discovered from existing data in fetch_acs.R and fetch_pums.R, but can be set here to override that behavior.
Called Directly by Functions:
- fetch_acs
Called Indirectly by Functions:
- calc_sample_plan_counts
- get_acs_bg_counts
- get_acs_bg_counts_base
- get_acs_ethnicity
- get_acs_race
- impute_ethnicity
- impute_income_pnta
- impute_race
- prepare_acs_income
Used in Scripts:
- 05_setup-data-geographies.qmd
- 07_survey-data-preparation.qmd
pums_year | PUMS Data Year
Required PUMS year for target data. Used in crosswalk and cleaning scripts.
Called Directly by Functions:
- adjust_ref_counts_dataset
- calc_sample_plan_counts
- cluster_pumas
- create_ie_adjustment_data
- fetch_pums
- get_acs_bg_counts_base
- get_puma_geom
- get_puma_ids
- get_pumas
- get_tracts_puma_xwalk
- read_pums_codebook
- sampled_latlon_to_bg
Called Indirectly by Functions:
- add_geometry_to_table
- append_var_lab
- get_acs_bg_counts
- get_settings
- get_test_settings
- impute_income_nonrelatives
- load_sf_obj
- prep_zones_sf
- prepare_income_fit_dt
- prepare_zone_groups
- update_settings_pums_vars
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
pums_year: 2023
pums_dataset | PUMS Dataset Type
Required PUMS dataset type. Used in crosswalk and cleaning scripts.
Called Directly by Functions:
- adjust_ref_counts_dataset
- fetch_pums
- read_pums_codebook
Called Indirectly by Functions:
- append_var_lab
- calc_sample_plan_counts
- create_ie_adjustment_data
- get_settings
- get_test_settings
- impute_income_nonrelatives
- prepare_income_fit_dt
- update_settings_pums_vars
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Allowed Values: ‘acs1’, ‘acs5’
Default:
pums_dataset: acs1
nontarget_vars | Additional PUMS Variables
Required Additional PUMS columns to fetch. Used in fetch_pums scripts.
Called Directly by Functions:
- update_settings_pums_vars
Called Indirectly by Functions:
- get_settings
- get_test_settings
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Example:
nontarget_vars:
- - PINCP
  - RELSHIPP
max_income_bin | Maximum Income Bin
Required Maximum income bin for top-coding. Used in income alignment scripts.
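Top-coding can be pictured with the short Python sketch below, which caps any income value at the configured maximum. This is an assumption about what top-coding means here (values above the cap are replaced by the cap); the function name top_code_income is hypothetical and the pipeline's income alignment in R may work on bins rather than raw values.

```python
def top_code_income(income, max_income_bin=200000):
    """Cap an income value at max_income_bin; lower values pass through unchanged."""
    return min(income, max_income_bin)


incomes = [18000, 85000, 200000, 350000]
print([top_code_income(value) for value in incomes])
```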
Called Directly by Functions:
- get_max_income_bin
Called Indirectly by Functions:
- fetch_hts_table
- fix_value_labels_on_load
- get_income_broad
- get_max_survey_income_bin
- get_user_specs
- impute_income_pnta
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prepare_zone_groups
- test_results
- update_income_broad_labels
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
max_income_bin: 200000
force_balance_hh_weights | Force Balanced Household Weights
Required If TRUE, PUMS household weights are recalculated from person weights. Corrects for the discrepancy between sum(PWGTP) and sum(WGTP) in PUMS data.
Called Directly by Functions:
- adjust_target_to_study_zones
- summarize_pums
Called Indirectly by Functions:
- calc_target_ci
Used in Scripts:
- 06_target-data-preparation.qmd
- 08_round-1-weighting.qmd
Default:
force_balance_hh_weights: yes
age_employable | Employable Age Threshold
Optional Minimum age considered employable for income imputation and workforce analysis. Used in imputation and preparation scripts to filter or categorize persons.
Called Directly by Functions:
- impute_income_nonrelatives
- prepare_income_fit_dt
- prepare_persons_dt
Used in Scripts:
- 07_survey-data-preparation.qmd
Default:
age_employable: 16
C.3 Geography & Zones
zone_type | Weighting Geographic Zone Type
Required Type of geographic zones used in the analysis. Used in crosswalk and tabulation scripts.
Called Directly by Functions:
- add_geometry_to_table
- adjust_target_to_study_zones
- get_user_specs
- group_from_defined_list
Called Indirectly by Functions:
- prepare_zone_groups
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
Allowed Values: ‘client_zones’, ‘pumas’
Default:
zone_type: pumas
zone_groups | Groups of zones for aggregation
Required Groups of zones for aggregation. Used in crosswalk and tabulation scripts.
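Downstream code typically needs the reverse lookup, from a zone identifier to its group. The Python sketch below inverts the structure; the helper name zone_to_group is hypothetical, the rule that a zone may belong to only one group is an assumption, and the sample values follow the example listed for this field.

```python
def zone_to_group(zone_groups):
    """Invert a list of {group: [zone ids]} mappings into {zone id: group}."""
    lookup = {}
    for entry in zone_groups:
        for group, zones in entry.items():
            for zone in zones:
                if zone in lookup:
                    raise ValueError(f"Zone {zone} is assigned to more than one group")
                lookup[zone] = group
    return lookup


# Shape of the parsed YAML (values taken from the documented example).
zone_groups = [
    {"North": ["5321101"]},
    {"South": ["5321104", "5321102", "5321103"]},
]

zone_lookup = zone_to_group(zone_groups)
print(zone_lookup["5321102"])
```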
Called Directly by Functions:
- get_user_specs
Called Indirectly by Functions:
- prepare_zone_groups
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Example:
zone_groups:
- North:
  - '5321101'
- South:
  - '5321104'
  - '5321102'
  - '5321103'
xwalk_sliver_threshold | Crosswalk Geometry Sliver Threshold
Required Threshold for geometry slivers in crosswalk creation. Used in crosswalk scripts.
Used in Scripts:
- 05_setup-data-geographies.qmd
Default:
xwalk_sliver_threshold: 0.05
puma_buffer | PUMA Buffer Distance (meters)
Required Buffer distance (meters) for PUMA region selection. Used in crosswalk scripts.
Called Directly by Functions:
- get_puma_geom
- get_puma_ids
Called Indirectly by Functions:
- add_geometry_to_table
- cluster_pumas
- create_ie_adjustment_data
- fetch_pums
- get_tracts_puma_xwalk
- load_sf_obj
- prep_zones_sf
- prepare_zone_groups
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Default:
puma_buffer: 100
puma_clustering | PUMA Clustering Method
Optional Specifies the method used to cluster PUMAs into zone groups for weighting and reporting. Typical values include ‘kmeans’, ‘spectral’, and ‘entropy’. The clustering method controls how PUMAs are grouped for assignment to client zones and determines the approach for spatial aggregation in the weighting process. Used in entropy_zone_groups, spectral_zone_groups, and related clustering functions.
Called Directly by Functions:
- entropy_zone_groups
- get_user_specs
- spectral_zone_groups
Called Indirectly by Functions:
- cluster_pumas
- prepare_zone_groups
Used in Scripts:
- 06_target-data-preparation.qmd
- 08_round-1-weighting.qmd
Allowed Values: ‘kmeans’, ‘spectral’, ‘entropy’
Default:
puma_clustering: kmeans
geographies | Geography Definitions
Optional Custom definitions for geographic groupings or crosswalks, such as aggregations of zones, PUMAs, or client areas. Used to configure zone-level weighting and reporting. If set here, will override the definitions in inst/populationsim/configs/settings.yaml.
Called Directly by Functions:
- popsim_settings_updates
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Default:
geographies:
- region
- zone_group
largest_bg_to_client_zone | Largest Block Group to Client Zone Mapping
Optional If TRUE, assigns each Census block group to the client zone with which it has the largest geographic overlap (rather than splitting proportions). Used in crosswalk generation to simplify mapping and ensure each block group is assigned to a single client zone. Typically used in 005_create_crosswalk.R.
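The largest-overlap rule can be sketched in a few lines of Python. The input layout (a mapping of block group to per-zone overlap areas), the identifiers, and the function name assign_largest_overlap are all hypothetical; the actual crosswalk is built from geometries in the R pipeline.

```python
def assign_largest_overlap(overlaps):
    """Map each block group to the single client zone it overlaps most.

    overlaps: {block_group_id: {client_zone_id: overlap_area}}
    Block groups with no overlapping zone are left out of the result.
    """
    assignment = {}
    for block_group, zone_areas in overlaps.items():
        if not zone_areas:
            continue
        # max() keeps the first zone encountered when two areas tie.
        assignment[block_group] = max(zone_areas.items(), key=lambda item: item[1])[0]
    return assignment


# Made-up overlap areas in square meters.
overlaps = {
    "bg_001": {"zone_a": 120000.0, "zone_b": 30000.0},
    "bg_002": {"zone_b": 51000.0, "zone_c": 49000.0},
    "bg_003": {},
}

bg_to_zone = assign_largest_overlap(overlaps)
print(bg_to_zone)
```

With the option off, the same overlap areas would instead be turned into proportional shares, so a block group could contribute to several client zones.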
Used in Scripts:
- 05_setup-data-geographies.qmd
Default:
largest_bg_to_client_zone: no
use_reported_home | Use Reported Home Location for Segment Assignment
Required If TRUE, uses reported home location for segment assignment. Used in weighting scripts.
Used in Scripts:
- 07_survey-data-preparation.qmd
Default:
use_reported_home: yes
C.4 Reproducibility & Study Frame
rng_seed | Random Number Seed
Required Random number seed for reproducibility. Used in all scripts.
Called Directly by Functions:
- adjust_unrelated_pums
- cluster_pumas
- get_user_specs
- impute_ethnicity
- impute_gender
- impute_race
Called Indirectly by Functions:
- prepare_zone_groups
Used in Scripts:
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
Default:
rng_seed: 4119
study_unit | Study Unit for Weighting
Required Unit of analysis for weighting. Used in all weighting scripts.
Called Directly by Functions:
- adjust_pums_to_reference
- adjust_ref_counts_dataset
- adjust_reference_to_target
- adjust_target_to_study_zones
- calc_initial_weights
- calc_person_weights
- calc_sample_plan_counts
- check_diff
- check_initial_weights
- create_ie_adjustment_data
- get_acs_bg_counts_base
- get_settings
- prep_initial_expansion_data
- summarize_pums
Called Indirectly by Functions:
- calc_target_ci
- get_acs_bg_counts
- get_test_settings
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Allowed Values: ‘household’, ‘person’
Default:
study_unit: household
weight_dow_groups | Day-of-Week Weight Groups
Required Defines how days of the week are grouped for survey weighting and reporting. Each object maps a group label to an array of day indices (Monday=1, …, Sunday=7). Used to aggregate and apply weights to specified day groups.
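To make the day-index convention concrete, the Python sketch below maps a travel date to its group label. It relies on the standard library's isoweekday(), which also numbers Monday as 1 and Sunday as 7. The function name dow_group is hypothetical, and returning None for days outside every group is an assumption about how ungrouped days are treated.

```python
import datetime

# Mirrors the documented default: Tuesday (2) through Thursday (4).
WEIGHT_DOW_GROUPS = {"avg_weekday": [2, 3, 4]}


def dow_group(travel_date, groups=WEIGHT_DOW_GROUPS):
    """Return the label of the day-of-week group containing travel_date, or None."""
    day_index = travel_date.isoweekday()  # Monday=1, ..., Sunday=7
    for label, day_indices in groups.items():
        if day_index in day_indices:
            return label
    return None


print(dow_group(datetime.date(2024, 3, 5)))  # a Tuesday
print(dow_group(datetime.date(2024, 3, 9)))  # a Saturday
```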
Called Directly by Functions:
- adjust_reference_to_target
- calc_day_weights
- calc_target_ci
- check_initial_weights
- create_importance_list
- get_day_groups
- label_targets
- plot_weight_fit
- popsim_make_control_config
- prepare_targets
Called Indirectly by Functions:
- calc_complete_hhdays
- calc_weight_fit
- impute_income_nonrelatives
- prep_transit_target
- prepare_impute_targets
- prepare_income_fit_dt
- prepare_persons_dt
Used in Scripts:
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
weight_dow_groups:
  avg_weekday:
  - 2
  - 3
  - 4
nps_segments | Non-Probability Sample Segments
Optional Specifies the list of non-probability sample (NPS) segments included in the survey. These are supplemental or convenience sample groups that do not follow the main ABS (address-based sample) design. When defined, these segments are matched by label to the sample_segment variable in the survey data, and are assigned an ‘NPS’ invitation type in downstream cleaning scripts. All households in these segments can be lumped together as a single supplemental group for weighting and reporting, or further blended with ABS samples using the nps_blending_factor. If omitted or empty, all invitations default to ‘ABS’. Note: If NPS segments are used, additional blending logic may be required in weighting and reporting scripts.
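The label-matching rule described above amounts to a membership test. The Python sketch below shows that rule only; the function name invitation_type is hypothetical, and the segment labels are taken from the example listed for this field plus one made-up ABS label.

```python
def invitation_type(sample_segment, nps_segments=None):
    """Return 'NPS' when the segment label is listed in nps_segments, else 'ABS'."""
    return "NPS" if sample_segment in (nps_segments or []) else "ABS"


nps_segments = ["Supplemental", "University", "Outreach"]

print(invitation_type("University", nps_segments))
print(invitation_type("General", nps_segments))  # "General" is a made-up ABS segment label
print(invitation_type("University"))  # nps_segments omitted: everything is ABS
```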
Used in Scripts:
- 07_survey-data-preparation.qmd
Example:
nps_segments:
- Supplemental
- University
- Outreach
nps_blending_factor | Non-Probability Sample Blending Factor
Optional Sets the relative weight or adjustment factor for households in non-probability sample (NPS) segments when blending with the main ABS (address-based sample) in survey weighting and reporting.
If specified, this factor determines how NPS households are upweighted or downweighted relative to ABS households, ensuring that the combined sample reflects the intended proportions. If not defined, the blending factor is automatically calculated based on the observed share of NPS households in the sample.
Use this field to override the default blending and explicitly control the contribution of NPS segments in expansion and weighting. Typical values are numeric (e.g., 0.5 to give NPS segments half the weight of ABS segments, or 1.0 for equal weighting).
Note: When NPS segments are present, careful adjustment is required to avoid biasing regional estimates. See weighting scripts for details on how this blending is applied.
Called Directly by Functions:
- calc_alpha
Called Indirectly by Functions:
- calc_initial_weights
- prep_initial_expansion_data
Used in Scripts:
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
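The override-or-default behavior of nps_blending_factor can be sketched as follows. This Python example assumes the automatic value is simply the observed share of NPS households; the text above only says the factor is based on that share, so the exact formula used by calc_alpha may differ. The function name resolve_blending_factor is hypothetical.

```python
def resolve_blending_factor(invite_types, configured=None):
    """Return the configured factor if given, else the observed NPS share (assumed default)."""
    if configured is not None:
        return float(configured)
    if not invite_types:
        raise ValueError("Cannot derive a blending factor from an empty sample")
    nps_count = sum(1 for invite in invite_types if invite == "NPS")
    return nps_count / len(invite_types)


# One invitation type per household (made-up sample).
sample = ["ABS", "ABS", "ABS", "NPS"]

print(resolve_blending_factor(sample))
print(resolve_blending_factor(sample, configured=0.5))
```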
C.5 Targets & PopulationSim
input_table_list | Input Table List
Optional List of input tables to load for PopulationSim and weighting steps. Each item should specify the canonical table name, source filename, and optionally the index column. If set here, will override the default list in inst/populationsim/configs/settings.yaml.
Called Directly by Functions:
- popsim_settings_updates
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Default:
input_table_list:
- tablename: households
  filename: seed_households.csv
  index_col: hh_id
- tablename: persons
  filename: seed_households.csv
- tablename: geo_cross_walk
  filename: geo_cross_walk.csv
- tablename: zone_group_control_data
  filename: control_totals_zone_group.csv
- tablename: region_control_data
  filename: control_totals_region.csv
targets | Target Variables and Definitions
Required Each array item is a singleton object mapping the target name (e.g., p_gender, h_size) to an array of singleton key/value objects. This matches the YAML form:
- h_size:
  - method: "prep_target_h_size"
  - survey_input: num_people
  - pums_input: NP
  - label_var: "Household size"
  - levels: [1,2,3,4,5]
  - label_levels: ["1","2","3","4","5+"]
  - geography: zone_group
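Several default targets (for example h_income and p_age) describe their numeric levels as upper bounds, with one more label than there are levels so that the last label catches everything above the top bound. The Python sketch below shows how such a pair of lists can be applied with a binary search. It illustrates the convention only and is not the pipeline's prep_target_income; the function name label_for_value is hypothetical, while the levels and labels are copied from the default h_income target.

```python
from bisect import bisect_left

# Copied from the default h_income target.
INCOME_LEVELS = [24999, 49999, 74999, 99999, 199999]
INCOME_LABELS = [
    "$0-$24,999",
    "$25,000-$49,999",
    "$50,000-$74,999",
    "$75,000-$99,999",
    "$100,000-$199,999",
    "$200,000+",
]


def label_for_value(value, levels=INCOME_LEVELS, labels=INCOME_LABELS):
    """Return the label of the first bin whose upper bound is >= value."""
    if len(labels) != len(levels) + 1:
        raise ValueError("Expected exactly one more label than levels")
    # bisect_left keeps a value equal to an upper bound inside that bin.
    return labels[bisect_left(levels, value)]


print(label_for_value(24999))
print(label_for_value(25000))
print(label_for_value(250000))
```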
Called Directly by Functions:
- calc_survey_ci
- calc_target_ci
- check_group_sums
- create_target_update_table
- entropy_zone_groups
- get_acs_race
- get_target_map
- impute_ethnicity
- impute_race
- label_targets
- plot_weight_fit
- popsim_make_control_config
- popsim_settings_updates
- prep_control_tables
- prep_target_adults
- prep_target_age
- prep_target_commutemode
- prep_target_cross
- prep_target_edulevel
- prep_target_employment
- prep_target_ethnicity
- prep_target_gender
- prep_target_h_size
- prep_target_income
- prep_target_kids
- prep_target_race
- prep_target_univstudent
- prep_target_vehicles
- prep_target_workers
- prep_transit_target
- prepare_targets
- update_settings_pums_vars
Called Indirectly by Functions:
- cluster_pumas
- get_settings
- get_test_settings
- impute_income_nonrelatives
- popsim_make_input_data
- prepare_impute_targets
- prepare_income_fit_dt
- prepare_persons_dt
- prepare_zone_groups
- summarize_pums
- summarize_survey
- update_targets
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
targets:
- h_size:
  - description: Household size (persons).
  - method: prep_target_h_size
  - survey_input: num_people
  - pums_input: NP
  - label_var: Household size
  - levels:
    - 1
    - 2
    - 3
    - 4
    - 5
  - label_levels:
    - '1'
    - '2'
    - '3'
    - '4'
    - 5+
  - geography: zone_group
- h_income:
  - description: Total household income (bins are upper bounds).
  - method: prep_target_income
  - survey_input: income_imputed_value
  - pums_input: HINCP
  - label_var: Household income
  - levels:
    - 24999
    - 49999
    - 74999
    - 99999
    - 199999
  - label_levels:
    - $0-$24,999
    - $25,000-$49,999
    - $50,000-$74,999
    - $75,000-$99,999
    - $100,000-$199,999
    - $200,000+
  - geography: zone_group
- h_workers:
  - description: Number of employed adults in household.
  - method: prep_target_workers
  - survey_input:
    - age
    - employment
  - pums_input: ESR
  - label_var: Number of workers
  - levels:
    - 0
    - 1
    - 2
  - label_levels:
    - '0'
    - '1'
    - 2+
  - geography: zone_group
- h_vehicles:
  - description: Vehicle sufficiency (none/insuff/suff).
  - method: prep_target_vehicles
  - survey_input:
    - can_drive
    - num_vehicles
  - pums_input:
    - VEH
    - AGEP
  - label_var: Vehicle sufficiency
  - levels:
    - none
    - insuff
    - suff
  - label_levels:
    - None
    - Insufficient
    - Sufficient
  - geography: zone_group
- h_kids:
  - description: Households with children.
  - method: prep_target_kids
  - survey_input: num_kids
  - pums_input: AGEP
  - label_var: Children in household
  - levels:
    - 0
    - 1
  - label_levels:
    - '0'
    - 1+
  - geography: zone_group
- h_puma:
  - description: Households by PUMA (tabulate only).
  - method: tabulate
  - survey_input: puma_id
  - pums_input: PUMA
  - label_var: PUMA
  - levels: {}
  - geography: zone_group
- h_adults:
  - description: Number of adults (18+) in household.
  - method: prep_target_adults
  - survey_input: age
  - pums_input: AGEP
  - label_var: Adults in household
  - levels:
    - 0
    - 1
  - label_levels:
    - '0'
    - 1+
  - geography: zone_group
- h_transit_trips:
  - description: Household-level transit trip totals (tabulate survey only).
  - method: tabulate
  - survey_input: transit_trips
  - pums_input: {}
  - label_var: Transit trips (HH)
  - levels: {}
  - geography: zone_group
- p_gender:
  - description: Person gender.
  - method: prep_target_gender
  - survey_input: gender_imputed
  - pums_input: SEX
  - label_var: Gender
  - levels:
    - male
    - female
  - label_levels:
    - Male
    - Female
  - geography: zone_group
- p_age:
  - description: Person age (bins are upper bounds).
  - method: prep_target_age
  - survey_input: age
  - pums_input: AGEP
  - label_var: Age
  - levels:
    - 4
    - 15
    - 17
    - 24
    - 44
    - 64
  - label_levels:
    - 0–4
    - 5–15
    - 16–17
    - 18–24
    - 25–44
    - 45–64
    - 65+
  - geography: zone_group
- p_employment:
  - description: Employment status.
  - method: prep_target_employment
  - survey_input: employment
  - pums_input:
    - ESR
    - WKHP
  - label_var: Employment
  - levels:
    - nonworker
    - part_time
    - full_time
  - label_levels:
    - Non-worker
    - Part-time
    - Full-time
  - geography: zone_group
- p_commutemode:
  - description: Commute mode (primary).
  - method: prep_target_commutemode
  - survey_input:
    - work_mode
    - job_type
  - pums_input: JWTRNS
  - label_var: Commute mode
  - levels:
    - home
    - transit
    - walk
    - bike
    - other
    - none
  - label_levels:
    - Work from home
    - Transit
    - Walk
    - Bike
    - Other
    - None
  - geography: zone_group
- p_univstudent:
  - description: University student status.
  - method: prep_target_univstudent
  - survey_input:
    - student
    - school_type
  - pums_input: SCHG
  - label_var: University student
  - levels:
    - 'no'
    - 'yes'
  - label_levels:
    - 'No'
    - 'Yes'
  - geography: zone_group
- p_edulevel:
  - description: Education attainment (coarse).
  - method: prep_target_edulevel
  - survey_input: education
  - pums_input: SCHL
  - label_var: Education
  - levels:
    - no_college
    - some_college
  - label_levels:
    - No college
    - Some college
  - geography: zone_group
- p_race:
  - description: Race (collapsed categories).
  - method: prep_target_race
  - survey_input: race_imputed
  - pums_input: RAC1P
  - label_var: Race
  - levels:
    - white
    - afam
    - asian_pacific
    - other
  - label_levels:
    - White
    - Black or African American
    - Asian or Pacific Islander
    - Other
  - geography: zone_group
- p_ethnicity:
  - description: Ethnicity (Hispanic / Not Hispanic).
  - method: prep_target_ethnicity
  - survey_input: ethnicity_imputed
  - pums_input: HISP
  - label_var: Ethnicity
  - levels:
    - not_hispanic
    - hispanic
  - label_levels:
    - Not Hispanic
    - Hispanic
  - geography: zone_group
- p_total:
  - description: Total persons (pass-through).
  - method: pass_through
  - survey_input: p_total
  - pums_input: {}
  - label_var: Total persons
  - levels: {}
  - geography: zone_group
- h_total:
  - description: Total households (pass-through).
  - method: pass_through
  - survey_input: h_total
  - pums_input: {}
  - label_var: Total households
  - levels: {}
  - geography: zone_group
- p_gender-ethnicity:
  - description: 'Crosstab: gender by ethnicity.'
  - method: crosstab
  - targets:
    - p_gender
    - p_ethnicity
  - label_var: Gender × Ethnicity
  - geography: zone_group
- p_gender-race:
  - description: 'Crosstab: gender by race.'
  - method: crosstab
  - targets:
    - p_gender
    - p_race
  - label_var: Gender × Race
  - geography: zone_group
target_updates | Target Update Rules
Optional Target update rules for combining groups. Used in weighting scripts.
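Combining levels can be pictured as re-keying a table of counts. The Python sketch below merges the counts of the original levels into the new combined level and leaves unlisted levels untouched. The function name collapse_levels and the counts are made up, and restricting the rule to the zone groups named under groups is omitted here; the pipeline's update_targets may differ in detail.

```python
def collapse_levels(counts, level_updates):
    """Sum counts of original levels into their combined level.

    counts: {level: count}
    level_updates: {new_level: [original levels to combine]}
    """
    old_to_new = {
        old: new for new, originals in level_updates.items() for old in originals
    }
    collapsed = {}
    for level, count in counts.items():
        new_level = old_to_new.get(level, level)
        collapsed[new_level] = collapsed.get(new_level, 0) + count
    return collapsed


# Level names follow the p_commutemode example; the counts are invented.
commute_counts = {"home": 120, "transit": 80, "walk": 30, "bike": 10, "other": 15, "none": 400}
commute_updates = {"walk_bike": ["walk", "bike"], "other_transit": ["other", "transit"]}

collapsed_counts = collapse_levels(commute_counts, commute_updates)
print(collapsed_counts)
```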
Called Directly by Functions:
- build_update_map
- create_target_update_table
- plot_weight_fit
- popsim_make_control_config
- update_targets
Called Indirectly by Functions:
- calc_survey_ci
- calc_target_ci
- label_targets
- popsim_make_input_data
- summarize_pums
- summarize_survey
Used in Scripts:
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Example:
target_updates:
- p_commutemode:
    levels:
      walk_bike:
      - walk
      - bike
      other_transit:
      - other
      - transit
    groups:
    - North
- h_vehicles:
    levels:
      insuff:
      - none
      - insuff
    groups:
    - North
- p_ethnicity:
    levels:
      other_hispanic:
      - mexican
      - other_hispanic
    groups:
    - Middle Queens
- h_income:
    levels:
      100000_plus:
      - 100000_199999
      - 200000_plus
    groups:
    - Southern Bronx
popsim_setting_updates | PopulationSim Settings Updates
Optional Settings to update in PopulationSim. Typically blank; if set, overrides defaults in inst/populationsim/configs/settings.yaml. Used in popsim scripts.
Called Directly by Functions:
- popsim_settings_updates
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Example:
popsim_setting_updates:
- min_expansion_factor: 0.143
  max_expansion_factor: 7
popsim_importance | PopulationSim Importance Weights
Optional Importance weights for popsim controls. Used in popsim scripts.
Called Directly by Functions:
- create_importance_list
Called Indirectly by Functions:
- popsim_make_control_config
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Example:
popsim_importance:
- run1_initial:
    p_total: 100
popsim_calculate_importance | Calculate Target Importance
Optional If TRUE, automatically calculates the relative importance of each control field for PopulationSim weighting from the confidence interval of its target value. Controls with wider confidence intervals (for example, poorly measured PUMS estimates) receive less importance, so that noisy targets are not overweighted. If FALSE or not set, the default importance values specified in settings.yaml are used. Used in popsim_calculate_importance to generate the importance_list passed to PopulationSim during initial and day-pattern weighting.
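The exact formula is not given in this reference. As a loose illustration of the idea that a wider interval leads to lower importance, the Python sketch below divides a base importance by one plus the relative width of the interval. Both the formula and the names (ci_based_importance, base_importance) are assumptions made for this example and should not be read as the pipeline's method.

```python
def ci_based_importance(estimate, ci_lower, ci_upper, base_importance=1000.0):
    """Scale a base importance down as the relative CI width grows (assumed formula)."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    if ci_upper < ci_lower:
        raise ValueError("ci_upper must not be below ci_lower")
    relative_width = (ci_upper - ci_lower) / estimate
    return base_importance / (1.0 + relative_width)


# Two made-up targets with the same estimate but different precision.
narrow = ci_based_importance(1000, 900, 1100)
wide = ci_based_importance(1000, 500, 1500)
print(narrow > wide)
```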
Called Directly by Functions:
- popsim_calculate_importance
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Default:
popsim_calculate_importance: no
popsim_search_max_exp | PopulationSim Max Expansion Factor Search Range
Optional Max expansion factors for PopulationSim search. Used in PopulationSim scripts.
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Example:
popsim_search_max_exp:
- - 4
  - 5
  - 6
  - 7
  - 8
popsim_search_bounds | PopulationSim Maximum Weight Search Bounds
Optional Bounds for popsim search. Used in popsim scripts.
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Example:
popsim_search_bounds:
- - 4000
  - 6000
  - 8000
  - 10000
  - 20000
popsim_initial_label | PopulationSim Initial Run Label
Optional Label for initial popsim run. Used in popsim scripts.
Used in Scripts:
- 05_setup-data-geographies.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
popsim_initial_label: round1
max_expansion_factor | Maximum Expansion Factor
Optional Maximum allowed ratio of base weight to PopulationSim weight. If set in the project configuration yaml, this will override the default value of 5 (see inst/populationsim/configs/settings.yaml).
Called Directly by Functions:
- popsim_search
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Default:
max_expansion_factor: 5
min_expansion_factor | Minimum Expansion Factor
Optional Minimum allowed ratio of base weight to PopulationSim weight. If set in the project configuration yaml, this will override the default value of 0.125 (see inst/populationsim/configs/settings.yaml).
Used in Scripts:
- 08_round-1-weighting.qmd
Default:
min_expansion_factor: 0.125
absolute_upper_bound | Absolute Upper Bound
Optional Maximum allowed value for household weights in PopulationSim weighting runs. If set in the project configuration yaml, this will override the default value of 10000 (see inst/populationsim/configs/settings.yaml).
Called Directly by Functions:
- popsim_search
Used in Scripts:
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
Default:
absolute_upper_bound: 10000
absolute_lower_bound | Absolute Lower Bound
Optional Minimum allowed value for household weights in PopulationSim weighting runs. If set in the project configuration yaml, this will override the default value of 0 (see inst/populationsim/configs/settings.yaml).
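Taken together, the two expansion-factor settings and the two absolute bounds define an allowed interval for each household's weight. The sketch below computes that interval for one household. It assumes the expansion factor is measured relative to the household's base weight and that the tighter of the ratio limit and the absolute limit applies; the function name is hypothetical, and PopulationSim's own handling may differ in detail.

```python
def allowed_weight_range(base_weight,
                         min_expansion_factor=0.125,
                         max_expansion_factor=5,
                         absolute_lower_bound=0,
                         absolute_upper_bound=10000):
    """Return (lower, upper) limits on a household's final weight."""
    # Ratio limits scale with the base weight; absolute limits do not.
    lower = max(base_weight * min_expansion_factor, absolute_lower_bound)
    upper = min(base_weight * max_expansion_factor, absolute_upper_bound)
    return lower, upper


print(allowed_weight_range(200))    # (25.0, 1000): ratio limits bind
print(allowed_weight_range(4000))   # (500.0, 10000): absolute upper bound binds
```

With the defaults, the absolute upper bound only matters for households whose base weight exceeds 2000, since 2000 × 5 = 10000.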
Used in Scripts:
- 08_round-1-weighting.qmd
Default:
absolute_lower_bound: 0
C.6 Round 2: Day-Pattern Adjustment
daypat_weighting | Day Pattern Weighting Toggle (in development)
Optional If TRUE, perform day-pattern weighting adjustments.
Used in Scripts:
- 10_round-3-weighting.qmd
Default:
daypat_weighting: yes
daypat_adjustments_table | Day Pattern Adjustments Table
Required Table to use for day pattern adjustments. Used in daypat scripts.
Used in Scripts:
- 09_round-2-weighting.qmd
Allowed Values: ‘tour’, ‘linked_trip’, ‘trip’
Default:
daypat_adjustments_table: trip
daypat_formula_vars | Day Pattern Formula Variables
Required Variables for day pattern formula. Used in daypat scripts.
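The default entries for this setting include a function call and interaction terms, which suggests each item is used as one term of an R-style model formula. The sketch below shows how such a list could be joined into a formula string. The response name `daypat` and the helper `build_formula` are hypothetical, and joining terms with `+` is an assumption about how the day-pattern scripts consume the list.

```python
def build_formula(response, terms):
    """Join formula terms into an R-style formula string."""
    if not terms:
        raise ValueError("at least one term is required")
    return f"{response} ~ " + " + ".join(terms)


# A shortened version of the default list for this setting.
daypat_formula_vars = [
    "zero_vehicle",
    "as.character(income_imputed_label)",
    "age_under_35",
    "age_under_35 * diary_online",
]

formula = build_formula("daypat", daypat_formula_vars)
print(formula)
```

Because each entry is passed through as text, any expression that is valid inside an R formula (such as `as.character(...)` or `a * b`) could be listed without special handling.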
Used in Scripts:
- 09_round-2-weighting.qmd
Default:
daypat_formula_vars:
- zero_vehicle
- as.character(income_imputed_label)
- age_under_35
- age_over_65
- is_employed
- is_student
- diary_online
- diary_call
- age_under_35 * diary_online
- age_over_65 * diary_call
popsim_daypat_label | PopulationSim Day Pattern Run Label
Optional Label for day pattern popsim run. Used in popsim scripts.
Used in Scripts:
- 05_setup-data-geographies.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
Default:
popsim_daypat_label: round2
C.7 Round 3: Trip-Rate Adjustment
trip_rate_factor_rescaling | Trip Rate Factor Rescaling Toggle
Required If TRUE, allows trip rate factors <1 to be rescaled to 1. Used in trip rate scripts.
Called Directly by Functions:
- rescale_trip_rate_factors
Default:
trip_rate_factor_rescaling: yes
trip_rate_factor_cap | Trip Rate Factor Cap
Required Cap on trip rate factor. Used in trip rate scripts.
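One way to read the rescaling toggle and the cap together is as a lower and an upper limit on each trip rate factor. The sketch below applies both to a single factor. It assumes "rescaled to 1" means raising factors below 1 up to exactly 1, and that the cap is applied afterwards; the function name is hypothetical, and rescale_trip_rate_factors may implement this differently.

```python
def adjust_trip_rate_factor(factor, cap=2.0, rescale_below_one=True):
    """Apply the assumed floor-at-1 rescaling, then the cap."""
    if rescale_below_one and factor < 1:
        factor = 1.0
    return min(factor, cap)


factors = [0.8, 1.4, 3.5]
adjusted = [adjust_trip_rate_factor(f) for f in factors]
print(adjusted)  # [1.0, 1.4, 2.0]
```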
Used in Scripts:
- 10_round-3-weighting.qmd
Default:
trip_rate_factor_cap: 2
trip_rate_model_lhs | Left-Hand Side (Dependent Variables) of Trip Rate Model
Required Defines trip type groupings for trip rate model. Each object maps a group name to an array of trip types (e.g., hbw, nhbw, hbs, nhbs, hbo, nhbo, loop_trip).
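To make the grouping concrete, the sketch below sums per-type trip counts for one person-day into one count per group, using the work/school/other grouping from the example for this setting. The function name and the sample counts are hypothetical, and treating a group's value as the sum of its trip types is an assumption about how the trip rate scripts build the dependent variables.

```python
def count_by_group(trip_counts, groups):
    """Sum per-type trip counts into one count per group."""
    return {
        name: sum(trip_counts.get(trip_type, 0) for trip_type in types)
        for name, types in groups.items()
    }


groups = {
    "work": ["hbw", "nhbw"],
    "school": ["hbs", "nhbs"],
    "other": ["hbo", "nhbo"],
}

# Hypothetical trip counts by type for a single person-day.
person_day = {"hbw": 2, "nhbw": 1, "hbo": 3}

counts = count_by_group(person_day, groups)
print(counts)  # {'work': 3, 'school': 0, 'other': 3}
```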
Used in Scripts:
- 10_round-3-weighting.qmd
Example:
trip_rate_model_lhs:
- - work:
    - hbw
    - nhbw
  - school:
    - hbs
    - nhbs
  - other:
    - hbo
    - nhbo
- - hbm:
    - hbw
    - hbs
  - hbo: []
  - nhbm:
    - nhbw
    - nhbs
  - nhbo: []
- - work:
    - hbw
    - nhbw
  - school:
    - hbs
    - nhbs
  - mandatory:
    - hbw
    - nhbw
    - hbs
    - nhbs
  - other:
    - hbo
    - nhbo
trip_rate_model_vars | Trip Rate Model Variables
Required Defines predictor variables used in trip rate models by trip type. Each item is a singleton object mapping the trip rate dependent variable (e.g., num_work, num_school, num_other) to an array of predictor variable names.
Used in Scripts:
- 10_round-3-weighting.qmd
Default:
trip_rate_model_vars:
- num_work:
  - diary_binary
  - hh_size
  - income_cat
  - n_kids
  - employment_status
  - wfh
  - age_cat_adults
  - is_student
  - work_loc_varies
  - education_cat
- num_school:
  - diary_binary
  - hh_size
  - income_cat
  - school_cat
  - p_employment
  - p_wfh
  - p_gender
- num_other:
  - diary_binary
  - hh_size
  - income_cat
  - n_kids
  - employment_status
  - wfh
  - age_cat_adults
  - is_student
  - education_cat
weight_synthetic_trips | Toggle Synthetic Trip Weighting
Required If TRUE, weights synthetic trips. Used in trip scripts.
Called Directly by Functions:
- calc_trip_weights
Used in Scripts:
- 10_round-3-weighting.qmd
Default:
weight_synthetic_trips: no
C.8 Transit Controls
transit_modes | Transit Modes Included
Optional (required when using a transit target). List of transit modes included in survey tabulation and expansion. Typical values: [‘bus’, ‘rail’, ‘ferry’, ‘commuter_rail’, ‘light_rail’]. Used to filter and aggregate transit trips by mode.
Used in Scripts:
- 07_survey-data-preparation.qmd
Example:
transit_modes:
- bus
- rail
transit_target_type | Transit Target Trip Type
Optional Type of transit trip target for expansion. ‘linked_trips’ (default) uses linked trips as the main target; ‘boardings’ uses unlinked trips. Determines how transit boardings and trips are interpreted in weighting and reporting.
Called Directly by Functions:
- prep_transit_target
Called Indirectly by Functions:
- calc_target_ci
Used in Scripts:
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
Allowed Values: ‘linked_trips’, ‘boardings’
transit_target_moe | Transit Target Margin of Error
Optional Margin of error (MOE) for transit target boardings or trips, used in confidence interval calculations for weighting and QA/QC. Typically set as a percentage of total boardings (e.g., 0.05 for 5%). Used in calc_target_ci.
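For intuition, the sketch below turns a target and a proportional margin of error into a confidence interval. It assumes a symmetric interval of the form target × (1 ± MOE); the function name is hypothetical, the sample target is made up, and calc_target_ci may compute its bounds differently.

```python
def target_ci(target, moe=0.05):
    """Symmetric interval around a target, with MOE given as a proportion."""
    if not 0 <= moe < 1:
        raise ValueError("moe must be a proportion in [0, 1)")
    half_width = target * moe
    return target - half_width, target + half_width


# Hypothetical target of 100,000 with the default 5% margin of error.
lower, upper = target_ci(100_000, moe=0.05)
print(round(lower), round(upper))  # 95000 105000
```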
Called Directly by Functions:
- calc_target_ci
Used in Scripts:
- 08_round-1-weighting.qmd
Default:
transit_target_moe: 0.05
transit_weekday_boardings | Transit Weekday Boardings
Optional Observed or targeted total transit boardings for a typical weekday across the study region. Used as a control total in expansion and weighting. Must be set from external agency data or survey sources.
Called Directly by Functions:
- prep_transit_target
Called Indirectly by Functions:
- calc_target_ci
Used in Scripts:
- 06_target-data-preparation.qmd
- 08_round-1-weighting.qmd
transit_weekend_factor | Transit Weekend Adjustment Factor
Optional Factor applied to weekday boardings to estimate weekend ridership (e.g., 0.7 if weekend boardings are 70% of weekday boardings). Used to scale control totals for weekend expansion.
Called Directly by Functions:
- prep_transit_target
Called Indirectly by Functions:
- calc_target_ci
Used in Scripts:
- 06_target-data-preparation.qmd
- 08_round-1-weighting.qmd
transit_boardings_per_trip | Transit Boardings per Trip
Optional Average number of boardings per transit trip, used to convert between linked and unlinked trip targets. Set from external survey or agency data (e.g., 1.2 boardings per linked trip).
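The transit settings in this section combine arithmetically. The sketch below derives a target from weekday boardings, optionally scales it to a weekend day, and converts boardings to linked trips when the target type calls for it. The function name and sample numbers are hypothetical, and the order of operations is an assumption rather than a description of prep_transit_target.

```python
def transit_target(weekday_boardings, boardings_per_trip,
                   target_type="linked_trips", weekend_factor=None):
    """Derive a transit control total from weekday boardings."""
    if target_type not in ("linked_trips", "boardings"):
        raise ValueError("target_type must be 'linked_trips' or 'boardings'")
    if boardings_per_trip <= 0:
        raise ValueError("boardings_per_trip must be positive")

    boardings = weekday_boardings
    if weekend_factor is not None:
        # Scale weekday boardings down (or up) to a weekend day.
        boardings = boardings * weekend_factor

    if target_type == "boardings":
        return boardings
    # Each linked trip accounts for boardings_per_trip boardings on average.
    return boardings / boardings_per_trip


weekday_linked = transit_target(120_000, 1.2)
weekend_boardings = transit_target(120_000, 1.2, "boardings", weekend_factor=0.7)
print(round(weekday_linked), round(weekend_boardings))  # 100000 84000
```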
Called Directly by Functions:
- prep_transit_target
Called Indirectly by Functions:
- calc_target_ci
Used in Scripts:
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
C.9 QA/QC, Reports & Dashboards
plots | Enable Diagnostic Plots
Optional If TRUE, generates diagnostic plots as part of sample plan QA/QC and zone group clustering. Plots are saved to the ‘report/plots/’ directory and include maps of study region block groups, PUMA-to-client zone intersections, and clustering/zone group results. Used in calc_sample_plan_counts and zone_group_plots. Deprecated in favor of more granular controls, but still useful for quick checks.
Called Directly by Functions:
- calc_sample_plan_counts
- entropy_zone_groups
Called Indirectly by Functions:
- cluster_pumas
- prepare_zone_groups
Used in Scripts:
- 06_target-data-preparation.qmd
- 08_round-1-weighting.qmd
Default:
plots: no
C.10 Legacy / Deprecated (kept for backwards compatibility)
hts_rmove_version | HTS RMove Data Version Suffix (Deprecated)
Optional Suffix to append to HTS table names for versioning when constructing database table references. This setting is DEPRECATED and replaced by ‘hts_table_map’, which directly maps canonical names to database tables. Only used in legacy workflows with ‘hts_tables_prefix’; not required for current workflows.
Called Directly by Functions:
- get_db_table_name
Called Indirectly by Functions:
- fetch_hts_table
- fix_value_labels_on_load
- get_income_broad
- get_max_income_bin
- get_max_survey_income_bin
- get_user_specs
- impute_income_pnta
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prepare_zone_groups
- test_results
- update_income_broad_labels
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd
hts_tables_prefix | HTS Table Name Prefix (Deprecated)
Optional Prefix to prepend to HTS table names when constructing database table references. This setting is DEPRECATED and replaced by ‘hts_table_map’, which provides direct mapping of canonical table names to specific database tables. Only used for legacy workflows; if set, will be combined with ‘hts_rmove_version’ (if present) to create table names. Not required for current workflows.
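The relationship between the current mapping and the two legacy settings can be sketched as a simple fallback. The example below prefers an explicit hts_table_map entry and otherwise builds a name from the prefix and version suffix. The precedence, the plain concatenation with no separator, and the sample prefix and suffix values are all assumptions; get_db_table_name may resolve names differently.

```python
def resolve_table_name(canonical, hts_table_map=None,
                       hts_tables_prefix="", hts_rmove_version=""):
    """Resolve a canonical table name to a database table name."""
    if hts_table_map and canonical in hts_table_map:
        # Current workflow: the mapping gives the table name directly.
        return hts_table_map[canonical]
    # Legacy workflow (assumed): prefix + canonical name + version suffix.
    return f"{hts_tables_prefix}{canonical}{hts_rmove_version}"


current = resolve_table_name("trip", hts_table_map={"trip": "toc_trip"})
legacy = resolve_table_name("trip", hts_tables_prefix="ex_",
                            hts_rmove_version="_v2")
print(current, legacy)  # toc_trip ex_trip_v2
```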
Called Directly by Functions:
- get_db_table_name
Called Indirectly by Functions:
- fetch_hts_table
- fix_value_labels_on_load
- get_income_broad
- get_max_income_bin
- get_max_survey_income_bin
- get_user_specs
- impute_income_pnta
- prep_hhs_for_income_imputation
- prep_initial_expansion_data
- prepare_zone_groups
- test_results
- update_income_broad_labels
Used in Scripts:
- 05_setup-data-geographies.qmd
- 06_target-data-preparation.qmd
- 07_survey-data-preparation.qmd
- 08_round-1-weighting.qmd
- 09_round-2-weighting.qmd
- 10_round-3-weighting.qmd