# Load hts.weighting Packages
devtools::load_all()
# Load Settings; Pass python_env Explicitly for Quarto
settings = get_settings(reload_settings = TRUE,
print = FALSE)
pums_year = get("pums_year", settings)
check_path = file.path(get("report_dir", settings), "020_check_counts.csv")
cli::cli_inform(check_path)
6 Target Data Preparation
Two key Census data sources are used for weighting:
- 5-year ACS estimates: Provide household and population counts at a highly disaggregated, block group level, offering statistical reliability due to cumulative sampling over multiple years. These are essential for sample planning and constructing base weights at the same geographic scale as sampling.
- 1-year ACS Public Use Microdata Sample (PUMS): Provide detailed crosstabulations by household and person characteristics at the PUMA (Public Use Microdata Area) level, corresponding closely to the survey year and less affected by pandemic-era distortions. These are used to set demographic targets for weighting.
Two related terms are used throughout this chapter:
- Reference counts: Number of households/persons recorded in the sample plan using 5-year ACS data, at the block group level. Used for initial base weight calculations.
- Target estimates: Population and household totals from the 1-year ACS PUMS, at PUMA level, used to set demographic weighting targets.
This step translates 1-year ACS PUMS data into PopulationSim-ready control totals. The script imports 1-year PUMS microdata, removes group-quarters residents, reconciles household and person weights, and allocates both households and persons to the study’s weighting zones. These processed targets define the benchmark totals that PopulationSim matches during the weighting rounds.
Before weighting, the sample plan reference counts must be calibrated to represent the same population used in PopulationSim. The 5-year ACS estimates enable spatial disaggregation at the block group level, while 1-year PUMS data allow for detailed demographic targets at the PUMA level. The final step in this chapter is to align the sample plan’s reference counts (from 5-year ACS) to the target estimates (from 1-year PUMS) for households and persons.
Targets are fixed control totals from external data (ACS/PUMS). Weights are adjustments applied to survey data so that estimates align with the targets (i.e., population).
6.1 Chapter Setup
This script begins by loading the R packages and configuration settings needed to construct the household and person-level weighting targets. Of key importance in this chapter is the pums_year, which specifies the vintage of the ACS PUMS data to be used. This should match the year of the ACS 5-year estimates used for the crosswalk denominators.
6.1.1 Load Packages and Settings
6.1.2 Load 1-year PUMS Dataset
The next step imports the 1-year PUMS data, which provides the person- and household-level detail needed to specify target variables such as income, household size, and employment status. After loading, the script retains only the columns needed to create target variables (for example, household income, age, and sex), which streamlines later processing.
# PUMS Data Dictionary for Reference
pums_vars = read_pums_codebook(settings)
pums_0 = fetch_pums(settings)
pums_sf = get_puma_geom(settings)
# Record Initial Weights for QA
check_dt = record_checksum(
fname = check_path,
append = FALSE,
pums_checksum("initial", pums_0, "person")
)
6.2 Target Data Preparation: Clean 1-Year ACS PUMS Data
The next step adjusts the ACS PUMS 1-year data used to construct detailed weighting targets. To do so, the following steps are performed:
- Remove group quarters residents from the data.
- For person-level studies only, restructure households so that each adult is treated as their own household.
- Align the PUMS data to itself by adjusting the household weights so that household-level weighted estimates match the weighted sum of persons computed from the person-level weights.
- Separate the PUMS data into household- and person-level datasets for creating and tabulating weighting target variables.
6.2.1 Remove Group Quarters from PUMS Data
Once the data are loaded, the script filters the PUMS records to match the survey sampling frame, because all targets must reflect the same population as the survey sample. Residents of group quarters, such as dormitories, prisons, and nursing homes, are not part of the address-based sampling frame, so this step removes them and retains only household-based records.
type_var = str_subset(names(pums_0), "TYPE(_label|HUGQ_label)$")
# Check for Skew Between HH/PER Weights
check_weight_skew(pums_0, "initial unadjusted PUMS")
# Separate Group Quarters and Housing Units
pums_gq = pums_0[get(type_var) != "Housing unit"]
pums_hu = pums_0[get(type_var) == "Housing unit"]
# Confirm Split is Correct
stopifnot(sum(pums_gq$PWGTP) + sum(pums_hu$PWGTP) == sum(pums_0$PWGTP))
stopifnot(pums_hu[get(type_var) != "Housing unit", .N] == 0)
# Report Skew Between HH/PER Weights After Split
check_weight_skew(pums_hu, "after dropping GQ")
check_dt = record_checksum(
fname = check_path,
append = TRUE,
pums_checksum("remove GQ", pums_hu, "person")
)
6.2.2 For Person-Level Studies – Adjust Households to Persons
Person-level studies require a transformation of the PUMS household data to ensure each adult is treated as an individual household unit (just as they were in a person-level HTS). This adjustment involves several steps:
- Count the number of adults and children in each household to retain household composition information.
- Re-label the number of vehicles to maintain consistency.
- Assign each adult as their own household by modifying the SERIALNO and SPORDER identifiers, effectively creating a new household for each adult while dropping children from the dataset. This ensures that person-level targets are accurately represented, with each adult treated as a separate entity for weighting purposes.
- Verify that the total weights remain consistent after this restructuring, ensuring that the overall population estimates are preserved.
if (get("study_unit", settings) == "person") {
# Count kids and adults per household
num_kids = pums_hu[AGEP < 18, .N, keyby = SERIALNO]
pums_hu[num_kids, num_kids := i.N, on = "SERIALNO"]
pums_hu[is.na(num_kids), num_kids := 0]
# Find the number of adults in each household for precomputation
num_adults = pums_hu[AGEP >= 18, .N, keyby = SERIALNO]
pums_hu[num_adults, num_adults := i.N, on = "SERIALNO"]
pums_hu[is.na(num_adults), num_adults := 0] # Should never happen but here as a safeguard
# Set hh target to be carried over at person level
pums_hu[, h_kids := num_kids]
pums_hu[, h_adults := num_adults]
# Re-label the number of vehicles in each household for precomputation
pums_hu[, num_vehicles := VEH]
# Assign each adult as their own household
pums_hu[, `:=`(SPORDER_orig = SPORDER, SERIALNO_orig = SERIALNO)]
pums_hu[, SERIALNO := paste0(SERIALNO, stringr::str_pad(SPORDER, 2, pad = "0"))]
pums_hu[, NP_adj := 1]
# Drop children
pums_hu = pums_hu[AGEP >= 18]
# Record checksum
check_dt = record_checksum(
fname = check_path,
append = TRUE,
pums_checksum("drop children", pums_hu, "person")
)
# Update SPORDER to start at 1. This ensures we can still use the SPORDER to filter on HHs
pums_hu = pums_hu[order(SERIALNO, SPORDER_orig)]
pums_hu[, SPORDER := rowid(SERIALNO)]
# Find households with no SPORDER == 1
ok_hh = pums_hu[SPORDER == 1, SERIALNO]
stopifnot(all(pums_hu$SERIALNO %in% ok_hh))
# Should be similar, but not exactly the same
# Check that weights are consistent after transformation
stopifnot(
all.equal(
sum(pums_hu$PWGTP),
sum(pums_hu$NP_adj * pums_hu$WGTP),
tolerance = 0.05
)
)
# Record checksum
check_dt = record_checksum(
fname = check_path,
append = TRUE,
pums_checksum("create person hh", pums_hu, "person")
)
}
6.2.3 Reconcile Household and Person Weights
A key data management step in the target preparation process is reconciling household (WGTP) and person (PWGTP) weights. These two weighting schemes in PUMS are produced independently by the Census Bureau and therefore are not internally consistent. To ensure coherent expansion factors, the script recalculates each household’s weight so that the sum of its members’ person weights equals its adjusted household weight. This alignment guarantees that total household counts and total person counts are derived from the same underlying expansion basis, preventing logical conflicts when PopulationSim later calibrates both household and person targets simultaneously. The process also includes simple diagnostics to confirm that household and person totals remain in proportion after reconciliation.
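A minimal sketch of one way such a reconciliation could work, using toy data. It assumes the adjusted household weight is the mean of the members’ person weights, so that the adjusted weight multiplied by household size equals the household’s summed person weights. This is an illustration only: the actual logic lives in force_balance_pums_weights() and may differ. The toy table and the NP and WGTP_adj columns are hypothetical.

```r
library(data.table)

# Toy PUMS-like records: two households, one row per person (made-up values)
toy = data.table(
  SERIALNO = c("A", "A", "A", "B", "B"),
  SPORDER  = c(1, 2, 3, 1, 2),
  WGTP     = c(100, 100, 100, 80, 80),  # household weight, repeated on each person row
  PWGTP    = c(100, 110, 120, 80, 70)   # person weights
)

# Household size and the assumed rule: adjusted weight = mean person weight
toy[, NP := .N, by = SERIALNO]
toy[, WGTP_adj := mean(PWGTP), by = SERIALNO]

# One row per household, mirroring the SPORDER == 1 filter used in this chapter
toy_hh = toy[SPORDER == 1]

total_persons  = sum(toy$PWGTP)               # person total from person weights
persons_before = toy_hh[, sum(WGTP * NP)]     # implied persons, original household weights
persons_after  = toy_hh[, sum(WGTP_adj * NP)] # implied persons, adjusted household weights
```

In this toy case the original household weights imply 460 persons against a person-weight total of 480; after the adjustment both totals are 480, while the household total moves from 180 to 185.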
# Balance Household and Person Weights if Requested
if (get("force_balance_hh_weights", settings)) {
pums = force_balance_pums_weights(pums_hu)
# Record checksum
check_dt = record_checksum(
fname = check_path,
append = TRUE,
pums_checksum("force balance hh weights", pums, "person")
)
} else {
pums = pums_hu
}
stopifnot(pums[is.na(WGTP), .N] == 0)
# Sort for Consistency and Save Cleaned PUMS Data
pums = pums[order(SERIALNO, SPORDER)]
6.2.4 Separate Households and Persons
In this step, RSG separates the cleaned PUMS data into distinct household and person datasets. It identifies which variables pertain to households (e.g., HINCP, NP) and which pertain to persons (e.g., AGEP, SEX). The script then splits the data accordingly, ensuring that each dataset contains only the relevant columns. This separation is crucial for subsequent target tabulation, as household-level targets (like household size and income) and person-level targets (like age and employment status) need to be processed independently. The script also retains common variables, such as PUMA and SERIALNO, in both datasets to facilitate later merging and analysis.
In other words, the step prepares a “household” and “person” dataset, mirroring the data structure of the HTS.
# Identify Household/Person Variables From Codebook/Settings
pums_hvars = get("pums_hvars", settings)
pums_hvars = c(pums_hvars, paste0(pums_hvars, "_label"))
pums_hvars = intersect(pums_hvars, names(pums))
pums_pvars = get("pums_pvars", settings)
pums_pvars = c(pums_pvars, paste0(pums_pvars, "_label"))
pums_pvars = intersect(pums_pvars, names(pums))
# Split Into Household and Person Datasets
pums_cvars = setdiff(names(pums), c(pums_hvars, pums_pvars))
# Split Into Households and Persons
pums_hh = pums[SPORDER == 1, c(pums_cvars, pums_hvars), with = FALSE]
pums_per = pums[, c(pums_cvars, pums_pvars), with = FALSE]
6.3 Configure Target Variables
Ahead of weighting, a set of target estimates is created to represent population distributions across household- and person-level characteristics. These target estimates are constructed from ACS data and are chosen to match key variables in the survey (e.g., household size, income, age). In this step, the demographic target variables are created using the information specified in the project settings. Household and person attributes affect survey response differently, which introduces bias into unweighted survey data. For example, larger households may be less likely to respond because of the additional time needed to complete the survey questions and travel diaries for each member. To correct these types of biases, a variety of household- and person-level target categories are selected as weighting targets.
- Target variable: A household- or person-level characteristic used for weighting (e.g., household size, income).
- Target category: A level or bin within a target variable (e.g., 1-person household, $30k-$50k income).
- Target estimate (control total): The population count matching each target category, derived from ACS or another external source.
6.3.1 Create and Tabulate Target Variables
This step creates the target variables and categories as specified in the project settings. It then calculates the weighted target estimates for each target category. A small subsequent step sets household income to zero where it is negative. Negative income values in PUMS represent losses or debts, which can complicate target matching. By normalizing these values to zero, the script ensures that income-based targets reflect those values HTS respondents can report (negative income is not a survey response option).
This step relies on a key function, prepare_targets, which processes the separated household and person datasets to generate the final target tables. The function uses the variable definitions from the codebook to categorize and tabulate the data according to the specified target variables. It also incorporates any necessary adjustments based on the study settings, such as scaling factors or demographic groupings. The output is a set of tabulated targets that reflect the weighted counts of households and persons across various categories, ready for use in PopulationSim. The script also includes checks to ensure that the tabulated targets align with expected totals and distributions, providing confidence in the accuracy of the prepared data. For additional details, see the documentation for prepare_targets() and associated functions (e.g., prep_target_age, prep_target_income).
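The general bin-and-tabulate idea can be sketched on toy data as follows. This is not the implementation of prepare_targets(); the income bins, the h_income_* column names, and all values are hypothetical and chosen only for illustration.

```r
library(data.table)

# Toy household records (made-up values)
toy_hh = data.table(
  hh_id   = 1:5,
  puma_id = c("P1", "P1", "P1", "P2", "P2"),
  HINCP   = c(-500, 12000, 60000, 30000, 250000),
  WGTP    = c(10, 20, 30, 40, 50)
)

# Negative income is floored at zero before binning, as described above
toy_hh[HINCP < 0, HINCP := 0]

# Hypothetical income bins; in practice the bins come from the project settings
breaks = c(0, 25000, 50000, Inf)
labels = c("h_income_0_25", "h_income_25_50", "h_income_50_plus")
toy_hh[, income_cat := cut(HINCP, breaks = breaks, labels = labels, right = FALSE)]

# One 0/1 indicator column per target category
for (lab in labels) {
  toy_hh[, (lab) := as.integer(income_cat == lab)]
}

# Weighted target estimate for each category within each PUMA
toy_targets = toy_hh[,
  lapply(.SD, function(x) sum(x * WGTP)),
  by = puma_id,
  .SDcols = labels
]
```

Because every household falls in exactly one category, the category estimates within a PUMA sum to that PUMA’s total household weight, which is the kind of consistency a group-sum check can verify.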
pums_hh[HINCP < 0, HINCP := 0]
pums_hh[, `:=`(
hh_id = SERIALNO,
puma_id = PUMA,
h_puma = PUMA
)]
pums_per[, `:=`(
hh_id = SERIALNO,
person_id = paste0(SERIALNO, SPORDER),
puma_id = PUMA,
p_puma = PUMA
)]
# Add Dummy Columns for Total Counts
pums_hh[, h_total := 1]
pums_per[, p_total := 1]
# Tabulate Targets for Weighting
pums_tabbed = prepare_targets(
households = pums_hh,
persons = pums_per,
codebook = pums_vars,
settings = settings
)
check_group_sums(pums_tabbed, settings)
6.3.2 Aggregate Targets to PUMA Level
This step aggregates the tabulated household and person target estimates to the PUMA level. It sums the weighted counts for each target variable, ensuring that the totals reflect the population estimates for each PUMA. The aggregation is done separately for household and person targets, using the appropriate weights (WGTP for households and PWGTP for persons). After aggregation, the script merges the household and person target tables into a single dataset, keyed by PUMA ID. This consolidated target table serves as the basis for further adjustments to align with the study’s geographic zones. The script also includes checks to verify that the aggregated totals are consistent with expectations, providing an additional layer of quality assurance.
# Identify the Person-Level and Household-Level Target Variables
p_cols = grep("^p_", names(pums_tabbed), value = TRUE)
h_cols = grep("^h_", names(pums_tabbed), value = TRUE)
# Aggregate the Person Targets Separately on puma_id
per_targs = pums_tabbed[, lapply(.SD, function(x) sum(x * PWGTP)), by = puma_id, .SDcols = p_cols]
# Keep Only 1 Record per Hh for Hh Columns
hh_targs = pums_tabbed[, .SD[1], by = .(hh_id, puma_id, WGTP), .SDcols = h_cols]
hh_targs = hh_targs[, lapply(.SD, function(x) sum(x * WGTP)), by = puma_id, .SDcols = h_cols]
# Merge Household and Person Targets
target_puma = merge(per_targs, hh_targs, by = "puma_id", all = TRUE)
# Record Checksum
check_dt = record_checksum(
fname = check_path,
append = TRUE,
data.table(
dataset = "PUMS",
step = "calculate targets",
sum_hh_wt = target_puma[, sum(h_total)],
sum_per_wt = target_puma[, sum(p_total)],
sum_hhwtXnp = NA,
n_rows = target_puma[, .N],
n_hh = NA,
n_per = NA,
unit = 'PUMA'
)
)
6.3.3 Optional: Adjust PUMS Targets to Weighting Zones
This step translates the PUMA-based control totals to the study’s weighting zones if PUMAs are not used as the weighting zones. This is done using a spatial crosswalk to the custom weighting zones (often called “client zones”). The script reallocates household and person targets in proportion to each PUMA’s overlap with the study zones. This ensures that control totals reflect the population distribution within the modeled region rather than the full PUMA extent.
Where no custom weighting zones are defined, the function performs a 1:1 pass-through, trimming any PUMAs that extend beyond the study boundary. A check is performed at the end of the step to confirm that aggregated totals remain consistent, providing confidence that the allocation preserves the original universe.
PSRC did not use custom weighting zones, so this step effectively behaves as a pass-through with sanity checks.
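A minimal sketch of proportional reallocation through a crosswalk, using toy data. It is not the implementation of adjust_target_to_study_zones(); the prop column (the share of each PUMA falling in each zone) and all values are hypothetical.

```r
library(data.table)

# PUMA-level targets (made-up values)
toy_puma = data.table(
  puma_id = c("P1", "P2"),
  h_total = c(1000, 500),
  p_total = c(2500, 1200)
)

# Hypothetical crosswalk: share of each PUMA that falls in each zone.
# Shares sum to 1 within a PUMA when the PUMA lies entirely inside the study area.
toy_xwalk = data.table(
  puma_id        = c("P1", "P1", "P2"),
  client_zone_id = c("Z1", "Z2", "Z2"),
  prop           = c(0.6, 0.4, 1.0)
)

# Join targets to the crosswalk, scale by the overlap share, then sum to zones
alloc = merge(toy_xwalk, toy_puma, by = "puma_id")
toy_czone = alloc[,
  .(h_total = sum(h_total * prop), p_total = sum(p_total * prop)),
  by = client_zone_id
]
```

Here zone Z1 receives 60 percent of P1, and Z2 receives the remaining 40 percent of P1 plus all of P2, so the regional totals are unchanged. With a 1:1 crosswalk (every prop equal to 1) the same code reduces to a pass-through.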
puma_zone_xwalk_sf = readRDS(file.path(
get("working_dir", settings),
"puma_czone_xwalk.rds"
))
# Adjust Targets to Client Zones Using the Crosswalk
target_czones = adjust_target_to_study_zones(
puma_targets = target_puma,
puma_crosswalk = puma_zone_xwalk_sf,
settings
)
# Record Checksum
check_dt = record_checksum(
fname = check_path,
append = TRUE,
data.table(
dataset = "PUMS",
step = "adjust to client zones adj target",
sum_hh_wt = target_czones[, sum(h_total)],
sum_per_wt = target_czones[, sum(p_total)],
sum_hhwtXnp = NA,
n_rows = target_czones[, .N],
n_hh = NA,
n_per = NA,
unit = 'client zone'
)
)
6.3.4 Create Regional Targets
After creating and tabulating target estimates for each weighting zone, targets are then tabulated at the region level. These targets help PopulationSim match regional population totals and other targets constructed at the regional level (e.g., transit targets). They can also serve as a fallback if the model fails to converge at the finer zone level.
target_region = target_czones[, !"client_zone_id", with = FALSE][, lapply(.SD, sum)]
target_region[, region := 1]
6.3.5 Optional: Create Transit Targets
Some clients choose to incorporate a transit target to help ensure that the model accurately reflects transit usage patterns. If specified in the settings, this step adds transit boardings as a target at the regional level. Since transit boardings are not available in the PUMS data, the script pulls this information from an external source, ideally broken down by transit agency to serve as a proxy for sub-regional distribution. However, this requires boardings data, which may not always be available, especially at the granular level required to construct “typical weekday” estimates. The script adds the total average weekday ridership or boardings to the regional targets, providing an additional control for PopulationSim to match during the weighting process.
PSRC did not use a transit target.
if ("h_transit_trips" %in% names(settings$targets)) {
h_transit_trips = prep_transit_target(settings)
target_region[, names(h_transit_trips) := h_transit_trips]
}
# Check that Totals Add up Properly --------------------------------------------
check_group_sums(target_czones, settings)
6.3.6 Update the Targets with Day-of-Week Estimates
RSG’s weighting process enables clients to weight their data at the level of individual days of the week or groups of days of the week. For example, a client might choose to weight their data to match separate targets for weekdays (Monday-Friday) and weekends (Saturday-Sunday). In this scenario, the weighting process would ensure that the weighted survey data aligns with the specified targets for both weekdays and weekends. This requires minor restructuring of the target data for PopulationSim. In cases where day-of-week weights are not specified, this step will create a replicate of the total household and population variables labeled with the name specified for the grouped days in the settings.
PSRC did not specify day-of-week targets.
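A minimal sketch of this restructuring on a one-row toy table, assuming two hypothetical day groups named weekday and weekend. For brevity it selects the columns to rescale with a simple prefix match, whereas the chapter’s code uses a lookahead pattern that also skips any existing day-group columns. All values are made up.

```r
library(data.table)

# Toy regional targets (made-up values) and two hypothetical day groups
toy = data.table(h_total = 100, p_total = 250, h_size_1 = 40, h_size_2plus = 60)
day_groups = c("weekday", "weekend")

# Columns to rescale: every h_/p_ target present before day-group columns are added
targ_cols = grep("^(p_|h_)", names(toy), value = TRUE)

# Each day group receives a copy of the original household and person totals
for (dg in day_groups) {
  toy[, (paste0("h_dow_", dg)) := h_total]
  toy[, (paste0("p_dow_", dg)) := p_total]
}

# Rescale the non-day-group targets by the number of day groups
toy[,
  (targ_cols) := lapply(.SD, function(x) x * length(day_groups)),
  .SDcols = targ_cols
]
```

In this toy case the rescaled h_total (200) equals the sum of the two h_dow_ columns (100 each), so the overall totals and the day-group totals describe the same universe. With a single day group, the factor is 1 and the day-group columns are plain copies of the totals.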
day_groups_dt = get_day_groups(settings)
day_groups = unique(day_groups_dt$day_group)
dayregex = paste(day_groups, collapse = "|")
pattern = stringr::str_glue("^(p_|h_)((?!({dayregex})).)*$")
targets_ls = list(
"puma" = copy(target_puma),
"czone" = copy(target_czones),
"region" = copy(target_region)
)
for (target_name in names(targets_ls)) {
# Cols to update, non DOW columns
targ_cols = stringr::str_subset(names(targets_ls[[target_name]]), pattern)
# Assign DOW columns from h_total and p_total
targets_ls[[target_name]][, paste0("h_dow_", day_groups) := h_total]
targets_ls[[target_name]][, paste0("p_dow_", day_groups) := p_total]
# Rescale everything by n days
targets_ls[[target_name]][,
(targ_cols) := lapply(.SD, function(x) x * length(day_groups)),
.SDcols = targ_cols
]
}
6.3.7 Optional: Aggregate Target Categories for Weighting
Sometimes certain targets and target categories can have large error margins around the Census target estimates, which can impact the results of weighting. One option is to aggregate target categories for certain weighting zone groups or the entire region. This step will optionally aggregate target categories as specified in the project settings file.
Because the Census target estimates for the person-level commute mode categories bike, walk, and transit have large error margins, these categories were aggregated for all weighting zone groups except King County - Seattle.
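A minimal sketch of collapsing categories for selected zone groups, using toy data. It does not reproduce update_targets(); the zone group labels, the p_mode_* column names, and all values are hypothetical.

```r
library(data.table)

# Toy zone-group targets for person-level commute mode (made-up names and values)
toy = data.table(
  zone_group     = c("core", "outer"),
  p_mode_bike    = c(120, 15),
  p_mode_walk    = c(300, 40),
  p_mode_transit = c(500, 60),
  p_mode_other   = c(2000, 900)
)

detail_cols = c("p_mode_bike", "p_mode_walk", "p_mode_transit")
combine_in  = "outer"  # zone groups where the detailed categories are collapsed

# Combined category: NA by default, filled only where aggregation is requested
toy[, p_mode_bike_walk_transit := NA_real_]
toy[
  zone_group %in% combine_in,
  p_mode_bike_walk_transit := rowSums(.SD),
  .SDcols = detail_cols
]

# Blank out the detailed categories in the zone groups where they were combined
toy[zone_group %in% combine_in, (detail_cols) := NA_real_]
```

The result has the detailed categories in one zone group and only the combined category in the other, with NA marking targets that are not used. Each zone group’s total across the remaining categories is unchanged.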
# Create a List of Zone Group and Regional Targets
targets = list("zone_group" = target_czones, "region" = target_region)
# Convert the PUMA to Client Zone Crosswalk to a `data.table`
puma_zone_xwalk <- as.data.table(puma_zone_xwalk_sf)
# Create a Weighting Zone Group Crosswalk to Use in Updating Targets and Convert to a `data.table`
zone_group_crosswalk = prepare_zone_groups(
seed = puma_zone_xwalk,
targets = target_czones,
settings = settings,
show_plot = FALSE,
replace = FALSE
)
zone_group_crosswalk = as.data.table(zone_group_crosswalk)
# Append the Zone Group to the Matching Weighting Zone ID in the Target Estimate Tables
target_czones[zone_group_crosswalk, zone_group := i.zone_group, on = "client_zone_id"]
target_region[, zone_group := "region"]
# Update the Targets as Specified in the Project Settings Configurations
updated_targets_czone = update_targets(
target_czones,
settings,
geo_cross_walk = as.data.table(zone_group_crosswalk),
geom = "zone_group"
)
updated_targets_region = update_targets(
target_region,
settings,
geo_cross_walk = as.data.table(zone_group_crosswalk),
geom = "zone_group"
)
6.3.8 Review Household- and Person-Level Target Estimates
6.3.8.1 Table: Total HH and Population Targets for each weighting zone group
| Weighting Zone Group | Households | Persons |
|---|---|---|
| King County - Seattle | 368,012.2 | 729,129 |
| King County - Other | 541,295.8 | 1,373,543 |
| Kitsap County - Expanded | 201,550.5 | 502,271 |
| Pierce County | 313,545.5 | 803,114 |
| Snohomish County | 320,948.9 | 835,625 |
| Total | 1,745,353.0 | 4,243,682 |
6.3.8.2 Table: Household-Level Target Variables by Weighting Zone Groups: ACS 1-year Target Estimates
| Target Variable | Target Category | King County - Seattle | King County - Other | Kitsap County - Expanded | Pierce County | Snohomish County | Region |
|---|---|---|---|---|---|---|---|
| Household size | 1 | 156,491 | 133,620 | 48,670 | 83,386 | 75,695 | 497,862 |
| | 2 | 127,110 | 187,137 | 78,355 | 105,182 | 108,169 | 605,952 |
| | 3+ | 84,412 | 220,539 | 74,526 | 124,978 | 137,085 | 641,539 |
| Household income | $0-$24,999 | 43,569 | 49,399 | 21,427 | 32,804 | 31,423 | 178,621 |
| | $25,000-$49,999 | 36,634 | 54,603 | 22,300 | 41,303 | 37,079 | 191,919 |
| | $50,000-$74,999 | 39,487 | 59,248 | 35,266 | 44,094 | 44,539 | 222,633 |
| | $75,000-$99,999 | 35,324 | 58,226 | 25,197 | 44,753 | 38,900 | 202,400 |
| | $100,000-$199,999 | 104,632 | 159,080 | 69,075 | 109,689 | 105,446 | 547,922 |
| | $200,000+ | 108,367 | 160,740 | 28,286 | 40,903 | 63,561 | 401,857 |
| Number of workers | 0 | 64,488 | 102,754 | 52,106 | 68,537 | 63,698 | 351,583 |
| | 1 | 162,434 | 197,222 | 71,467 | 115,362 | 118,051 | 664,536 |
| | 2+ | 141,090 | 241,320 | 77,977 | 129,646 | 139,200 | 729,233 |
| Vehicle sufficiency | None | 76,991 | 37,006 | 8,523 | 16,193 | 14,904 | 153,618 |
| | Insufficient | 85,179 | 123,106 | 36,222 | 51,313 | 65,267 | 361,086 |
| | Sufficient | 205,842 | 381,184 | 156,806 | 246,040 | 240,777 | 1,230,649 |
| Number of children | 0 | 303,802 | 371,352 | 143,820 | 214,796 | 216,633 | 1,250,404 |
| | 1+ | 64,210 | 169,944 | 57,731 | 98,749 | 104,316 | 494,949 |
| Total: Households | | 368,012 | 541,296 | 201,551 | 313,546 | 320,949 | 1,745,353 |
6.3.8.3 Table: Total Households in PUMAs by Weighting Zone Groups: ACS 1-year Target Estimates
| Target Variable | Target Category | King County - Seattle | King County - Other | Kitsap County - Expanded | Pierce County | Snohomish County | Region |
|---|---|---|---|---|---|---|---|
| Total Households in PUMA | 5323301 | 0 | 54,363 | 0 | 0 | 0 | 54,363 |
| | 5323302 | 0 | 43,261 | 0 | 0 | 0 | 43,261 |
| | 5323303 | 0 | 74,179 | 0 | 0 | 0 | 74,179 |
| | 5323304 | 0 | 64,104 | 0 | 0 | 0 | 64,104 |
| | 5323305 | 0 | 57,591 | 0 | 0 | 0 | 57,591 |
| | 5323306 | 0 | 45,916 | 0 | 0 | 0 | 45,916 |
| | 5323307 | 0 | 43,711 | 0 | 0 | 0 | 43,711 |
| | 5323308 | 0 | 0 | 51,302 | 0 | 0 | 51,302 |
| | 5323309 | 0 | 51,522 | 0 | 0 | 0 | 51,522 |
| | 5323310 | 0 | 54,637 | 0 | 0 | 0 | 54,637 |
| | 5323311 | 0 | 52,012 | 0 | 0 | 0 | 52,012 |
| | 5323312 | 47,709 | 0 | 0 | 0 | 0 | 47,709 |
| | 5323313 | 42,353 | 0 | 0 | 0 | 0 | 42,353 |
| | 5323314 | 63,202 | 0 | 0 | 0 | 0 | 63,202 |
| | 5323315 | 64,788 | 0 | 0 | 0 | 0 | 64,788 |
| | 5323316 | 49,403 | 0 | 0 | 0 | 0 | 49,403 |
| | 5323317 | 48,003 | 0 | 0 | 0 | 0 | 48,003 |
| | 5323318 | 52,555 | 0 | 0 | 0 | 0 | 52,555 |
| | 5323501 | 0 | 0 | 55,078 | 0 | 0 | 55,078 |
| | 5323502 | 0 | 0 | 56,560 | 0 | 0 | 56,560 |
| | 5325301 | 0 | 0 | 0 | 54,734 | 0 | 54,734 |
| | 5325302 | 0 | 0 | 0 | 40,243 | 0 | 40,243 |
| | 5325303 | 0 | 0 | 0 | 38,343 | 0 | 38,343 |
| | 5325304 | 0 | 0 | 0 | 45,812 | 0 | 45,812 |
| | 5325305 | 0 | 0 | 0 | 50,283 | 0 | 50,283 |
| | 5325306 | 0 | 0 | 0 | 38,135 | 0 | 38,135 |
| | 5325307 | 0 | 0 | 0 | 45,996 | 0 | 45,996 |
| | 5325308 | 0 | 0 | 38,612 | 0 | 0 | 38,612 |
| | 5326101 | 0 | 0 | 0 | 0 | 57,883 | 57,883 |
| | 5326102 | 0 | 0 | 0 | 0 | 54,040 | 54,040 |
| | 5326103 | 0 | 0 | 0 | 0 | 51,159 | 51,159 |
| | 5326104 | 0 | 0 | 0 | 0 | 51,840 | 51,840 |
| | 5326105 | 0 | 0 | 0 | 0 | 42,767 | 42,767 |
| | 5326106 | 0 | 0 | 0 | 0 | 63,260 | 63,260 |
6.3.8.4 Table: Person-Level Target Variables by Weighting Zone Groups: ACS 1-year Target Estimates
| Target Variable | Target Category | King County - Seattle | King County - Other | Kitsap County - Expanded | Pierce County | Snohomish County | Region |
|---|---|---|---|---|---|---|---|
| Gender | Male | 379,274 | 690,208 | 251,969 | 398,496 | 418,976 | 2,138,923 |
| | Female | 349,855 | 683,335 | 250,302 | 404,618 | 416,649 | 2,104,759 |
| Age | 0-4 | 31,595 | 75,075 | 28,200 | 45,198 | 49,710 | 229,778 |
| | 5-15 | 65,113 | 191,843 | 63,704 | 117,889 | 111,272 | 549,821 |
| | 16-17 | 11,927 | 34,836 | 10,204 | 23,407 | 22,911 | 103,285 |
| | 18-24 | 63,082 | 90,062 | 40,801 | 61,301 | 58,179 | 313,425 |
| | 25-44 | 311,646 | 431,913 | 139,414 | 246,289 | 258,077 | 1,387,339 |
| | 45-64 | 151,301 | 348,426 | 121,095 | 192,144 | 209,417 | 1,022,383 |
| | 65+ | 94,465 | 201,388 | 98,853 | 116,886 | 126,059 | 637,651 |
| Employment | Non worker | 262,999 | 629,775 | 253,772 | 390,930 | 395,599 | 1,933,075 |
| | Part-time | 82,074 | 138,237 | 56,702 | 77,056 | 86,823 | 440,892 |
| | Full-time | 384,056 | 605,531 | 191,797 | 335,128 | 353,203 | 1,869,715 |
| Commute mode | Bike/transit/walk | NA | 47,722 | 10,747 | 13,541 | 16,337 | NA |
| | Work from home | 134,902 | 163,590 | 34,840 | 51,388 | 75,630 | 460,350 |
| | Transit | 62,077 | NA | NA | NA | NA | 115,386 |
| | Walk | 37,122 | NA | NA | NA | NA | 67,584 |
| | Bike | 12,791 | NA | NA | NA | NA | 17,367 |
| | Other (includes auto) | 211,314 | 515,703 | 199,157 | 337,264 | 337,981 | 1,601,419 |
| | None | 270,923 | 646,528 | 257,527 | 400,921 | 405,677 | 1,981,576 |
| University student status | No | 674,154 | 1,300,589 | 477,629 | 766,055 | 798,835 | 4,017,262 |
| | Yes | 54,975 | 72,954 | 24,642 | 37,059 | 36,790 | 226,420 |
| Educational attainment | No college | 192,478 | 552,620 | 221,299 | 416,401 | 386,728 | 1,769,526 |
| | Some college | 536,651 | 820,923 | 280,972 | 386,713 | 448,897 | 2,474,156 |
| Race | Asian or Pacific Islander | 126,375 | 340,737 | 43,464 | 67,555 | 128,513 | 706,644 |
| | Black or African American | 46,566 | 78,842 | 28,661 | 61,461 | 32,174 | 247,704 |
| | Other | 114,087 | 243,238 | 95,642 | 182,011 | 148,631 | 783,609 |
| | White | 442,101 | 710,726 | 334,504 | 492,087 | 526,307 | 2,505,725 |
| Ethnicity | Not Hispanic | 666,368 | 1,214,119 | 441,688 | 694,024 | 732,639 | 3,748,838 |
| | Hispanic | 62,761 | 159,424 | 60,583 | 109,090 | 102,986 | 494,844 |
| Total: Persons | | 729,129 | 1,373,543 | 502,271 | 803,114 | 835,625 | 4,243,682 |
6.4 Save Cleaned and Tabulated PUMS Datasets
Note: The targets saved here are not the updated (aggregated) targets, because those updates are applied in the scripts that prepare the PopulationSim control tables for Round 1 weighting. The updated outputs produced above are for review only, to confirm that the updates match what is specified under target_updates in the project settings before the weighting process starts.
# Write Out File ---------------------------------------------------------
saveRDS(pums, file = file.path(get("working_dir", settings), "pums_cleaned.rds"))
saveRDS(pums_tabbed, file = file.path(get("working_dir", settings), "pums_tabbed.rds"))
saveRDS(targets_ls[['puma']], file = get("target_puma_path", settings))
saveRDS(targets_ls[['czone']], file = get("target_czone_path", settings))
saveRDS(targets_ls[['region']], file = get("target_region_path", settings))
6.5 Allocate PUMS Estimates to Sampled Block Groups
The last step of target data preparation allocates PUMS households and persons to sample segments. This allows the base weight to be calculated at the same geographic scale at which sampling occurred.
The allocation process involves a spatial crosswalk between PUMA and block group geographies. The fraction of each PUMA that overlaps with each block group is calculated, and this fraction is used to proportionally allocate PUMS households and persons to each block group. Totals are then aggregated to sample segments, which match the geographic scale of sampling.
Note: Small discrepancies (e.g., rounding errors) may occur during allocation, but total household and person estimates should remain consistent.
6.5.1 Load Data and Record Initial Target and Reference Sums
This step loads the tabulated target dataset (from 1-year PUMS/ACS) and the sample plan (from 5-year ACS), then records initial counts for households, persons, and observations.
# Load Client Zone and Regional Target Datasets (from ACS/PUMS).
target_czone = readRDS(get("target_czone_path", settings))
target_region = readRDS(get("target_region_path", settings))
# Number of Days Being Weighted (for Day-of-Week Splits).
n_weight_days = length(settings$weight_dow_groups)
check_path = file.path(get("report_dir", settings), "040_check_counts.csv")
# Record Initial Totals for Client Zone Targets.
check_dt = record_checksum(
fname = check_path,
append = FALSE,
data.table(
dataset = "PUMS",
step = "initial client zone targets",
n_hh = target_czone[, sum(h_total) / n_weight_days],
n_per = target_czone[, sum(p_total) / n_weight_days],
n_rows = target_czone[, .N],
unit = 'client zone'
)
)
# Load Value Labels and Sample Plan Used for Survey Assignment.
value_labels = fetch_hts_table("value_labels", settings)
sample_plan = fetch_hts_table("sample_plan", settings)
# Record Reference Counts in the Sample Plan.
check_dt = record_checksum(
fname = check_path,
append = TRUE,
data.table(
dataset = "sample_plan",
step = "initial sample plan reference counts",
n_hh = sample_plan[, sum(ref_count_hh)],
n_per = sample_plan[, sum(ref_count_per)],
n_rows = sample_plan[, .N],
unit = 'bg_geoid'
)
)
6.5.2 Assign Weighting Zones to Sample Plan and Record Reference Sums
Weighting zone IDs are assigned to each block group in the sample plan. For block groups that partially overlap multiple weighting zones, the reference counts for households and persons are proportionally allocated using calculated area fractions. Zones outside the study area are excluded.
# Assign Weighting Zones (Client Zones) to Sample Plan Block Groups.
# If not Using Client Zone, the PUMA ID is the Client Zone
if (!"client_zone_id" %in% names(sample_plan)) {
bg_puma_czone_xwalk = readRDS(file.path(get("working_dir", settings), 'bg_puma_czone_xwalk.rds'))
sample_plan = merge(
sample_plan,
bg_puma_czone_xwalk[, .(bg_geoid = GEOID, client_zone, client_zone_id, area_prop)],
by = "bg_geoid",
allow.cartesian = TRUE
)
# Adjust reference counts by area proportion for partial overlaps
sample_plan[,
`:=`(
ref_count_hh = ref_count_hh * area_prop,
ref_count_per = ref_count_per * area_prop,
area_prop = NULL
)
]
# Remove zones outside the study area
sample_plan = sample_plan[client_zone_id != -1]
}
# Record Checksum After Zone Assignment
check_dt = record_checksum(
fname = check_path,
append = TRUE,
data.table(
dataset = "sample_plan_outer",
step = "assign weighting zones to sample plan",
n_hh = sample_plan[, sum(ref_count_hh)],
n_per = sample_plan[, sum(ref_count_per)],
n_rows = sample_plan[, .N],
unit = 'bg_geoid'
)
)
6.5.3 Adjust Sample Plan Reference Counts to Match Target Data
Because block group and weighting zone boundaries do not align perfectly, the reference counts must be rescaled so that their sums match the target estimates for each weighting zone. In the previous step, the block group reference counts were multiplied by the proportion of each block group’s area that overlaps a weighting zone (the PUMA when no custom zones are defined). Here, an allocation factor is calculated for each zone by summing the block group reference counts to the zone level and dividing the zone’s target estimate by that sum. The factor is then applied to the reference counts of every block group in the zone, so that the sample plan totals match the PUMS totals.
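The allocation factor can be sketched on toy data as follows. This is an illustration of the ratio adjustment described above, not the implementation of adjust_reference_to_target(); the table names, the alloc_factor column, and all values are hypothetical, and only households are shown.

```r
library(data.table)

# Block-group reference counts after zone assignment (made-up values)
toy_plan = data.table(
  bg_geoid       = c("bg1", "bg2", "bg3"),
  client_zone_id = c("Z1", "Z1", "Z2"),
  ref_count_hh   = c(400, 500, 800)
)

# Zone-level household targets from PUMS (made-up values)
toy_targets = data.table(client_zone_id = c("Z1", "Z2"), h_total = c(990, 760))

# Allocation factor = zone target / summed reference counts in that zone
zone_ref = toy_plan[, .(ref_sum = sum(ref_count_hh)), by = client_zone_id]
factors  = merge(zone_ref, toy_targets, by = "client_zone_id")
factors[, alloc_factor := h_total / ref_sum]

# Apply the zone's factor to each of its block groups
toy_adj = merge(
  toy_plan,
  factors[, .(client_zone_id, alloc_factor)],
  by = "client_zone_id"
)
toy_adj[, ref_count_hh := ref_count_hh * alloc_factor]
```

In this toy case zone Z1 is scaled up by 1.1 and zone Z2 is scaled down by 0.95, after which the block group counts sum exactly to each zone’s target while keeping their relative sizes within the zone.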
# Calibrate the Sample Plan so Totals Align with Targets.
sample_plan_adj = adjust_reference_to_target(
ref_counts = sample_plan,
targets = target_czone,
settings
)
# Record Checksum for the Adjusted Sample Plan
check_dt = record_checksum(
fname = check_path,
append = TRUE,
data.table(
dataset = "sample_plan_adj",
step = "adjust reference counts to PUMS data",
n_hh = sample_plan_adj[, sum(ref_count_hh)],
n_per = sample_plan_adj[, sum(ref_count_per)],
n_rows = sample_plan_adj[, .N],
unit = 'bg_geoid'
)
)
6.6 Save Adjusted Sample Plan for Base Weight Calculation
# Write Out File ---------------------------------------------------------
saveRDS(sample_plan_adj, file = file.path(get("working_dir", settings), "sample_plan_adj.rds"))
saveRDS(sample_plan, file = file.path(get("working_dir", settings), "sample_plan_unadj.rds"))
6.7 Table: Total PUMS-estimated Hhs and Persons at each Step of Pre-Processing
PSRC did not use custom weighting zones, so the allocation of PUMS to specified weighting zones should not show any change in estimates.
| Step | Household estimate | Person estimate |
|---|---|---|
| Initial PUMS file | 1,737,345 | 4,322,435 |
| Remove Group Quarters Residents | 1,737,345 | 4,243,682 |
| Align PUMS to itself | 1,745,353 | 4,243,682 |
| Allocate PUMS to sample plan without adjustment | 1,673,496 | 4,163,187 |
| Allocate PUMS to sample plan with adjustment | 1,745,353 | 4,243,682 |