Summarize PUMS data with target updates and confidence intervals
Summarizes disaggregated PUMS data, applying target updates and scaling weights to client zones, then computes totals and confidence intervals by group. Use for reporting and validation of control totals against survey results.
Usage
summarize_pums(
  pums_wide,
  group_col = NULL,
  puma_zone_group_xwalk,
  ci_level = 0.9,
  settings
)
Arguments
- pums_wide
data.table. Disaggregated PUMS target data. Required columns:
puma_id: PUMA ID
hh_id: household ID
person_id: person ID
WGTP: household weight
PWGTP: person weight
h_total: household total
p_total: person total
Rows: one per person or household.
Modified by reference: no (returns copy).
- group_col
character vector. Columns to group by. Default: NULL.
- puma_zone_group_xwalk
data.table. Crosswalk between PUMS and zone groups via client zones. Must include all grouping columns except geometry.
- ci_level
numeric. Confidence interval level (fraction, not percent). Default: 0.9.
- settings
list. Project settings; must include target update definitions and weighting flags.
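As a rough illustration of what a fractional ci_level means, the sketch below converts 0.9 into a two-sided critical value and applies it to a made-up estimate. It assumes a normal approximation, which this page does not confirm; the values of total and se are hypothetical, and summarize_pums() may compute its intervals differently.

```r
# Assumption: normal-approximation interval; not confirmed as the package's method.
ci_level <- 0.9                      # fraction, not percent (0.9, not 90)
z <- qnorm(1 - (1 - ci_level) / 2)   # two-sided critical value

# Hypothetical estimate and standard error, for illustration only.
total <- 1000
se <- 50
ci <- c(lower = total - z * se, upper = total + z * se)
print(ci)
```

Passing 90 instead of 0.9 would make the quantile argument fall outside [0, 1], so qnorm() would return NaN with a warning in this sketch.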
Value
data.table. Summarized target data by group, with columns:
Grouping columns (as specified)
Target variable columns
total, lower, upper for each target (confidence interval)
Row order: by group and target
Details
Checks for required columns in PUMS data: puma_id, hh_id, person_id, WGTP, PWGTP, h_total, p_total.
Checks for required columns in crosswalk: group_col, zone_group, zone_group_label, puma_id, prop_hh, prop_per.
Drops geometry from crosswalk to reduce memory usage.
Merges PUMS data with crosswalk, allowing a Cartesian join for multiple PUMA zones per client zone.
Applies update_targets() to harmonize PUMS targets with client zone definitions.
Scales weights to client zones:
  If force_balance_hh_weights is TRUE, pegs household weights from person weights and adjusts by one proportion column.
  Otherwise, adjusts both person and household weights independently.
Ensures that weighted totals are unchanged after scaling (checksums).
Sets weights and totals to zero for outside-region rows (client_zone_id == -1).
Aggregates by group, client zone, and zone group label for households and persons.
Asserts that weights match after update if forced balancing is enabled.
Calls summarize_data() to aggregate by group and calculate confidence intervals for households and persons.
Checks that summarized totals match naive weighted sums (checksum validation).
Returns a data.table of summarized targets by group, with confidence intervals.
Error handling: stops if required columns are missing or weights do not match after update.
FIXME: Replicate weight calculation for standard errors is not implemented (see code comment and reference link).
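The merge, independent weight scaling, and checksum steps described in Details can be sketched with data.table on toy inputs. This is not the package's code: the tables pums and xwalk are hypothetical, the client_zone_id column in the crosswalk is an assumption, and only the non-forced branch (scaling WGTP by prop_hh and PWGTP by prop_per) is shown.

```r
library(data.table)

# Hypothetical toy inputs; column names follow the required columns listed above.
pums <- data.table(
  puma_id = c(1L, 1L, 2L),
  hh_id   = c(10L, 11L, 20L),
  WGTP    = c(100, 50, 80),
  PWGTP   = c(120, 60, 90)
)

# Assumed crosswalk layout: PUMA 1 is split across two client zones,
# with proportions that sum to 1 within each PUMA.
xwalk <- data.table(
  puma_id        = c(1L, 1L, 2L),
  client_zone_id = c(101L, 102L, 103L),
  prop_hh        = c(0.25, 0.75, 1),
  prop_per       = c(0.4, 0.6, 1)
)

# Each PUMS row is repeated once per matching crosswalk row.
merged <- merge(pums, xwalk, by = "puma_id", allow.cartesian = TRUE)

# Scale household and person weights independently.
merged[, `:=`(WGTP = WGTP * prop_hh, PWGTP = PWGTP * prop_per)]

# Checksum: weighted totals must be unchanged by the scaling.
stopifnot(
  isTRUE(all.equal(sum(merged$WGTP), sum(pums$WGTP))),
  isTRUE(all.equal(sum(merged$PWGTP), sum(pums$PWGTP)))
)
```

The checksum only holds when the proportions for each PUMA sum to 1, which is why a failed assertion at this stage usually points at the crosswalk rather than at the PUMS records.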
Settings
force_balance_hh_weights (direct): controls weight scaling logic.
study_unit (direct): selects proportion column for scaling.
Uses target update definitions from settings for harmonization.
See also
summarize_data, update_targets, summarize_survey
Other reporting utilities:
find_level_idx(),
get_target_map(),
summarize_data(),
summarize_survey(),
tabulate_target()