
Summarizes disaggregated PUMS data, applying target updates and scaling weights to client zones, then computes totals and confidence intervals by group. Use for reporting and validation of control totals against survey results.

Usage

summarize_pums(
  pums_wide,
  group_col = NULL,
  puma_zone_group_xwalk,
  ci_level = 0.9,
  settings
)

Arguments

pums_wide

data.table. Disaggregated PUMS target data. Required columns:

  • puma_id : PUMA ID

  • hh_id : household ID

  • person_id : person ID

  • WGTP : household weight

  • PWGTP : person weight

  • h_total : household total

  • p_total : person total

Rows: one per person or household. Modified by reference: no (returns copy).
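
As a rough illustration of the required-column contract above, the following sketch builds a toy input and validates it. The values are placeholders, base R is used to keep the sketch dependency-free (the function itself expects a data.table), and check_required() is a hypothetical helper rather than a function from this package.

```r
# Toy input with made-up placeholder values; only the column names come from
# the documentation above.
pums_wide <- data.frame(
  puma_id   = c(101L, 101L, 102L),
  hh_id     = c(1L, 1L, 2L),
  person_id = c(1L, 2L, 1L),
  WGTP      = c(20, 20, 35),
  PWGTP     = c(22, 18, 40),
  h_total   = c(1L, 1L, 1L),
  p_total   = c(1L, 1L, 1L)
)

required_cols <- c("puma_id", "hh_id", "person_id", "WGTP", "PWGTP",
                   "h_total", "p_total")

# check_required() is hypothetical: it stops with a message that names every
# missing column, and returns TRUE invisibly when all columns are present.
check_required <- function(dt, cols) {
  missing_cols <- setdiff(cols, names(dt))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_required(pums_wide, required_cols)
```

Naming all missing columns in a single error, rather than stopping at the first one, tends to make input problems quicker to fix.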

group_col

character vector. Columns to group by. Default: NULL.

puma_zone_group_xwalk

data.table. Crosswalk from PUMAs to zone groups via client zones. Must include every column named in group_col; any geometry column is dropped before use.

ci_level

numeric. Confidence interval level (fraction, not percent). Default: 0.9.
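
To make the "fraction, not percent" convention concrete, the sketch below converts ci_level into a two-sided critical value and applies it to an estimate. It assumes a normal approximation, which this page does not confirm for summarize_data(); the total and se values are invented for illustration.

```r
ci_level <- 0.9  # a fraction, so 0.9 means a 90% interval (not 90)

# Two-sided critical value from the standard normal distribution
# (assumption: normal approximation).
z <- qnorm(1 - (1 - ci_level) / 2)

# Hypothetical estimate and standard error, for illustration only.
total <- 1000
se <- 50

lower <- total - z * se
upper <- total + z * se

round(c(z = z, lower = lower, upper = upper), 3)
```

Passing 90 instead of 0.9 would make the argument to qnorm() fall outside [0, 1] and yield NaN with a warning, which is one reason to validate that ci_level lies strictly between 0 and 1.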

settings

list. Project settings; must include target update definitions and weighting flags.

Value

data.table. Summarized target data by group, with columns:

  • Grouping columns (as specified)

  • Target variable columns

  • total, lower, upper for each target (confidence interval)

Row order: by group and target.

Details

  • Checks for required columns in PUMS data: puma_id, hh_id, person_id, WGTP, PWGTP, h_total, p_total.

  • Checks for required columns in crosswalk: group_col, zone_group, zone_group_label, puma_id, prop_hh, prop_per.

  • Drops geometry from crosswalk to reduce memory usage.

  • Merges PUMS data with crosswalk, allowing cartesian join for multiple PUMA zones per client zone.

  • Applies update_targets() to harmonize PUMS targets with client zone definitions.

  • Scales weights to client zones:

    • If force_balance_hh_weights is TRUE, derives household weights from person weights and scales them by a single proportion column.

    • Otherwise, adjusts both person and household weights independently.

  • Ensures that weighted totals are unchanged after scaling (checksums).

  • Sets weights and totals to zero for outside-region rows (client_zone_id == -1).

  • Aggregates by group, client zone, and zone group label for households and persons.

  • Asserts that weights match after update if forced balancing is enabled.

  • Calls summarize_data() to aggregate by group and calculate confidence intervals for households and persons.

  • Checks that summarized totals match naive weighted sums (checksum validation).

  • Returns a data.table of summarized targets by group, with confidence intervals.

  • Error handling: stops if required columns are missing or weights do not match after update.

  • FIXME: Replicate weight calculation for standard errors is not implemented (see code comment and reference link).
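
The merge, scaling, and checksum steps above can be sketched on toy data as follows. This is a simplified illustration of the independent person-weight case only, not the package's implementation. It assumes prop_per is the share of a PUMA's persons assigned to each client zone, and the PWGTP_scaled column name and all data values are made up.

```r
library(data.table)

# Toy person-level records (made-up values).
pums <- data.table(
  puma_id   = c(101L, 101L, 102L),
  person_id = 1:3,
  PWGTP     = c(22, 18, 40)
)

# Toy crosswalk: PUMA 101 is split across two client zones, PUMA 102 maps to
# one. Assumption: prop_per sums to 1 within each PUMA.
xwalk <- data.table(
  puma_id        = c(101L, 101L, 102L),
  client_zone_id = c(1L, 2L, 2L),
  zone_group     = c("A", "B", "B"),
  prop_per       = c(0.25, 0.75, 1)
)

# Each PUMS row is repeated once per matching client zone, so the result can
# have more rows than the input; allow.cartesian = TRUE permits that.
merged <- merge(pums, xwalk, by = "puma_id", allow.cartesian = TRUE)

# Scale the person weight by the share assigned to each client zone.
merged[, PWGTP_scaled := PWGTP * prop_per]

# Checksum: the weighted total must be unchanged by the scaling.
stopifnot(isTRUE(all.equal(sum(merged$PWGTP_scaled), sum(pums$PWGTP))))

# Aggregate scaled weights by zone group.
by_group <- merged[, .(p_weight = sum(PWGTP_scaled)), by = zone_group]
by_group
```

The checksum holds only because the proportions sum to 1 within each PUMA; a crosswalk that omits part of a PUMA would fail it, which is the kind of error the validation step is meant to catch.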

Settings

  • force_balance_hh_weights (direct): controls weight scaling logic.

  • study_unit (direct): selects proportion column for scaling.

  • Uses target update definitions from settings for harmonization.

See also

summarize_data, update_targets, summarize_survey

Other reporting utilities: find_level_idx(), get_target_map(), summarize_data(), summarize_survey(), tabulate_target()

Examples

## Not run:
summarize_pums(
  pums_wide,
  group_col = "zone_group",
  puma_zone_group_xwalk = puma_zone_group_xwalk,
  ci_level = 0.9,
  settings = settings
)
## End(Not run)