
Cleans and formats PUMS person-level data for nonrelative income imputation models, generating the predictors and flags needed for model fitting. Use it to create model-ready data for quantile regression or similar approaches.

Usage

prepare_income_fit_dt(pums_cleaned, target_names, settings)

Arguments

pums_cleaned

data.table. Cleaned PUMS person-level data. Required columns:

  • PUMA — PUMA identifier

  • SERIALNO — household ID

  • SPORDER — person sequence number

  • PINCP — personal income

  • AGEP — age

  • RELSHIPP_label — relationship label

Rows: one per person. Keys: (SERIALNO, SPORDER). Modified by reference: no (returns a copy).
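The column and key requirements above can be checked up front. The sketch below is a hypothetical helper (check_pums_input is not part of the package) that stops if any required column is absent or if (SERIALNO, SPORDER) does not identify exactly one row per person. It uses base R only and builds the key as a single string so it behaves the same for a data.frame or a data.table.

```r
# Hypothetical input check (not the package's code): stop if required columns
# are missing or if (SERIALNO, SPORDER) does not identify one row per person.
check_pums_input <- function(dt,
                             required = c("PUMA", "SERIALNO", "SPORDER",
                                          "PINCP", "AGEP", "RELSHIPP_label")) {
  missing_cols <- setdiff(required, names(dt))
  if (length(missing_cols) > 0) {
    stop("pums_cleaned is missing required columns: ",
         paste(missing_cols, collapse = ", "))
  }
  # Compose the key as one string so the check works for data.frame and data.table
  key <- paste(dt$SERIALNO, dt$SPORDER, sep = "_")
  if (anyDuplicated(key) > 0) {
    stop("(SERIALNO, SPORDER) is not unique; expected one row per person")
  }
  invisible(TRUE)
}
```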

target_names

character vector. Names of model predictor variables (e.g., 'p_employment', 'p_age', 'p_univstudent').

settings

list. Settings object with configs. Keys:

  • age_employable (optional): minimum age for employable persons. Default 16.

Value

data.table. Model-ready PUMS data for imputation. Columns:

  • person_id — unique person identifier

  • hh_id — household identifier

  • model predictor columns for each variable in target_names

  • PINCP — personal income

  • AGEP — age

  • unrelated — flag for nonrelative status

Rows: one per person. Keys: (person_id, hh_id).

Details

  • Reads the PUMS codebook and sets the employable-age threshold from settings (default 16).

  • Copies the input so that pums_cleaned is not modified by reference.

  • Adds identifiers: puma_id, hh_id, person_id.

  • Calls prepare_impute_targets to generate model predictors for each person.

  • Sets PINCP to zero for negative incomes and for persons younger than the employable-age threshold.

  • Flags nonrelatives using RELSHIPP_label pattern matching.

  • Subsets to working-age, unrelated persons for model fitting.

  • Returns a data.table with predictors and flags for imputation.

  • Assumes valid PUMS data and codebook; errors if required columns are missing.
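The steps above can be approximated as follows. This is a minimal base-R sketch on a data.frame, not the package's implementation. sketch_income_fit is a hypothetical name; the "SERIALNO_SPORDER" person_id format and the "nonrelative" regex are assumptions, and the zeroing step reads "ages below the employable threshold" as "zero the income of those persons". Codebook reading and predictor generation via prepare_impute_targets are omitted because their behaviour is not documented here.

```r
# Hypothetical base-R sketch of the documented steps; NOT the package's code.
# Assumptions: person_id format, the nonrelative regex, and the reading of the
# age rule as "zero the income of persons below the employable age".
# Codebook reading and prepare_impute_targets() are intentionally omitted.
sketch_income_fit <- function(pums, age_employable = 16) {
  df <- pums  # copy-on-modify: the caller's data.frame is left unchanged

  # Identifiers
  df$puma_id   <- as.character(df$PUMA)
  df$hh_id     <- as.character(df$SERIALNO)
  df$person_id <- paste(df$SERIALNO, df$SPORDER, sep = "_")

  # Zero out negative incomes and incomes of persons below the employable age
  # (which() drops NA comparisons, so missing values are left untouched)
  df$PINCP[which(df$PINCP < 0)] <- 0
  df$PINCP[which(df$AGEP < age_employable)] <- 0

  # Flag nonrelatives by pattern matching on the label (hypothetical pattern)
  df$unrelated <- grepl("nonrelative", df$RELSHIPP_label, ignore.case = TRUE)

  # Keep working-age, unrelated persons for model fitting
  keep <- which(df$AGEP >= age_employable & df$unrelated)
  out <- df[keep, c("person_id", "hh_id", "PINCP", "AGEP", "unrelated"), drop = FALSE]
  rownames(out) <- NULL
  out
}
```

With a three-person toy table (a 45-year-old reference person, a 30-year-old "Other nonrelative" reporting -300, and a 14-year-old "Other nonrelative"), only the 30-year-old remains, with PINCP set to 0. The under-age zeroing is redundant once the age filter runs; it is kept only to mirror the documented order of steps.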

Settings

  • age_employable (direct): minimum age for employable persons. Default 16.
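An optional setting with a fallback default is commonly read as sketched below. get_setting is a hypothetical helper, not part of the package; it assumes settings is a plain R list. Using [[ rather than $ avoids R's partial name matching, so a key such as "age" would not silently match age_employable.

```r
# Hypothetical helper (not part of the package): return settings[[key]],
# or `default` when the key is absent or NULL.
get_setting <- function(settings, key, default) {
  value <- settings[[key]]  # [[ uses exact matching and returns NULL for a missing key
  if (is.null(value)) default else value
}

age_employable <- get_setting(list(), "age_employable", 16)  # falls back to 16
```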

Examples

## Not run:
prepare_income_fit_dt(pums_cleaned, target_names = c('p_employment', 'p_age'), settings)
## End(Not run)