
Cleans and subsets survey person data for non-relative income imputation, using explicit regex patterns to identify unrelated relationships and employable ages. Use it to generate model-ready, person-level data for income imputation.

Usage

prepare_persons_dt(persons, target_names, value_labels, settings)

Arguments

persons

data.table. Survey person data. Required columns:

  • person_id — unique person identifier

  • hh_id — household identifier

  • relationship — coded relationship

  • age — age

Rows: one per person. Keys: (person_id, hh_id). Modified by reference: no (returns copy).

target_names

character vector. Names of model predictor variables (e.g., 'p_employment', 'p_age', 'p_univstudent').

value_labels

data.table. Value labels for variables. Required columns:

  • variable — variable name

  • label — value label

  • value <integer/character> — coded value

settings

list. Settings object with configs. Keys:

  • age_employable, optional — minimum age for employable persons. Default 16.

Value

data.table. Non-relative, employable persons with model targets. Columns:

  • person_id — unique person identifier

  • hh_id — household identifier

  • model predictor columns for each variable in target_names

Rows: one per person. Keys: (person_id, hh_id).

Details

  • Uses a regex pattern to identify unrelated relationships in value_labels:

    • Pattern: "nonrelative|unrelated|household help|roommate/friend|^other$"

    • Applied to relationship variable labels (case-insensitive).

  • Uses a regex pattern to identify employable ages in value_labels:

    • Pattern: "(16)\s*(?:-|to)?\s*(17)"

    • Applied to age variable labels.

  • Subsets persons to those whose relationship code matches an unrelated relationship and whose age code is >= the employable age code.

  • Calls prepare_impute_targets to generate model predictors for each person.

  • Returns a data.table of non-relative, employable persons with model targets.

  • Assumes value_labels contains all required variables and labels; errors if regex match fails or is ambiguous.
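
The matching and subsetting steps above could look roughly like the following. This is a minimal sketch, not the package's implementation: the toy tables, the coded values, the helper name match_codes, and the choice of perl = TRUE are all assumptions made for illustration, and the call to prepare_impute_targets is left out.

```r
library(data.table)

# Toy value labels (hypothetical codes and labels, for illustration only).
value_labels <- data.table(
  variable = c(rep("relationship", 4), rep("age", 4)),
  label    = c("Self", "Spouse/partner", "Roommate/friend", "Household help",
               "Under 5", "5-15", "16-17", "18-24"),
  value    = c(0L, 1L, 6L, 7L, 1L, 2L, 3L, 4L)
)

# Toy person data with the required columns.
persons <- data.table(
  person_id    = 1:5,
  hh_id        = c(1L, 1L, 1L, 2L, 2L),
  relationship = c(0L, 6L, 6L, 0L, 7L),
  age          = c(4L, 4L, 2L, 3L, 3L)
)

# Hypothetical helper: coded values of `var_name` whose label matches `pattern`.
match_codes <- function(labels, var_name, pattern, ignore_case = FALSE) {
  candidates <- labels[variable == var_name]
  hit <- grepl(pattern, candidates$label, ignore.case = ignore_case, perl = TRUE)
  if (!any(hit)) stop("no label of '", var_name, "' matches the pattern")
  candidates$value[hit]
}

rel_pattern <- "nonrelative|unrelated|household help|roommate/friend|^other$"
age_pattern <- "(16)\\s*(?:-|to)?\\s*(17)"

unrelated_codes <- match_codes(value_labels, "relationship", rel_pattern,
                               ignore_case = TRUE)
employable_code <- match_codes(value_labels, "age", age_pattern)

# The age pattern must resolve to exactly one code; otherwise it is ambiguous.
stopifnot(length(employable_code) == 1L)

# Keep non-relatives whose age code is at or above the employable code.
nonrelatives <- persons[relationship %in% unrelated_codes & age >= employable_code]
```

With these toy inputs, the relationship pattern selects the codes for "Roommate/friend" and "Household help", the age pattern selects the code for "16-17", and the subset keeps persons 2 and 5.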

Regex dependencies

  • Relationship pattern: "nonrelative|unrelated|household help|roommate/friend|^other$" (case-insensitive)

  • Age pattern: "(16)\s*(?:-|to)?\s*(17)"
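
To see how these two patterns behave, they can be tried against a few labels. The labels below are invented for illustration, and perl = TRUE is an assumption chosen here so that \s and the non-capturing group are handled by PCRE. Note that R string literals need the backslashes doubled.

```r
rel_pattern <- "nonrelative|unrelated|household help|roommate/friend|^other$"
age_pattern <- "(16)\\s*(?:-|to)?\\s*(17)"

# Made-up relationship labels.
rel_labels <- c("Other", "Other relative", "Unrelated adult", "Child")
rel_hits <- grepl(rel_pattern, rel_labels, ignore.case = TRUE, perl = TRUE)
# "^other$" is anchored to the whole label, so "Other" matches
# but "Other relative" does not.

# Made-up age labels.
age_labels <- c("16-17", "16 to 17", "16 17", "1617", "18-24")
age_hits <- grepl(age_pattern, age_labels, perl = TRUE)
# The separator and surrounding whitespace are optional, so "1617" also matches.

print(data.frame(label = rel_labels, unrelated = rel_hits))
print(data.frame(label = age_labels, employable = age_hits))
```

Because the age pattern is not anchored and its separator is optional, value labels should be checked for unintended matches before relying on it.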

Settings

  • age_employable (direct): minimum age for employable persons. Default 16.
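
One way the default could be resolved is sketched below. The helper get_age_employable is hypothetical and not part of the documented API; it only illustrates falling back to 16 when the key is absent from the settings list.

```r
# Hypothetical helper: read age_employable from settings, defaulting to 16.
get_age_employable <- function(settings, default = 16) {
  # [[ ]] returns NULL for a missing key and avoids the partial matching of $.
  value <- settings[["age_employable"]]
  if (is.null(value)) default else value
}

age_default <- get_age_employable(list())                     # key absent
age_custom  <- get_age_employable(list(age_employable = 18))  # key present
```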

Examples

## Not run:
prepare_persons_dt(persons, target_names = c('p_employment', 'p_age'), value_labels, settings)
## End(Not run)