Impute race for survey persons

Imputes race for survey persons from the survey race columns and ACS reference data, with support for custom and collapsed race categories. Column selection and category matching use explicit regex patterns (see Details). Use it to clean and expand race data in survey imputation workflows.

Usage

impute_race(persons, variable_list, settings, seed = NULL, acs_year = NULL)

Arguments

persons

data.table. Survey person records. Required columns:

person_id — person identifier
hh_id — household identifier
bg_geoid — block group GEOID
race_* — binary race columns whose names match ^race_[1-9][0-9]{0,2}$ (see Details)

Rows: one per person. Keys: (person_id). Modified by reference: no (persons is left unchanged; a new table is returned).
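The column-selection regex can be checked on a toy table. This is a minimal sketch using base R's grep(); the persons data frame and its values are hypothetical, and only the pattern itself comes from this page. Columns such as race_0, race_1000 or race_1_text are deliberately included to show what the pattern excludes.

```r
# Hypothetical persons table: three columns match the race_* pattern, four do not.
persons <- data.frame(
  person_id   = 1:2,
  hh_id       = c(10, 10),
  bg_geoid    = c("bg_A", "bg_A"),   # hypothetical block group IDs
  race_1      = c(1, 0),
  race_2      = c(0, 1),
  race_999    = c(0, 0),
  race_0      = c(0, 0),             # excluded: first digit must be 1-9
  race_1000   = c(0, 0),             # excluded: at most three digits
  race_other  = c(0, 0),             # excluded: not numeric
  race_1_text = c("", ""),           # excluded: anchored at the end with $
  stringsAsFactors = FALSE
)

# Select race columns with the pattern documented under Details.
race_cols <- grep("^race_[1-9][0-9]{0,2}$", names(persons), value = TRUE)
race_cols  # returns c("race_1", "race_2", "race_999")
```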

variable_list

data.table. Variable metadata. Required columns:

variable — variable name
description — value label

settings

list. Settings object. Required keys:

acs_year — ACS year
acs_tables[['race']] — ACS table name for race
targets[['p_race']]$levels — target race levels

Optional keys:

rng_seed — random seed
outputs_dir — output directory
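The shape of a settings list with the keys listed above might look like the sketch below. All values are hypothetical placeholders (including the table ID and the target levels), and check_settings() is an illustrative helper, not part of the package; it only shows how the three required keys can be tested for presence before calling impute_race().

```r
# Hypothetical settings object; values are placeholders, not package defaults.
settings <- list(
  acs_year   = 2021,
  acs_tables = list(race = "RACE_TABLE_ID"),   # placeholder table name
  targets    = list(p_race = list(
    levels = c("white", "afam", "asian_pacific", "hispanic", "other")
  )),
  rng_seed    = 1234,       # optional
  outputs_dir = tempdir()   # optional
)

# Hypothetical helper: stop with a readable message if a required key is absent.
# Indexing into NULL returns NULL in R, so nested lookups are safe here.
check_settings <- function(settings) {
  absent <- character(0)
  if (is.null(settings$acs_year)) absent <- c(absent, "acs_year")
  if (is.null(settings$acs_tables[["race"]])) absent <- c(absent, "acs_tables[['race']]")
  if (is.null(settings$targets[["p_race"]]$levels)) absent <- c(absent, "targets[['p_race']]$levels")
  if (length(absent) > 0) {
    stop("Missing required settings: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

check_settings(settings)
```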

seed

integer, optional. Random seed for reproducibility. Defaults to rng_seed in settings.

acs_year

integer, optional. ACS year for imputation. Defaults to acs_year in settings.
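The seed and acs_year arguments fall back to settings when left as NULL. The sketch below shows one way to express that precedence; resolve_arg() is a hypothetical helper and the settings values are placeholders. It also shows why seeding matters: two runs seeded identically produce the same random draws.

```r
# Hypothetical helper: an explicit argument wins; otherwise use the settings value.
resolve_arg <- function(value, settings, key) {
  if (!is.null(value)) value else settings[[key]]
}

settings <- list(acs_year = 2021, rng_seed = 1234)  # hypothetical values

acs_year          <- resolve_arg(NULL, settings, "acs_year")  # 2021, from settings
acs_year_override <- resolve_arg(2019, settings, "acs_year")  # 2019, argument wins
seed              <- resolve_arg(NULL, settings, "rng_seed")  # 1234, from settings

# Seeding before the random draws makes repeated runs identical.
set.seed(seed); draw_a <- sample(1:10, 3)
set.seed(seed); draw_b <- sample(1:10, 3)
```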

Value

data.table. Imputed race assignments. Columns:

person_id — person identifier
race — observed race
race_imputed — imputed race

Rows: one per person. Keys: (person_id).

Details

Selects race columns using regex ^race_[1-9][0-9]{0,2}$.
Recodes each race column's label (the description in variable_list) into a standard category using these regex patterns:
- str_detect(label, "white") → "white"
- str_detect(label, "black|african american") → "afam"
- str_detect(label, "american indian|alaska native") → "native"
- str_detect(label, "asian") → "asian"
- str_detect(label, "native hawaiian|other pacific islander") → "pacific"
- str_detect(label, "hispanic|latino|spanish") → "hispanic"
- str_detect(label, "middle eastern|arab|north african") → "middle_eastern"
- str_detect(label, "other") → "other"
- str_detect(label, "prefer not to answer|don't know") → "missing"
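Combines columns that recode to the same category into one column, top-coded to 1 if any of the combined columns is 1.
Collapses the recoded categories to the sample target levels: each category race is merged into the target level rl for which str_detect(rl, race) is TRUE, so a target level named, for example, "asian_pacific" absorbs both "asian" and "pacific".
Persons with an observed race keep it as their imputed race; for the others, race is drawn at random using ACS race fractions.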
Returns a data.table with person ID, observed race, and imputed race.
Assumes input tables and settings are complete; errors if required columns or labels are missing.
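The recoding and duplicate-combining steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: it uses grepl() as a stand-in for stringr::str_detect(), lower-cases labels first, and assumes the patterns are tried in the listed order with the first match winning (order matters, since "Native Hawaiian or Other Pacific Islander" also matches "other"). The variable_list rows and the 0/1 responses are hypothetical.

```r
# Recoding patterns from Details, in order; first match wins (assumption).
race_patterns <- c(
  white          = "white",
  afam           = "black|african american",
  native         = "american indian|alaska native",
  asian          = "asian",
  pacific        = "native hawaiian|other pacific islander",
  hispanic       = "hispanic|latino|spanish",
  middle_eastern = "middle eastern|arab|north african",
  other          = "other",
  missing        = "prefer not to answer|don't know"
)

# Return the first category whose pattern matches the lower-cased label.
recode_label <- function(label) {
  label <- tolower(label)
  for (category in names(race_patterns)) {
    if (grepl(race_patterns[[category]], label)) return(category)
  }
  NA_character_
}

# Combine 0/1 columns sharing a category; any 1 yields 1 (top-coding).
combine_by_category <- function(m, categories) {
  cats <- unique(categories[!is.na(categories)])
  out <- matrix(0L, nrow = nrow(m), ncol = length(cats),
                dimnames = list(NULL, cats))
  for (k in cats) {
    cols <- which(categories == k)
    out[, k] <- as.integer(rowSums(m[, cols, drop = FALSE]) > 0)
  }
  out
}

# Hypothetical metadata: race_3 and race_5 both recode to "asian".
variable_list <- data.frame(
  variable    = c("race_1", "race_2", "race_3", "race_4", "race_5"),
  description = c("White", "Black or African American", "Asian",
                  "Native Hawaiian or Other Pacific Islander", "Asian Indian"),
  stringsAsFactors = FALSE
)
m <- matrix(c(1, 0, 0, 0, 0,
              0, 0, 1, 0, 1,
              0, 0, 0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, variable_list$variable))

labels     <- variable_list$description[match(colnames(m), variable_list$variable)]
categories <- vapply(labels, recode_label, character(1), USE.NAMES = FALSE)
combined   <- combine_by_category(m, categories)
```

Person 2 answered both "Asian" and "Asian Indian"; after combining, that person has a single asian column equal to 1 rather than 2.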
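The collapse to target levels relies on str_detect(rl, race): the recoded category is the pattern and the target level is the string searched. A minimal base R sketch of that rule follows, with grepl(race, rl) as the equivalent call. The target levels are hypothetical, and the sketch simply drops categories that no target level contains (middle_eastern here); how the real function treats such categories is not stated on this page.

```r
# For each category, pick the first target level that contains it as a pattern.
map_to_targets <- function(categories, target_levels) {
  vapply(categories, function(race) {
    hit <- target_levels[grepl(race, target_levels)]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

# Collapse 0/1 category columns into target-level columns, top-coded at 1.
collapse_to_targets <- function(combined, target_levels) {
  target_of <- map_to_targets(colnames(combined), target_levels)
  out <- matrix(0L, nrow = nrow(combined), ncol = length(target_levels),
                dimnames = list(NULL, target_levels))
  for (rl in target_levels) {
    cols <- which(target_of == rl)   # which() skips unmatched (NA) categories
    if (length(cols) > 0) {
      out[, rl] <- as.integer(rowSums(combined[, cols, drop = FALSE]) > 0)
    }
  }
  list(collapsed = out, target_of = target_of)
}

combined <- matrix(c(1, 0, 0, 0, 0,
                     0, 0, 1, 1, 0,
                     0, 0, 0, 0, 1),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(NULL, c("white", "afam", "asian",
                                           "pacific", "middle_eastern")))
target_levels <- c("white", "afam", "asian_pacific", "hispanic", "other")  # hypothetical

res <- collapse_to_targets(combined, target_levels)
```

Because the match is a substring search, level names act as implicit groupings: naming a level "asian_pacific" is what pulls both "asian" and "pacific" into it.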
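The final imputation step can be sketched as follows: persons with an observed race keep it, and persons with a missing race get a draw from ACS race fractions. This sketch assumes the fractions are looked up by the person's bg_geoid (one plausible use of that required column) and uses base R's sample() with its prob argument; impute_from_acs(), the block group IDs and the fractions are all hypothetical.

```r
# Keep observed race; draw a race from the block group's ACS fractions otherwise.
# acs_frac: numeric matrix, rows named by bg_geoid, columns named by race level.
impute_from_acs <- function(race, bg_geoid, acs_frac, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # note: sets the global RNG state
  out <- race
  for (i in which(is.na(race))) {
    p <- acs_frac[bg_geoid[i], ]       # fractions for this person's block group
    out[i] <- sample(colnames(acs_frac), size = 1, prob = p)
  }
  out
}

# Hypothetical ACS fractions for two block groups.
acs_frac <- matrix(c(0.6, 0.3, 0.1,
                     0.0, 1.0, 0.0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("bg_A", "bg_B"),
                                   c("white", "afam", "asian_pacific")))

persons <- data.frame(
  person_id = 1:4,
  bg_geoid  = c("bg_A", "bg_A", "bg_B", "bg_B"),
  race      = c("white", NA, NA, "asian_pacific"),
  stringsAsFactors = FALSE
)
persons$race_imputed <- impute_from_acs(persons$race, persons$bg_geoid,
                                        acs_frac, seed = 42)
```

Person 3 lives in a block group whose fractions put all weight on "afam", so that draw is deterministic; person 2's draw depends on the seed.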

Settings

acs_year (direct): ACS year for the race table. Overridden by the acs_year argument when supplied.
acs_tables[['race']] (direct): ACS table name for race. Required.
targets[['p_race']]$levels (direct): target race levels used to collapse categories. Required.
rng_seed (direct, optional): random seed for reproducibility. Overridden by the seed argument when supplied.
outputs_dir (direct, optional): output directory for saving imputations.

Examples

## Not run:
impute_race(persons, variable_list, settings)
## End(Not run)