Imputes race for survey persons from ACS reference data and the survey's binary race columns, with support for custom and collapsed race categories. Column selection and category matching use explicit regex patterns (see Details). Intended for cleaning and completing race data in survey imputation workflows.

Usage

impute_race(persons, variable_list, settings, seed = NULL, acs_year = NULL)

Arguments

persons

data.table. Survey person records. Required columns:

  • person_id — person identifier

  • hh_id — household identifier

  • bg_geoid — block group GEOID

  • race_* — binary race columns (see the regex in Details)

Rows: one per person. Keys: (person_id). Modified by reference: no (returns copy).

variable_list

data.table. Variable metadata. Required columns:

  • variable — variable name

  • description — value label

settings

list. Settings object with configs. Required keys:

  • acs_year — ACS year

  • acs_tables[['race']] — ACS table name for race

  • targets[['p_race']]$levels — target race levels

  • rng_seed, optional — random seed

  • outputs_dir, optional — output directory

seed

integer, optional. Random seed for reproducibility. Defaults to rng_seed in settings.

acs_year

integer, optional. ACS year for imputation. Defaults to acs_year in settings.
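The required settings keys and the fallback behaviour of seed and acs_year can be sketched as follows. This is an illustration only: every value in the list is a placeholder, and resolve_default() is a hypothetical helper, not part of the package.

```r
# Hypothetical settings object; all values are placeholders for illustration.
settings <- list(
  acs_year   = 2021L,
  acs_tables = list(race = "B02001"),  # placeholder table name
  targets    = list(p_race = list(levels = c("white", "afam", "asian", "other"))),
  rng_seed   = 42L
)

# Hypothetical helper mirroring the documented defaults: an explicit
# argument wins; otherwise the value is read from settings.
resolve_default <- function(arg, settings, key) {
  if (is.null(arg)) settings[[key]] else arg
}

seed     <- resolve_default(NULL,  settings, "rng_seed")  # falls back to settings
acs_year <- resolve_default(2019L, settings, "acs_year")  # explicit value is kept
```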

Value

data.table. Imputed race assignments. Columns:

  • person_id — person identifier

  • race — observed race

  • race_imputed — imputed race

Rows: one per person. Keys: (person_id).

Details

  • Selects race columns using regex ^race_[1-9][0-9]{0,2}$.

  • Collapses and recodes race categories using regex patterns:

    • str_detect(label, "white") → "white"

    • str_detect(label, "black|african american") → "afam"

    • str_detect(label, "american indian|alaska native") → "native"

    • str_detect(label, "asian") → "asian"

    • str_detect(label, "native hawaiian|other pacific islander") → "pacific"

    • str_detect(label, "hispanic|latino|spanish") → "hispanic"

    • str_detect(label, "middle eastern|arab|north african") → "middle_eastern"

    • str_detect(label, "other") → "other"

    • str_detect(label, "prefer not to answer|don't know") → "missing"

  • Combines columns that recode to the same category, setting the combined value to 1 if any of them is 1.

  • Collapses categories to match sample targets using regex: str_detect(rl, race) for each target level rl.

  • Assigns the observed race group when one is available; otherwise imputes race probabilistically using ACS fractions.

  • Returns a data.table with person ID, observed race, and imputed race.

  • Assumes input tables and settings are complete; errors if required columns or labels are missing.
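The column selection, label recoding, and combining steps listed above can be sketched in base R. This is a rough illustration under several assumptions, not the package's implementation: the toy data and labels are invented, grepl() stands in for str_detect(), labels are lower-cased before matching, and the first matching pattern wins (so a label containing "other pacific islander" is not caught by the later "other" pattern).

```r
# Toy person records; race_other_text is not selected because it does not
# match the documented column regex.
persons <- data.frame(
  person_id = 1:4,
  race_1 = c(1, 0, 0, 0),
  race_2 = c(0, 1, 0, 0),
  race_3 = c(0, 1, 1, 0),
  race_997 = c(0, 0, 0, 1),
  race_other_text = rep(NA_character_, 4)
)
race_cols <- grep("^race_[1-9][0-9]{0,2}$", names(persons), value = TRUE)

# Hypothetical value labels for the selected columns.
labels <- c(race_1 = "White", race_2 = "Black or African American",
            race_3 = "African American", race_997 = "Prefer not to answer")

# Documented patterns, in the documented order.
patterns <- c(
  white = "white",
  afam = "black|african american",
  native = "american indian|alaska native",
  asian = "asian",
  pacific = "native hawaiian|other pacific islander",
  hispanic = "hispanic|latino|spanish",
  middle_eastern = "middle eastern|arab|north african",
  other = "other",
  missing = "prefer not to answer|don't know"
)

# Assumption: first matching pattern wins; unmatched labels are an error.
recode_label <- function(label) {
  hit <- vapply(patterns, grepl, logical(1), x = tolower(label))
  if (!any(hit)) stop("no race pattern matches label: ", label)
  names(patterns)[which(hit)[1]]
}
categories <- vapply(labels[race_cols], recode_label, character(1))

# Combine columns that share a category: 1 if any contributing column is 1.
collapsed <- sapply(unique(categories), function(cat) {
  cols <- names(categories)[categories == cat]
  do.call(pmax, unname(as.list(persons[cols])))
})
```

Here race_2 and race_3 both recode to "afam", so collapsed has a single "afam" column that is 1 wherever either source column is 1.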

Settings

  • acs_year (direct): ACS year for race table. Default from settings.

  • acs_tables[['race']] (direct): ACS table name for race. Default from settings.

  • targets[['p_race']]$levels (direct): Target race levels for collapsing categories.

  • rng_seed (direct): Random seed for reproducibility. Default from settings.

  • outputs_dir (direct): Output directory for saving imputations. Default from settings.
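To illustrate why rng_seed matters, the sketch below draws a race for each person without an observed value, using per-block-group fractions, and shows that a fixed seed makes the draw repeatable. The fractions, GEOID placeholders, and impute_sketch() are all hypothetical, and treating NA as "not observed" is an assumption; the real function reads ACS data and may differ in detail.

```r
# Hypothetical ACS race shares for two block groups (each row sums to 1).
acs_fracs <- rbind(
  bg_A = c(white = 0.6, afam = 0.3, asian = 0.1),
  bg_B = c(white = 0.2, afam = 0.5, asian = 0.3)
)

# Toy person records; NA marks persons with no observed race (assumption).
persons <- data.frame(
  person_id = 1:4,
  bg_geoid = c("bg_A", "bg_A", "bg_B", "bg_B"),
  race = c("white", NA, NA, "asian"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the observed race, otherwise sample one category
# with probabilities taken from the person's block group.
impute_sketch <- function(persons, acs_fracs, seed) {
  set.seed(seed)  # note: this resets the global RNG state
  out <- persons
  out$race_imputed <- out$race
  for (i in which(is.na(out$race))) {
    p <- acs_fracs[out$bg_geoid[i], ]
    out$race_imputed[i] <- sample(names(p), size = 1, prob = p)
  }
  out[, c("person_id", "race", "race_imputed")]
}

a <- impute_sketch(persons, acs_fracs, seed = 42)
b <- impute_sketch(persons, acs_fracs, seed = 42)
same_result <- identical(a, b)  # same seed, same imputed values
```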

Examples

## Not run:
impute_race(persons, variable_list, settings)
#> Error: object 'settings' not found
## End(Not run)