Impute race for survey persons
Imputes race for survey persons using ACS reference data and survey race columns, supporting custom and collapsed race categories. Uses explicit regex patterns for column selection and category matching. Use for cleaning and expanding race data in survey imputation workflows.
Arguments
- persons
data.table. Survey person records. Required columns:
  - person_id: person identifier
  - hh_id: household identifier
  - bg_geoid: block group GEOID
  - race_*: binary race columns (matched by the regex given under Details)
Rows: one per person. Keys: (person_id). Modified by reference: no (returns a copy).
- variable_list
data.table. Variable metadata. Required columns:
  - variable: variable name
  - description: value label
- settings
list. Settings object with configs. Keys (required unless marked optional):
  - acs_year: ACS year
  - acs_tables[['race']]: ACS table name for race
  - targets[['p_race']]$levels: target race levels
  - rng_seed (optional): random seed
  - outputs_dir (optional): output directory
- seed
integer, optional. Random seed for reproducibility. Defaults to rng_seed in settings.
- acs_year
integer, optional. ACS year for imputation. Defaults to acs_year in settings.
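As a rough illustration of the settings argument, the sketch below builds a list with the keys named above. All values are hypothetical (the year, the table name "B02001", the target levels, and the seed are placeholders), and the fallback logic for seed is an assumption about how the documented default could be resolved, not the package's actual code.

```r
# Hypothetical values throughout; only the key names come from the argument list.
settings <- list(
  acs_year    = 2022L,
  acs_tables  = list(race = "B02001"),  # illustrative ACS table name
  targets     = list(
    p_race = list(levels = c("white", "afam", "asian_pacific", "other"))
  ),
  rng_seed    = 1234L,     # optional
  outputs_dir = tempdir()  # optional
)

# The access paths used in the argument description:
race_table   <- settings$acs_tables[["race"]]
race_targets <- settings$targets[["p_race"]]$levels

# Assumed fallback: an unset seed resolves to rng_seed from settings.
seed <- NULL
effective_seed <- if (is.null(seed)) settings$rng_seed else seed

# With the package loaded and real inputs, the call would look like:
# imputed <- impute_race(persons = persons, variable_list = variable_list,
#                        settings = settings)
```

Passing seed or acs_year explicitly would override the corresponding settings value, per the argument descriptions.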
Value
data.table. Imputed race assignments. Columns:
  - person_id: person identifier
  - race: observed race
  - race_imputed: imputed race
Rows: one per person. Keys: (person_id).
Details
Selects race columns using the regex ^race_[1-9][0-9]{0,2}$.
Collapses and recodes race categories using regex patterns:
  - str_detect(label, "white") → "white"
  - str_detect(label, "black|african american") → "afam"
  - str_detect(label, "american indian|alaska native") → "native"
  - str_detect(label, "asian") → "asian"
  - str_detect(label, "native hawaiian|other pacific islander") → "pacific"
  - str_detect(label, "hispanic|latino|spanish") → "hispanic"
  - str_detect(label, "middle eastern|arab|north african") → "middle_eastern"
  - str_detect(label, "other") → "other"
  - str_detect(label, "prefer not to answer|don't know") → "missing"
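The column selection and label recoding can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: grepl() stands in for str_detect(), the column names and labels are made up, and two behaviors are assumptions: labels are lowercased before matching, and the first matching pattern wins (which matters because "other" would otherwise also match "other pacific islander").

```r
# Hypothetical column names; only race_1, race_2, and race_997 fit the pattern.
cols <- c("person_id", "race_1", "race_2", "race_997",
          "race_0", "race_1000", "race_other")
race_cols <- grep("^race_[1-9][0-9]{0,2}$", cols, value = TRUE)

# Patterns in the order listed above; names are the recoded categories.
patterns <- c(
  white          = "white",
  afam           = "black|african american",
  native         = "american indian|alaska native",
  asian          = "asian",
  pacific        = "native hawaiian|other pacific islander",
  hispanic       = "hispanic|latino|spanish",
  middle_eastern = "middle eastern|arab|north african",
  other          = "other",
  missing        = "prefer not to answer|don't know"
)

# Assumption: lowercase the label, then take the first pattern that matches.
recode_race_label <- function(label) {
  label <- tolower(label)
  vapply(label, function(x) {
    hit <- names(patterns)[vapply(patterns, grepl, logical(1), x = x)]
    if (length(hit) == 0L) NA_character_ else hit[1L]
  }, character(1), USE.NAMES = FALSE)
}

labels <- c("White", "Black or African American",
            "Native Hawaiian or Other Pacific Islander",
            "Some other race", "Prefer not to answer")
recoded <- recode_race_label(labels)
```

Under the first-match assumption, pattern order is part of the behavior: moving "other" ahead of the "pacific" pattern would change how Pacific Islander labels are recoded.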
Combines duplicate columns that recode to the same category; the combined value is capped at 1, so it is 1 if either column is 1.
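One way to picture the combine-and-cap step is shown below, assuming two source columns recode to the same category. The column names and the mapping are hypothetical, and summing then capping with pmin() is just one of several equivalent ways to express "1 if either is 1".

```r
# Hypothetical binary race columns: race_1 and race_3 both recode to "white".
binary <- data.frame(
  race_1 = c(1L, 0L, 0L),
  race_2 = c(0L, 1L, 0L),
  race_3 = c(1L, 0L, 0L)
)
category <- c(race_1 = "white", race_2 = "asian", race_3 = "white")

# For each category, sum its source columns row-wise and cap the result at 1.
combined <- sapply(unique(category), function(cat) {
  src <- names(category)[category == cat]
  pmin(rowSums(binary[, src, drop = FALSE]), 1)
})
```

Here the first person has both race_1 and race_3 set, so the raw sum for "white" is 2 and the cap brings it back to a valid binary indicator.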
Collapses categories to match sample targets using the regex match str_detect(rl, race) for each target level rl.
Imputes race probabilistically using ACS fractions, or assigns the observed group if available.
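The collapse and imputation steps might look roughly like the sketch below. Several things here are assumptions rather than documented behavior: the target levels are invented, grepl() stands in for str_detect(), ACS fractions are assumed to be keyed by block group, and drawing one category per person with sample() is only one plausible reading of "imputes race probabilistically".

```r
set.seed(42)  # stands in for the seed argument / rng_seed setting

# Hypothetical target levels; "asian_pacific" merges two recoded categories.
target_levels <- c("white", "afam", "asian_pacific", "other")

# Map a recoded category to the first target level whose name contains it,
# mirroring str_detect(rl, race) for each target level rl.
collapse_to_target <- function(race, levels) {
  hit <- levels[vapply(levels, function(rl) grepl(race, rl, fixed = TRUE),
                       logical(1))]
  if (length(hit) == 0L) NA_character_ else hit[1L]
}
collapsed <- collapse_to_target("pacific", target_levels)

# Made-up ACS fractions per block group (assumed to sum to 1 within each).
acs <- data.frame(
  bg_geoid = c("A", "A", "B", "B"),
  race     = c("white", "asian_pacific", "white", "asian_pacific"),
  fraction = c(0.7, 0.3, 0.2, 0.8),
  stringsAsFactors = FALSE
)

# Made-up persons: two with an observed group, two without.
persons <- data.frame(
  person_id = 1:4,
  bg_geoid  = c("A", "A", "B", "B"),
  race      = c("white", NA, NA, "asian_pacific"),
  stringsAsFactors = FALSE
)

# Observed groups carry over; missing ones are drawn from the ACS fractions
# of the person's block group.
persons$race_imputed <- persons$race
for (i in which(is.na(persons$race))) {
  shares <- acs[acs$bg_geoid == persons$bg_geoid[i], ]
  persons$race_imputed[i] <- sample(shares$race, size = 1, prob = shares$fraction)
}
```

Fixing the seed makes the draws reproducible across runs, which is the stated purpose of the seed argument.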
Returns a data.table with person ID, observed race, and imputed race.
Assumes input tables and settings are complete; errors if required columns or labels are missing.
Settings
- acs_year (direct): ACS year for race table. Default from settings.
- acs_tables[['race']] (direct): ACS table name for race. Default from settings.
- targets[['p_race']]$levels (direct): Target race levels for collapsing categories.
- rng_seed (direct): random seed for reproducibility. Default from settings.
- outputs_dir (direct): output directory for saving imputations. Default from settings.
See also
get_acs_race, fetch_acs
Other imputation utilities:
calculate_acs_proportions(),
get_acs_ethnicity(),
get_acs_race(),
get_hh_person_sums(),
impute_ethnicity(),
impute_gender(),
impute_income_nonrelatives(),
impute_income_pnta(),
make_binary(),
prep_hhs_for_income_imputation(),
prepare_acs_income(),
prepare_ethnicity_labels(),
prepare_ethnicity_survey_data(),
prepare_impute_targets(),
prepare_income_fit_dt(),
prepare_persons_dt(),
update_hh_income_imputed()