Impute ethnicity for survey persons using ACS
Imputes missing or ambiguous ethnicity values in survey data by probabilistic sampling from ACS reference distributions at the block group or tract level. Use it to harmonize survey ethnicity with ACS targets for weighting and analysis.
Usage
impute_ethnicity(
  persons,
  variable_list,
  settings,
  outputs_dir = NULL,
  seed = NULL,
  acs_year = NULL
)
Arguments
- persons
data.table. Survey persons table. Required columns:
  - person_id: unique person identifier
  - hh_id: household identifier
  - bg_geoid: block group GEOID
  - ethnicity columns as defined in variable_list
Rows: one per person. Keys: (person_id). Modified by reference: no (returns copy).
- variable_list
data.table. Variable metadata for ethnicity columns. Required columns:
  - variable: column name in persons
  - description: label or description
- settings
list. Settings object with ACS configs and targets. Required keys:
  - targets[['p_ethnicity']]$levels: ethnicity levels
  - acs_year: ACS year
  - rng_seed, optional: random seed
  - outputs_dir, optional: output directory
- outputs_dir
character(1), optional. Directory to save imputed results. Default NULL.
- seed
integer(1), optional. Random seed for reproducibility. Default NULL, in which case the seed from settings (rng_seed) is used if set.
- acs_year
integer(1), optional. ACS year for reference data. Default NULL, in which case the year from settings (acs_year) is used.
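As a rough illustration of the input shapes described above, the following sketch builds toy persons and variable_list tables and checks the documented required columns. The column name ethnicity, the GEOIDs, and the check itself are assumptions made for illustration, not part of the package. Base data.frame is used to keep the sketch dependency-free, whereas the function itself is documented to expect data.table inputs.

```r
# Toy inputs only; values and the 'ethnicity' column name are hypothetical.
persons <- data.frame(
  person_id = c(1L, 2L, 3L),
  hh_id     = c(10L, 10L, 11L),
  bg_geoid  = c("060750101001", "060750101001", "060750101002"),
  ethnicity = c("hispanic", NA, "not_hispanic"),
  stringsAsFactors = FALSE
)

variable_list <- data.frame(
  variable    = "ethnicity",
  description = "Hispanic or Latino origin",
  stringsAsFactors = FALSE
)

# Required columns: the fixed identifiers plus every column named in variable_list.
required_cols <- c("person_id", "hh_id", "bg_geoid", variable_list$variable)
missing_cols  <- setdiff(required_cols, names(persons))

# One row per person, keyed by person_id.
inputs_ok <- length(missing_cols) == 0 && !anyDuplicated(persons$person_id)
```

A check like this before calling impute_ethnicity() would surface a missing column or a duplicated person_id early, before any ACS data are fetched.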
Value
data.table. Imputed ethnicity for survey persons. Columns:
  - person_id: unique person identifier
  - ethnicity: original ethnicity label
  - ethnicity_imputed: imputed ethnicity label
Rows: one per person. Keys: (person_id).
Details
Determines ethnicity levels from settings or defaults; supports binary and multi-level imputation.
Prepares ethnicity labels and survey data, handling missing and multi-response cases.
Fetches ACS reference data and computes proportions for each geography.
Merges ACS proportions to survey persons by block group or tract.
Imputes ethnicity by sampling from ACS probabilities; falls back to county-level ACS if local data is missing.
Saves imputed results to outputs_dir if provided.
Returns a data.table with person_id, original, and imputed ethnicity.
Assumes valid settings, variable_list, and persons data.table; errors if levels do not match expected values.
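The merge, sampling, and fallback steps above can be sketched roughly as follows. This is a minimal base R illustration and not the package's implementation: the ACS proportions, the county fallback values, the GEOIDs, and the helper impute_one() are all hypothetical, and it assumes that reported ethnicity values are kept and only missing ones are sampled.

```r
set.seed(1234)  # fixed seed so the sampled labels are reproducible

levels_eth <- c("hispanic", "not_hispanic")

# Hypothetical ACS proportions by block group (each row sums to 1).
acs_bg <- data.frame(
  bg_geoid     = c("060750101001", "060750101002"),
  hispanic     = c(0.30, 0.10),
  not_hispanic = c(0.70, 0.90),
  stringsAsFactors = FALSE
)

# Hypothetical county-level proportions used when a block group has no ACS row.
county_probs <- c(hispanic = 0.15, not_hispanic = 0.85)

persons <- data.frame(
  person_id = 1:4,
  bg_geoid  = c("060750101001", "060750101002", "060750101001", "069999999999"),
  ethnicity = c("hispanic", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Left join: persons without a matching block group get NA proportions.
merged <- merge(persons, acs_bg, by = "bg_geoid", all.x = TRUE, sort = FALSE)
merged <- merged[order(merged$person_id), ]

# Hypothetical helper: keep a reported value, otherwise sample one label.
impute_one <- function(observed, probs) {
  if (!is.na(observed)) return(observed)
  if (anyNA(probs)) probs <- county_probs[levels_eth]  # county fallback
  sample(levels_eth, size = 1, prob = probs)
}

merged$ethnicity_imputed <- vapply(
  seq_len(nrow(merged)),
  function(i) impute_one(merged$ethnicity[i], unlist(merged[i, levels_eth])),
  character(1)
)

result <- merged[, c("person_id", "ethnicity", "ethnicity_imputed")]
```

Here person 4 sits in a block group with no ACS row, so the sketch draws from the county proportions instead. The result has the same three columns as the documented return value.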
Settings
targets[['p_ethnicity']]$levels (direct): ethnicity levels to impute. Example: c('hispanic', 'not_hispanic').
acs_year (direct): ACS year for reference data. Example: 2021.
rng_seed (direct): random seed for reproducibility. Example: 1234.
outputs_dir (direct): directory to save results. Example: 'outputs/'.
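For orientation, a settings list carrying these keys might look like the sketch below, built from the example values given above. The required-key check and the seed resolution rule (an explicit seed argument taking precedence over rng_seed) are assumptions about how the function behaves, not confirmed details.

```r
# Settings list using the documented example values.
settings <- list(
  targets = list(
    p_ethnicity = list(levels = c("hispanic", "not_hispanic"))
  ),
  acs_year    = 2021,
  rng_seed    = 1234,
  outputs_dir = "outputs/"
)

# Hypothetical check: both required keys are present.
required_ok <- !is.null(settings$targets[["p_ethnicity"]]$levels) &&
  !is.null(settings$acs_year)

# Assumed precedence: an explicit argument wins, otherwise use settings.
seed      <- NULL
seed_used <- if (is.null(seed)) settings$rng_seed else seed
```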
See also
get_acs_ethnicity, calculate_acs_proportions,
prepare_ethnicity_survey_data, prepare_ethnicity_labels
Other imputation utilities:
calculate_acs_proportions(),
get_acs_ethnicity(),
get_acs_race(),
get_hh_person_sums(),
impute_gender(),
impute_income_nonrelatives(),
impute_income_pnta(),
impute_race(),
make_binary(),
prep_hhs_for_income_imputation(),
prepare_acs_income(),
prepare_ethnicity_labels(),
prepare_ethnicity_survey_data(),
prepare_impute_targets(),
prepare_income_fit_dt(),
prepare_persons_dt(),
update_hh_income_imputed()