Imputes missing or ambiguous ethnicity values in survey data by probabilistic sampling from ACS reference distributions at block group or tract level. Use to harmonize survey ethnicity with ACS targets for weighting and analysis.

Usage

impute_ethnicity(
  persons,
  variable_list,
  settings,
  outputs_dir = NULL,
  seed = NULL,
  acs_year = NULL
)

Arguments

persons

data.table. Survey persons table. Required columns:

  • person_id — unique person identifier

  • hh_id — household identifier

  • bg_geoid — block group GEOID

  • ethnicity columns as defined in variable_list

Rows: one per person. Keys: (person_id). Modified by reference: no (returns copy).

variable_list

data.table. Variable metadata for ethnicity columns. Required columns:

  • variable — column name in persons

  • description — label or description

settings

list. Settings object with ACS configs and targets. Required keys:

  • targets[['p_ethnicity']]$levels — ethnicity levels

  • acs_year — ACS year

  • rng_seed, optional — random seed

  • outputs_dir, optional — output directory

outputs_dir

character(1), optional. Directory to save imputed results. Default NULL.

seed

integer(1), optional. Random seed for reproducibility. Default NULL, in which case the seed is taken from settings if one is set there.

acs_year

integer(1), optional. ACS year for reference data. Default NULL, in which case the year is taken from settings.
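
As a rough illustration of the input tables described above, the sketch below builds a minimal persons table and variable_list and checks the required columns. The column name ethnicity_1, the GEOIDs, and the description text are hypothetical placeholders, not names the package is known to use.

```r
library(data.table)

# Hypothetical ethnicity column name; real names come from variable_list.
persons <- data.table(
  person_id   = 1:3,
  hh_id       = c(10L, 10L, 11L),
  bg_geoid    = c("060010001001", "060010001001", "060010001002"),
  ethnicity_1 = c("hispanic", NA, "not_hispanic")
)

variable_list <- data.table(
  variable    = "ethnicity_1",
  description = "Self-reported ethnicity"
)

# Check that every required column is present and person_id is unique.
required     <- c("person_id", "hh_id", "bg_geoid", variable_list$variable)
missing_cols <- setdiff(required, names(persons))
stopifnot(
  length(missing_cols) == 0L,
  anyDuplicated(persons$person_id) == 0L
)
```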

Value

data.table. Imputed ethnicity for survey persons. Columns:

  • person_id — unique person identifier

  • ethnicity — original ethnicity label

  • ethnicity_imputed — imputed ethnicity label

Rows: one per person. Keys: (person_id).

Details

  • Determines ethnicity levels from settings or defaults; supports binary and multi-level imputation.

  • Prepares ethnicity labels and survey data, handling missing and multi-response cases.

  • Fetches ACS reference data and computes proportions for each geography.

  • Merges ACS proportions to survey persons by block group or tract.

  • Imputes ethnicity by sampling from ACS probabilities; falls back to county-level ACS if local data is missing.

  • Saves imputed results to outputs_dir if provided.

  • Returns a data.table with person_id, original, and imputed ethnicity.

  • Assumes valid settings, variable_list, and persons data.table; errors if levels do not match expected values.
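
The sampling step with county fallback listed above can be sketched roughly as follows. This is not the package's implementation: the tables acs_bg and acs_county, the helper sample_level(), and all proportions are made up for illustration, and the sketch assumes a 12-digit block group GEOID whose first five digits identify the county.

```r
library(data.table)

# Hypothetical ACS proportions by block group (long format); values are made up.
acs_bg <- data.table(
  bg_geoid = rep(c("060010001001", "060010001002"), each = 2),
  level    = rep(c("hispanic", "not_hispanic"), times = 2),
  prop     = c(0.30, 0.70, 1.00, 0.00)
)

# Hypothetical county-level fallback, keyed by the first 5 GEOID digits.
acs_county <- data.table(
  county_geoid = c("06001", "06001"),
  level        = c("hispanic", "not_hispanic"),
  prop         = c(0.25, 0.75)
)

persons <- data.table(
  person_id = 1:4,
  bg_geoid  = c("060010001001", "060010001002", "060010001002", "060010009999"),
  ethnicity = c("hispanic", NA, NA, NA)
)

# Draw one level for a block group; use county proportions if it is absent.
sample_level <- function(geoid) {
  probs <- acs_bg[bg_geoid == geoid]
  if (nrow(probs) == 0L) {
    probs <- acs_county[county_geoid == substr(geoid, 1, 5)]
  }
  sample(probs$level, size = 1, prob = probs$prop)
}

set.seed(1234)
result <- copy(persons)  # work on a copy so persons is not modified by reference
result[, ethnicity_imputed := ethnicity]
missing_idx <- which(is.na(result$ethnicity))
result[missing_idx, ethnicity_imputed := vapply(
  bg_geoid, sample_level, character(1), USE.NAMES = FALSE
)]
result[, .(person_id, ethnicity, ethnicity_imputed)]
```

Observed values are kept as they are; only rows with a missing ethnicity receive a sampled label.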

Settings

  • targets[['p_ethnicity']]$levels (direct): ethnicity levels to impute. Example: c('hispanic', 'not_hispanic').

  • acs_year (direct): ACS year for reference data. Example: 2021.

  • rng_seed (direct): random seed for reproducibility. Example: 1234.

  • outputs_dir (direct): directory to save results. Example: 'outputs/'.
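
Using the example values listed above, a settings object might look roughly like this. Only the four keys documented here are shown; a real settings object may need further entries (for instance other ACS configuration), which are not assumed here.

```r
# Minimal settings list built from the documented keys and example values.
settings <- list(
  targets = list(
    p_ethnicity = list(levels = c("hispanic", "not_hispanic"))
  ),
  acs_year    = 2021,
  rng_seed    = 1234,       # optional
  outputs_dir = "outputs/"  # optional
)

# The levels are read with the accessor shown in the argument description.
settings$targets[["p_ethnicity"]]$levels
```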

Examples

## Not run:
# persons, variable_list, and settings must already exist in the session
imputed <- impute_ethnicity(persons, variable_list, settings)
## End(Not run)