
Cleans and reshapes the survey persons table for ethnicity imputation, applying label mapping, handling missing and multi-response cases, and producing a long-format table with one ethnicity per person.

Usage

prepare_ethnicity_survey_data(
  persons,
  ethnicity_labels,
  is_binary,
  default_eth
)

Arguments

persons

data.table. Survey persons table. Required columns:

  • hh_id — household identifier

  • person_id — unique person identifier

  • bg_geoid — block group GEOID

  • ethnicity columns named in ethnicity_labels$variable

Rows: one per person. Keys: (person_id). Modified by reference: no (returns a copy).

ethnicity_labels

data.table. Ethnicity label mapping. Required columns:

  • variable — column name in persons

  • short_label — label the column is renamed to, used as the ethnicity value in the output

is_binary

logical(1). If TRUE, only use 'hispanic' and 'not_hispanic' levels. Default FALSE.

default_eth

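character(1). Fallback ethnicity label, used when no single ethnicity can be determined. In non-binary mode, persons who select more than one ethnicity are topcoded to this label.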
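The argument contracts above can be checked before calling the function. The helper below, check_ethnicity_inputs, is hypothetical and not part of the package; it only encodes what the Arguments section states. The column names eth_hisp and eth_nonhisp, the GEOID strings, and the label "other" are illustrative assumptions.

```r
library(data.table)

# Hypothetical helper (not part of the package): checks the documented
# argument contracts and errors on the first violation.
check_ethnicity_inputs <- function(persons, ethnicity_labels, is_binary, default_eth) {
  stopifnot(
    is.data.table(persons),
    all(c("hh_id", "person_id", "bg_geoid") %in% names(persons)),
    !anyDuplicated(persons$person_id),                 # Keys: (person_id)
    is.data.table(ethnicity_labels),
    all(c("variable", "short_label") %in% names(ethnicity_labels)),
    all(ethnicity_labels$variable %in% names(persons)),
    is.logical(is_binary), length(is_binary) == 1L,    # logical(1)
    is.character(default_eth), length(default_eth) == 1L  # character(1)
  )
  invisible(TRUE)
}

# Illustrative inputs; ethnicity column names are assumptions.
persons <- data.table(
  hh_id       = c(1L, 1L),
  person_id   = c(101L, 102L),
  bg_geoid    = c("530330001001", "530330001001"),
  eth_hisp    = c(1L, 995L),
  eth_nonhisp = c(0L, 995L)
)
ethnicity_labels <- data.table(variable    = c("eth_hisp", "eth_nonhisp"),
                               short_label = c("hispanic", "not_hispanic"))

check_ethnicity_inputs(persons, ethnicity_labels, is_binary = FALSE, default_eth = "other")
```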

Value

data.table. Survey persons with labeled ethnicity. Columns:

  • hh_id — household identifier

  • person_id — unique person identifier

  • bg_geoid — block group GEOID

  • ethnicity — labeled ethnicity

Rows: one per person. Keys: (person_id).

Details

  • Filters persons table to required columns (hh_id, person_id, bg_geoid, ethnicity variables).

  • Replaces missing codes (995) with zero.

  • Flags ethnicity as missing when a person selected no response.

  • Renames ethnicity columns using provided label mapping.

  • In non-binary mode (is_binary = FALSE), topcodes persons with multiple responses to default_eth.

  • Melts to long format and summarizes responses per person.

  • In binary mode (is_binary = TRUE), keeps one response per person, giving 'hispanic' priority over any other response.

  • Returns a data.table with one row per person and labeled ethnicity column.

  • Assumes valid persons and ethnicity_labels inputs; raises an error if the per-person response sums are inconsistent.
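The steps listed above can be sketched in data.table as follows. This is a hypothetical re-implementation (prepare_ethnicity_sketch), not the package's code. It assumes that each ethnicity column is a 0/1 indicator, that 995 is the only missing code, that persons with no selected response receive default_eth (reading "when none found" literally), and that binary mode maps any non-'hispanic' answer to 'not_hispanic'. The column names, GEOIDs, and the label "other" are illustrative.

```r
library(data.table)

# Hypothetical sketch of the documented steps; not the package's implementation.
# Assumes 0/1 indicator columns, 995 as the only missing code, default_eth for
# persons with no response, and (binary mode) non-'hispanic' -> 'not_hispanic'.
prepare_ethnicity_sketch <- function(persons, ethnicity_labels,
                                     is_binary = FALSE, default_eth) {
  id_cols  <- c("hh_id", "person_id", "bg_geoid")
  eth_vars <- ethnicity_labels$variable
  labels   <- ethnicity_labels$short_label

  # Keep only the id and ethnicity columns, on a copy so persons is untouched.
  dt <- copy(persons[, c(id_cols, eth_vars), with = FALSE])

  # Replace the missing code 995 with zero ("not selected").
  for (v in eth_vars) set(dt, i = which(dt[[v]] == 995), j = v, value = 0L)

  # Rename indicator columns to their short labels.
  setnames(dt, eth_vars, labels)

  # Melt to long format: one row per person and response option.
  long <- melt(dt, id.vars = id_cols, measure.vars = labels,
               variable.name = "ethnicity", value.name = "selected",
               variable.factor = FALSE)

  # Collapse the selected options to one ethnicity per person.
  picked <- long[selected == 1L, .(ethnicity = if (is_binary) {
    if ("hispanic" %in% ethnicity) "hispanic" else "not_hispanic"
  } else if (.N > 1L) {
    default_eth  # topcode multiple responses
  } else {
    ethnicity
  }), by = id_cols]

  # Right join onto every person; those with no selected response get NA here.
  out <- picked[dt[, id_cols, with = FALSE], on = id_cols]
  out[is.na(ethnicity), ethnicity := default_eth]

  # Error if any person was dropped or duplicated along the way.
  stopifnot(nrow(out) == nrow(dt), !anyDuplicated(out$person_id))
  setkey(out, person_id)
  out[]
}

# Illustrative input: person 201 answered 995 (missing), 301 chose both options.
persons <- data.table(
  hh_id       = c(1L, 1L, 2L, 3L),
  person_id   = c(101L, 102L, 201L, 301L),
  bg_geoid    = c("530330001001", "530330001001", "530330002001", "530330003002"),
  eth_hisp    = c(1L, 0L, 995L, 1L),
  eth_nonhisp = c(0L, 1L, 995L, 1L)
)
ethnicity_labels <- data.table(variable    = c("eth_hisp", "eth_nonhisp"),
                               short_label = c("hispanic", "not_hispanic"))

prepare_ethnicity_sketch(persons, ethnicity_labels, default_eth = "other")
```

Joining the collapsed responses back onto the full list of persons, rather than appending the no-response rows separately, guarantees exactly one row per person and makes a dropped or duplicated person easy to detect with a single check.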

Settings

None.

Examples

## Not run:
prepare_ethnicity_survey_data(persons, ethnicity_labels, is_binary = TRUE, default_eth = "hispanic")
## End(Not run)