
Creates a standardized ethnicity target variable for person-level weighting and expansion, using either PUMS or survey input. Use when preparing ethnicity targets for synthetic population or survey analysis.

Usage

prep_target_ethnicity(
  h_data,
  p_data,
  target_name = "p_ethnicity",
  codebook,
  settings
)

Arguments

h_data

data.table. Household-level input (not used, included for interface consistency).

p_data

data.table. Person-level input. Required columns:

  • For PUMS: must include ethnicity label column as specified in settings.

  • For survey: must include ethnicity column as specified in settings.

Rows: one per person. Modified by reference: no (returns copy).

target_name

character(1). Name of the target variable to create (default: "p_ethnicity").

codebook

data.table. Codebook for variable mapping; must include ethnicity value and label columns.

settings

list. Project settings; must include targets[[target_name]] with levels and survey_input.

Value

data.table. Copy of person-level input with new target variable column (target_name).

  • Columns: all original plus target_name (factor)

  • Values: standardized ethnicity levels

  • Row order preserved

Details

  • Detects input type (PUMS vs. survey) by presence of SERIALNO column.

  • For PUMS:

    • Uses ethnicity label column, converts to lowercase.

    • Uses regex patterns from hispanic_binary and hispanic_detail to assign target levels:

      • e.g., not (spanish|hispanic|latino) → "not_hispanic"

      • mexican → "mexican"

      • puerto rican → "puerto_rican"

      • cuban → "cuban"

      • dominican → "dominican"

      • else → "other_hispanic"

    • Labels that match no specific pattern receive the default level: "hispanic" (binary) or "other_hispanic" (detail).

  • For survey:

    • Uses survey input column as specified in settings (no regrouping currently).

  • Checks that the observed levels match the expected target levels from settings: the symmetric difference between the two sets must be empty.

  • Factors output column to match target levels.

  • Renames output column to target_name (default: p_ethnicity).

  • Returns a copy of the input data.table with the new target variable.

  • Error handling: stops if levels do not match expected values.
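
The PUMS branch described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the column name hisp_label, the helper classify_ethnicity(), the sample labels, and the error message are all hypothetical, and the regex patterns are taken only from the examples listed in Details.

```r
library(data.table)

# Hypothetical PUMS-like person table; "hisp_label" is an assumed column name.
p_data <- data.table(
  SERIALNO = c("001", "001", "002", "003", "004", "005"),
  hisp_label = c(
    "Not Spanish/Hispanic/Latino", "Mexican", "Puerto Rican",
    "Cuban", "Dominican", "Colombian"
  )
)

# Expected levels, standing in for settings$targets[["p_ethnicity"]]$levels.
target_levels <- c(
  "not_hispanic", "mexican", "puerto_rican",
  "cuban", "dominican", "other_hispanic"
)

# Input type is detected by the presence of SERIALNO.
is_pums <- "SERIALNO" %in% names(p_data)

# Hypothetical helper: lowercase the label, then apply the patterns in order.
classify_ethnicity <- function(label) {
  x <- tolower(label)
  fcase(
    grepl("not (spanish|hispanic|latino)", x), "not_hispanic",
    grepl("mexican", x), "mexican",
    grepl("puerto rican", x), "puerto_rican",
    grepl("cuban", x), "cuban",
    grepl("dominican", x), "dominican",
    default = "other_hispanic"
  )
}

# Work on a copy so the caller's table is not modified by reference.
out <- copy(p_data)
out[, p_ethnicity := classify_ethnicity(hisp_label)]

# Symmetric difference between observed and expected levels must be empty.
observed <- unique(out$p_ethnicity)
mismatch <- union(setdiff(observed, target_levels), setdiff(target_levels, observed))
if (length(mismatch) > 0) {
  stop("Ethnicity levels do not match settings: ", paste(mismatch, collapse = ", "))
}

# Factor the new column so its levels follow the order given in settings.
out[, p_ethnicity := factor(p_ethnicity, levels = target_levels)]
```

Note that a symmetric-difference check also fails when an expected level never occurs in the data, which is why the sample table includes one row per level.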

Settings

  • targets[["p_ethnicity"]] (direct): must include levels, survey_input.
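
A settings entry of the required shape might look like the sketch below. The level names and the value "ethnicity" are assumptions for illustration; survey_input is treated here as the name of the survey column, which the documentation implies but does not state outright.

```r
# Hypothetical settings list with the two required fields.
settings <- list(
  targets = list(
    p_ethnicity = list(
      levels = c("not_hispanic", "hispanic"),  # assumed binary levels
      survey_input = "ethnicity"               # assumed survey column name
    )
  )
)

# Look up the entry by target name and confirm both fields are present.
target_name <- "p_ethnicity"
target_cfg <- settings$targets[[target_name]]
stopifnot(all(c("levels", "survey_input") %in% names(target_cfg)))
```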

Examples

## Not run:
prep_target_ethnicity(h_data, p_data, target_name = "p_ethnicity", codebook, settings)
## End(Not run)