Imputes gender for persons who reported neither male nor female, using Monte Carlo sampling from the sample's age-by-gender distribution. Use it when cleaning survey data with ambiguous or missing gender.

Usage

impute_gender(
  persons,
  value_labels,
  report_dir = NULL,
  outputs_dir = NULL,
  seed = NULL,
  acs_year = NULL,
  settings
)

Arguments

persons

data.table. Person records to impute.

value_labels

data.table. Value labels for age and gender.

report_dir

character(1), optional. Directory for report output.

outputs_dir

character(1), optional. Directory for imputation output.

seed

integer(1), optional. RNG seed for reproducibility.

acs_year

integer(1), optional. ACS year for reference.

settings

list. Settings object with configs.

Value

data.table. Person IDs and imputed gender.

Details

  • Labels age and gender using value labels.

  • Computes age-by-gender proportions from sample.

  • Fills missing proportions with average share.

  • Assigns gender by Monte Carlo sampling using proportions.

  • Returns data.table with imputed gender for each person.

  • Assumes input is a data.table and value labels are complete.
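
The steps above can be sketched in a few lines of base R. This is an illustrative sketch only, not the function's actual implementation: the toy data, the column names (`person_id`, `age_group`, `gender`, `gender_imputed`), and the reading of "average share" as the mean female share across age groups are all assumptions, and a plain data.frame stands in for the data.table input.

```r
# Hypothetical toy data; column names are assumptions, not the package's schema.
persons <- data.frame(
  person_id = 1:8,
  age_group = c("18-34", "18-34", "18-34", "35-64", "35-64", "35-64", "65+", "65+"),
  gender    = c("male", "female", "female", "male", "other", "female", NA, "other"),
  stringsAsFactors = FALSE
)

set.seed(123)  # fix the RNG so the Monte Carlo draws are repeatable

# 1. Female share per age group, among persons who reported male or female.
known  <- persons[persons$gender %in% c("male", "female"), ]
groups <- unique(persons$age_group)
p_female <- sapply(groups, function(g) {
  mean(known$gender[known$age_group == g] == "female")  # NaN if group is empty
})

# 2. Groups with no known respondents get the average share across groups
#    (assumed interpretation of "average share").
p_female[is.na(p_female)] <- mean(p_female, na.rm = TRUE)

# 3. Monte Carlo draw: one uniform number per person to impute, compared
#    with the female share of that person's age group.
to_impute <- persons[!(persons$gender %in% c("male", "female")), ]
u <- runif(nrow(to_impute))
share <- unname(p_female[to_impute$age_group])
to_impute$gender_imputed <- ifelse(u < share, "female", "male")

# 4. Person IDs with their imputed gender.
imputed <- to_impute[, c("person_id", "gender_imputed")]
print(imputed)
```

In this toy sample the "65+" group has no respondent reporting male or female, so its share falls back to the mean of the other two groups before sampling.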

Settings

  • acs_year (direct): ACS year for reference.

  • rng_seed (direct): random seed for reproducibility.
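
How the function reads these entries is not documented here. The sketch below assumes, hypothetically, a plain list with two top-level entries named as above (the values are illustrative), and uses base R's `set.seed()` to show why a fixed seed makes random draws repeatable. The helper `draw()` is made up for this illustration.

```r
# Hypothetical settings list; only the two entry names come from the list above.
settings <- list(acs_year = 2021L, rng_seed = 123L)

# Illustrative helper: seed the RNG, then draw n uniform random numbers.
draw <- function(seed, n = 5) {
  set.seed(seed)
  runif(n)
}

first  <- draw(settings$rng_seed)
second <- draw(settings$rng_seed)

# The same seed yields the same sequence of draws.
identical(first, second)
```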

Examples

## Not run:
# settings has no default, so it must be supplied
impute_gender(persons, value_labels, seed = 123, settings = settings)