Convert probability columns to binary indicators (max selection)
Converts a set of probability columns to binary indicators by selecting the maximum value in each row. Intended for multinomial imputation or categorical assignment from probabilities. This function is experimental and not ready for production use.
Arguments
- data
data.table. Input with probability columns to convert.
Required columns: all columns matching grp_pattern (e.g., f_a, f_b, ...).
Other columns: must include id_var (character or integer, unique row ID).
Rows: one per observation.
Modified by reference: no (returns copy).
- grp_pattern
character(1). Regex pattern matching columns to convert (e.g., 'f_').
- id_var
character(1). Name of ID column for row identification.
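As a rough illustration of how grp_pattern picks out columns, the sketch below anchors the pattern at the start of each name, as described under Details. It uses base R's grep() in place of str_subset() so that it runs without extra packages; the column names are hypothetical.

```r
# Hypothetical column names; only the 'f_' prefix comes from this page.
col_names <- c("id", "f_a", "f_b", "pref_x")
grp_pattern <- "f_"

# Prepend '^' so that only names STARTING with the pattern are matched.
# "pref_x" contains 'f_' but does not start with it, so it is excluded.
prob_cols <- grep(paste0("^", grp_pattern), col_names, value = TRUE)
print(prob_cols)
```

Because the anchor is prepended, passing 'f_' and '^f_' should select the same columns in this sketch.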
Value
data.table. Copy of input with binary indicator columns.
Columns: all original columns except replaced probability columns (now binary indicators)
Each binary column: 1 for max probability, 0 for others
Row order preserved
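The return value described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: make_binary_sketch is a hypothetical helper, and ties are resolved in favour of the first matching column, which this page does not specify.

```r
library(data.table)

# Hypothetical helper, NOT the package's make_binary().
# Assumption: ties go to the first matching column.
make_binary_sketch <- function(data, grp_pattern, id_var) {
  stopifnot(is.data.table(data), id_var %in% names(data))
  prob_cols <- grep(paste0("^", grp_pattern), names(data), value = TRUE)

  out <- copy(data)  # work on a copy so the input is not modified by reference
  probs <- as.matrix(out[, prob_cols, with = FALSE])

  # Index of the largest probability in each row.
  winner <- max.col(probs, ties.method = "first")

  # Build a 0/1 matrix with a single 1 per row at the winning column.
  binary <- matrix(0L, nrow(probs), ncol(probs))
  binary[cbind(seq_len(nrow(probs)), winner)] <- 1L

  # Replace the probability columns with the indicator columns.
  out[, (prob_cols) := as.data.table(binary)]
  out[]
}

dt <- data.table(id = 1:3, f_a = c(0.2, 0.5, 0.3), f_b = c(0.8, 0.5, 0.7))
res <- make_binary_sketch(dt, "f_", "id")
print(res)
```

Row 2 of the sample data is a tie (0.5 and 0.5), so the result for that row depends entirely on the assumed tie rule.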
Details
Identifies columns matching a regex pattern (e.g., '^f_').
Uses: str_subset(names(data), paste0('^', grp_pattern))
Example: grp_pattern = 'f_' matches columns like f_a, f_b, etc.
For each row, sets 1 for the column with the maximum probability, 0 for others.
Returns a copy of input with binary columns replacing probabilities.
Checks that row sums equal 1 (within tolerance).
Assumes input is a data.table; does not modify by reference.
Current status: placeholder for a future implementation using probabilistic sampling (see commented code). If called, the function halts with stop("Reimplement using sample").
Example regex patterns: ^f_ selects columns starting with 'f_'.
Used for multinomial assignment from probability columns.
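The row-sum check and the sampling-based approach mentioned above might look roughly like this. It is a base-R sketch on a hypothetical probability matrix, not the planned implementation; the tolerance of 1e-8 and the fixed seed are assumptions made for illustration.

```r
set.seed(1)  # fixed seed only so the sketch is reproducible

# Hypothetical probabilities: one row per observation, one column per category.
probs <- matrix(c(0.2, 0.8,
                  0.5, 0.5,
                  0.3, 0.7),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("f_a", "f_b")))

# Row sums must equal 1 within a tolerance (1e-8 is an assumed value).
stopifnot(all(abs(rowSums(probs) - 1) < 1e-8))

# Draw one category per row with sample(), weighted by that row's probabilities.
drawn <- apply(probs, 1, function(p) sample(seq_along(p), size = 1, prob = p))

# Convert the drawn category indices to 0/1 indicator columns.
binary <- matrix(0L, nrow(probs), ncol(probs), dimnames = dimnames(probs))
binary[cbind(seq_len(nrow(probs)), drawn)] <- 1L
print(binary)
```

Unlike max selection, sampling can assign a row to a category that does not have the largest probability, so repeated draws reproduce the input proportions on average.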
See also
make_probs_binary
Other imputation utilities:
calculate_acs_proportions(),
get_acs_ethnicity(),
get_acs_race(),
get_hh_person_sums(),
impute_ethnicity(),
impute_gender(),
impute_income_nonrelatives(),
impute_income_pnta(),
impute_race(),
prep_hhs_for_income_imputation(),
prepare_acs_income(),
prepare_ethnicity_labels(),
prepare_ethnicity_survey_data(),
prepare_impute_targets(),
prepare_income_fit_dt(),
prepare_persons_dt(),
update_hh_income_imputed()
Examples
## Not run:
library(data.table)
dt <- data.table(id = 1:3, f_a = c(0.2, 0.5, 0.3), f_b = c(0.8, 0.5, 0.7))
# Per Details, the call currently halts with stop("Reimplement using sample").
make_binary(dt, 'f_', 'id')
## End(Not run)