
Converts a set of probability columns into 0/1 indicator columns, choosing one column per row either by weighted random sampling or by taking the maximum. Useful for multinomial imputation or for assigning categories from predicted probabilities.

Usage

make_probs_binary(data, grp_pattern, id_var, sample = TRUE, seed = 1968)

Arguments

data

data.table. Input with probability columns to convert.

grp_pattern

character(1). Regular expression matched against column names to select the probability columns to convert.

id_var

character(1). Name of ID column for row identification.

sample

logical(1). If TRUE (the default), sample one column per row using the probabilities as weights; if FALSE, select the column with the highest probability.

seed

integer(1) or NULL. RNG seed for reproducible sampling (default 1968); only relevant when sample = TRUE.

Value

data.table. A copy of data in which the matched probability columns are replaced by 0/1 indicator columns.

Details

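  • Identifies the columns whose names match grp_pattern (e.g., '^p_').

  • For each row, sets the selected column (sampled or maximum) to 1 and all other matched columns to 0.

  • If sample = TRUE, selects the column by weighted random sampling, using the row's probabilities as weights and seed for reproducibility.

  • If sample = FALSE, selects the column with the highest probability; ties go to the first such column.

  • Returns a copy of the input in which the binary columns replace the probabilities.

  • Checks that each row's probabilities sum to 1, within a numerical tolerance.

  • Assumes the input is a data.table; it is not modified by reference.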
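The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `probs_to_binary_sketch` is hypothetical, it works on a plain data.frame rather than a data.table, the row-sum tolerance of 1e-6 is an assumed value, and the weighted draw uses base `sample.int()` with `prob =`.

```r
# Hypothetical helper, NOT the package's code: a base-R sketch of the
# documented steps, operating on a data.frame instead of a data.table.
probs_to_binary_sketch <- function(data, grp_pattern, sample = TRUE,
                                   seed = 1968, tol = 1e-6) {
  # Columns whose names match the regular expression
  cols <- grep(grp_pattern, names(data), value = TRUE)
  probs <- as.matrix(data[, cols, drop = FALSE])

  # Each row must sum to 1 within an assumed tolerance
  if (any(abs(rowSums(probs) - 1) > tol)) {
    stop("probabilities in each row must sum to 1")
  }

  if (sample) {
    # Seed only when one is supplied
    if (!is.null(seed)) set.seed(seed)
    # Weighted draw: column j is chosen with probability probs[i, j]
    chosen <- apply(probs, 1, function(p) sample.int(length(p), 1, prob = p))
  } else {
    # Highest probability per row; ties resolved in favour of the first column
    chosen <- max.col(probs, ties.method = "first")
  }

  # Replace probabilities with 0/1 indicators in a copy of the input
  out <- data
  for (j in seq_along(cols)) {
    out[[cols[j]]] <- as.integer(chosen == j)
  }
  out
}

# Usage on the same values as the Examples section
probs <- data.frame(
  id      = 1:5,
  p_one   = c(0.1, 0.2, 0.3, 0.333, 0.001),
  p_two   = c(0.2, 0.4, 0.5, 0.333, 0.998),
  p_three = c(0.7, 0.4, 0.2, 0.334, 0.001)
)
res <- probs_to_binary_sketch(probs, "^p_", sample = FALSE)
```

With `sample = FALSE`, row 2 (0.2, 0.4, 0.4) is a tie between `p_two` and `p_three`, so `ties.method = "first"` assigns it to `p_two`, matching the tie rule described above.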

Examples

if (FALSE) { # \dontrun{
library(data.table)
data = fread(
  "
  id   |    p_one    | p_two   | p_three 
  1    |    0.1      | 0.2     | 0.7
  2    |    0.2      | 0.4     | 0.4
  3    |    0.3      | 0.5     | 0.2
  4    |    0.333    | 0.333   | 0.334
  5    |    0.001    | 0.998   | 0.001
  "
)
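# Deterministic: pick the highest-probability column in each row
make_probs_binary(data, 'p_', 'id', sample = FALSE)
# Weighted random draw per row, without the default seed
make_probs_binary(data, 'p_', 'id', seed = NULL)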
} # }