Factorize a dataframe. The function loops over the columns of a dataframe, calling factorize_column on each, and labels every variable for which labels are provided.

Usage

factorize_df(df, vals_df, verbose = TRUE, ...)

Arguments

df

A dataframe to label

vals_df

A dataframe of variable labels (i.e., factor levels and labels) in the format described under Details. Passed to factorize_column.

verbose

Whether to print which variables were labeled and which were left unlabeled.

...

Additional arguments passed to factorize_column

Value

A factorized (i.e., labeled) version of the dataframe that was passed in.

Details

The function expects a values dataframe (vals_df) with the following columns: variable (the name of each variable, as a character string), value (the integer values for each variable), val_order (the sequential ordering of each value), and label (the strings to use as the labels of the factor levels).
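
As a rough illustration of the format described above, the sketch below builds a small values dataframe by hand and applies it to a single column using base R's factor(). The values and labels are made up for illustration, and the base-R labeling step is an assumption about the general idea, not the package's actual implementation.

```r
# Hypothetical values dataframe in the four-column format described above.
# The values and labels are illustrative only.
vals_df <- data.frame(
  variable  = c("residence_type", "residence_type", "residence_type"),
  value     = c(1L, 2L, 3L),
  val_order = c(1L, 2L, 3L),
  label     = c("Single-family house", "Apartment", "Other"),
  stringsAsFactors = FALSE
)

# Hypothetical data column coded with the integer values above.
residence_type <- c(2L, 1L, 3L, 1L)

# Base-R sketch of the labeling idea (not the package's implementation):
# sort the label rows by val_order, then use value as the factor levels
# and label as the factor labels.
rows <- vals_df[vals_df$variable == "residence_type", ]
rows <- rows[order(rows$val_order), ]
labeled <- factor(residence_type, levels = rows$value, labels = rows$label)

print(labeled)
```

Here val_order controls the order of the factor levels, while value links each row to the codes found in the data.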

The "factorize" functions were borrowed and updated from the 'tmr.Rite.out.tester' package by Matt Landis.

Examples


hh_labeled = factorize_df(
  df = hh,
  vals_df = value_labels,
  value_label_colname = "label",
  extra_labels = c("Missing")
)
#> Warning: Missing labels in variable "num_people". Values missing labels: 0
#> 
#>  Labeled vars: 
#> - home_county
#> - income_detailed
#> - income_followup
#> - num_people
#> - residence_type
#> - sample_segment
#> Unlabeled vars: 
#> - hh_id
#> - hh_weight
#> - home_lat
#> - home_lon
#> - num_trips
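
The warning in the output above reports a value (0 in num_people) that has no matching row in the values dataframe. A check of that kind can be sketched in base R before labeling. The data below are hypothetical, and the check is only an approximation of what the warning describes, not the package's own logic.

```r
# Hypothetical data and labels; values and labels are illustrative only.
hh_example <- data.frame(
  hh_id      = c(101L, 102L, 103L),
  num_people = c(0L, 1L, 2L)
)
vals_example <- data.frame(
  variable  = c("num_people", "num_people"),
  value     = c(1L, 2L),
  val_order = c(1L, 2L),
  label     = c("1 person", "2 people"),
  stringsAsFactors = FALSE
)

# Columns that have at least one label row, and columns that have none.
labeled_vars   <- intersect(names(hh_example), unique(vals_example$variable))
unlabeled_vars <- setdiff(names(hh_example), labeled_vars)

# For each labeled column, list observed values that have no label row.
missing_labels <- lapply(labeled_vars, function(v) {
  known <- vals_example$value[vals_example$variable == v]
  setdiff(unique(hh_example[[v]]), known)
})
names(missing_labels) <- labeled_vars

print(missing_labels)
print(unlabeled_vars)
```

Values listed in missing_labels would need either a new row in the values dataframe or explicit handling before the column is converted to a factor.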