Factorize a dataframe. The function loops over the columns of a dataframe, calling factorize_column on each, and labels every variable for which you provide labels.
Arguments
- df
A dataframe to label
- vals_df
A dataframe of variable labels (i.e., factor levels and labels) in the format specified under Details. Passed to factorize_column.
- verbose
Whether to print which variables were labeled and which were left unlabeled
- ...
Additional arguments passed to factorize_column
Details
The function expects a values dataframe (vals_df) with the following columns:
- variable
The character/string names of each variable
- value
The integer values for each variable
- val_order
The sequential ordering of each value
- label
The strings or names to use in the levels of the factor
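As a rough illustration of the format described above, the sketch below builds a small values dataframe and applies it to one column using base R's factor(). The data (hh, value_labels, the residence_type codes and their labels) are made up for this example, and the base-R labeling step is only an assumption about the general idea; it is not the package's actual implementation of factorize_column.

```r
# Hypothetical values dataframe with the four expected columns.
value_labels <- data.frame(
  variable  = c("residence_type", "residence_type", "residence_type"),
  value     = c(1L, 2L, 995L),
  val_order = c(1L, 2L, 3L),
  label     = c("Single-family house", "Apartment", "Missing"),
  stringsAsFactors = FALSE
)

# Hypothetical data to label: hh_id has no entry in value_labels,
# so it would stay unlabeled.
hh <- data.frame(
  hh_id          = 1:4,
  residence_type = c(2L, 1L, 995L, 1L)
)

# Base-R illustration of the labeling idea for a single variable
# (an assumption for illustration, not the package's code):
# select the rows for the variable, sort by val_order, then map
# integer values to labels.
vals <- value_labels[value_labels$variable == "residence_type", ]
vals <- vals[order(vals$val_order), ]
hh$residence_type <- factor(
  hh$residence_type,
  levels = vals$value,
  labels = vals$label
)

levels(hh$residence_type)
```

Sorting by val_order before calling factor() is what lets the factor levels follow the intended display order rather than the numeric order of the codes.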
The "factorize" functions were borrowed and updated from the 'tmr.Rite.out.tester' package by Matt Landis.
Examples
hh_labeled = factorize_df(
  df = hh,
  vals_df = value_labels,
  value_label_colname = "label",
  extra_labels = c("Missing")
)
#> Warning: Missing labels in variable "num_people". Values missing labels: 0
#>
#> Labeled vars:
#> - home_county
#> - income_detailed
#> - income_followup
#> - num_people
#> - residence_type
#> - sample_segment
#> Unlabeled vars:
#> - hh_id
#> - hh_weight
#> - home_lat
#> - home_lon
#> - num_trips