Remove outliers from a numeric variable
Arguments
- var_dt
A data.table containing the numeric variable to remove outliers from.
- numvar
Name of the numeric variable, given as a character string (for example "speed_mph"), to remove outliers from. Default is NULL.
- threshold
Quantile threshold that defines an outlier: values of numvar above this quantile are removed. Default is 0.975.
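The role of threshold can be illustrated with a small, self-contained R sketch. It assumes, based on the example output rather than the package source, that rows whose numvar value exceeds the threshold quantile are dropped, and that the quantile is computed with R's default quantile() method. The helper name remove_outliers_sketch is hypothetical and is not part of the package.

```r
library(data.table)

# Hypothetical helper, NOT the package's implementation.
# Assumption: a row is an outlier when its `numvar` value is above the
# `threshold` quantile of that column (R's default quantile type 7).
remove_outliers_sketch <- function(var_dt, numvar, threshold = 0.975) {
  stopifnot(is.data.table(var_dt), numvar %in% names(var_dt))

  values <- var_dt[[numvar]]
  # Cutoff value at the requested quantile, ignoring missing values
  cutoff <- quantile(values, probs = threshold, na.rm = TRUE, names = FALSE)
  # Missing values are never flagged as outliers
  is_outlier <- !is.na(values) & values > cutoff
  outliers <- values[is_outlier]

  # Summary mirroring the $outlier_description element of the real output
  description <- data.table(
    threshold   = threshold,
    num_removed = sum(is_outlier),
    min_outlier = if (length(outliers)) min(outliers) else NA_real_,
    max_outlier = if (length(outliers)) max(outliers) else NA_real_
  )

  if (any(is_outlier)) {
    warning(sprintf("%d outliers were removed based on the threshold of %s.",
                    sum(is_outlier), threshold))
  }

  # Keep only the non-outlier rows
  list(outlier_description = description, dt = var_dt[!is_outlier])
}
```

Called as remove_outliers_sketch(trip, "speed_mph"), it returns a list with the same two elements as the example output in the Examples section, although the counts could differ if the package uses another quantile type or cutoff rule.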
Examples
require(data.table)
hts_remove_outliers(var_dt = trip, numvar = "speed_mph")
#> Warning: 378 outliers were removed based on the threshold of 0.975.
#> $outlier_description
#>    threshold num_removed min_outlier max_outlier
#> 1:     0.975         378    112.9918    228233.1
#>
#> $dt
#>        day_id trip_id  speed_mph distance_miles mode_type mode_1 mode_2
#>     1:      1    6848  0.3570582     0.07736261         8      6    995
#>     2:      1    6099  3.8030030     0.31691692         8     34    995
#>     3:      1   15759  9.2827577     0.16244826         1      1    995
#>     4:      1   13883 10.7289440    10.72894403        13      2     23
#>     5:      1    9240  1.3936891     0.47308002         2      2    995
#>    ---
#> 14718:   4125    4505 16.3377147     3.23577517         8      6    995
#> 14719:   4125    7897 42.9297111    22.65734754         8      6    995
#> 14720:   4125     719  1.5648402     7.77203953         1      1    995
#> 14721:   4125   14260 10.5795319     1.76325532         8      7    995
#> 14722:   4125    4397  8.5320851     1.42201419         8     34    995
#>        num_travelers d_purpose_category hh_id person_id travel_date trip_weight
#>     1:             1                  7   642       820  2023-05-28         957
#>     2:             2                  7   642       820  2023-05-28         237
#>     3:             1                  9   642       820  2023-05-28         287
#>     4:             1                 11   642       820  2023-05-28         361
#>     5:             1                  1   642       820  2023-05-28         578
#>    ---
#> 14718:             1                 12   876      1684  2023-05-30         999
#> 14719:             1                  2   876      1684  2023-05-30         167
#> 14720:             1                 12   876      1684  2023-05-30         954
#> 14721:             1                  2   876      1684  2023-05-30         841
#> 14722:             2                  7   876      1684  2023-05-30         977
#>
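The counts in the example output can be checked against the threshold: 14,722 rows remain after 378 were removed, so about 2.5% of the 15,100 input rows were dropped, which matches 1 - 0.975. This is consistent with removing values above the 97.5th percentile of speed_mph, though that reading is inferred from the printed output rather than documented.

```r
# Counts read off the printed example output
n_kept <- 14722     # rows remaining in $dt
n_removed <- 378    # num_removed in $outlier_description
share_removed <- n_removed / (n_kept + n_removed)
round(share_removed, 4)
#> [1] 0.025
1 - 0.975
#> [1] 0.025
```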