Finds outliers in a dataset using the Median Absolute Deviation (MAD). A value
is flagged as an outlier when it lies a given multiple of the MAD away from
the median. See stats::mad()
for more information.
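As a rough illustration of the rule described above (not necessarily how this function is implemented), the base R sketch below applies it to a small made-up vector. The vector `v`, the threshold of 3, and the strict `>` comparison are assumptions chosen for illustration.

```r
# Hypothetical data: eight values near 10 and one obvious extreme value
v <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0)

threshold <- 3  # assumed multiplier, chosen for illustration

med <- median(v)
# stats::mad() scales by 1.4826 by default, so that it estimates the
# standard deviation for normally distributed data
dev <- mad(v)

# Flag values lying more than threshold * MAD from the median
is_outlier <- abs(v - med) > threshold * dev
v[is_outlier]
```

With these values only 25.0 is flagged, because the MAD is driven by the tightly clustered majority and is barely affected by the extreme value.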
Arguments
- x
A data.frame with the data
- col
The name of the column to check for outliers
- group_col
The name of the column to group by. The median and MAD are computed within each group.
- threshold
The multiplier applied to the MAD. Values that lie threshold * MAD away from the median are flagged as outliers.
Value
The input data.frame with three added columns: the median (.median), the MAD (.mad), and a logical flag marking outliers (.outlier).
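To show the shape of such a result, here is a minimal base R sketch that builds the same three columns per group. It is an assumption-laden illustration, not this function's source: the data frame `df`, the threshold of 3, and the use of `ave()` are all hypothetical choices; only the column names are taken from the example output on this page.

```r
# Hypothetical input: two groups, the second containing one extreme value
df <- data.frame(
  group = rep(c("a", "b"), each = 5),
  y     = c(1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.1, 4.9, 5.2, 12.0)
)

threshold <- 3  # assumed multiplier

# ave() applies a function within each group and returns a vector
# aligned with the original rows
df$.median  <- ave(df$y, df$group, FUN = median)
df$.mad     <- ave(df$y, df$group, FUN = mad)

# Logical flag: TRUE where a value is far from its own group's median
df$.outlier <- abs(df$y - df$.median) > threshold * df$.mad
df
```

Because the median and MAD are computed separately for each group, the value 12.0 is judged against group "b" only and is the single row flagged.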
Examples
silk_data1 |>
find_outliers("y", group_col = "group") |>
head()
#>   time         y   group  .median     .mad .outlier
#> 1    1  8.584244 series1 8.065949 2.713192    FALSE
#> 2    2  9.159694 series1 8.065949 2.713192    FALSE
#> 3    3  9.717704 series1 8.065949 2.713192    FALSE
#> 4    4 10.249923 series1 8.065949 2.713192    FALSE
#> 5    5 10.748432 series1 8.065949 2.713192    FALSE
#> 6    6 11.205878 series1 8.065949 2.713192    FALSE