In the data of above figure, we can find there is
outlier
easily.
Characteristics of outlier are "Separated from majority", "Out of the range" and "In thin area of density".
We use thse characteristics to make the system to judge outlier.
We use the idea of Hypothesis Testing. I we think as Normal Distribution and do Normalization , "Over 3 is outlier" is an example.
Simple approach.
And it can be used for the case that statistical approach is difficult.
Example is below.
LOF is famous.
Basic idea is same to the single case.
Principal Component Analysis and MT method are method.
There is Outlier detection with cluster analysis .
Principal Component Analysis and MT method are also used. For the complicated range, One-Class SVM is useful.
LOF is also used for multi-varaiables.
There are two ways to use data to judge the outlier. First is the model is made by the data set including both test data and reference data. Second is that model is made only reference data.
This way is easier. But if the model is not robust to the outlier value, this way does not go wel.
To improve this weakness, test data including the data set is used one by one.
Smirnov-Grubbs' test judge the most outside data. Reference is the other data.
This way is called " One-Class Model ."
NEXT One-Class Model