Model of Outlier In the data of above figure, we can find there is outlier easily.

Characteristics of outlier are "Separated from majority", "Out of the range" and "In thin area of density".

We use thse characteristics to make the system to judge outlier.

Use "Separated from majority"

We use the idea of Hypothesis Testing. I we think as Normal Distribution and do Normalization , "Over 3 is outlier" is an example.

Use "Out of the range"

Simple approach. And it can be used for the case that statistical approach is difficult. Example is below. LOF is famous.

For Multi-Variable Case

Basic idea is same to the single case.

Use "Separated from majority"

Principal Component Analysis and MT method are method.

Use "Out of the range"

Principal Component Analysis and MT method are also used. For the complicated range, One-Class SVM is useful.

Use "In thin area of density"

LOF is also used for multi-varaiables.

Two Ways to Use Data

There are two ways to use data to judge the outlier. First is the model is made by the data set including both test data and reference data. Second is that model is made only reference data.

Including Both Test Data and Reference Data.

This way is easier. But if the model is not robust to the outlier value, this way does not go wel.

To improve this weakness, test data including the data set is used one by one.

Smirnov-Grubbs' test judge the most outside data. Reference is the other data.

Only Reference Data

This way is called " One-Class Model ."

NEXT   One-Class Model 