The minimum distance method is a good combination of the principal component MT method and the neighbor method.
The output of this method is the distance from a sample to the closest sample in the unit space. First, for each sample in the unit space, find the distance to its nearest neighbor within the unit space. Then, for each sample in the signal space, find the distance to the closest sample in the unit space. An anomaly is judged by comparing the signal-space distances with the distribution and maximum value of the unit-space distances.
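The procedure above can be sketched in a few lines of Python. This is only a minimal illustration, not the author's implementation: the function names and the toy data are hypothetical, Euclidean distance is assumed, and using the maximum unit-space distance as a hard threshold is just one simple way to make the comparison described above.

```python
import math

def nearest_distance(point, others):
    """Distance from `point` to the closest sample in `others`."""
    return min(math.dist(point, o) for o in others)

def unit_space_distances(unit):
    """Nearest-neighbor distance of each unit-space sample, excluding itself."""
    return [
        nearest_distance(p, unit[:i] + unit[i + 1:])
        for i, p in enumerate(unit)
    ]

def minimum_distance_scores(unit, signal):
    """Distance from each signal-space sample to its closest unit-space sample."""
    return [nearest_distance(s, unit) for s in signal]

# Hypothetical toy data: a unit space with two separate clusters (two "centers").
unit = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
signal = [(0.5, 0.5), (5.0, 5.0), (10.5, 10.5)]

# Assumption: a sample is flagged when it is farther from the unit space
# than any unit-space sample is from its own nearest neighbor.
threshold = max(unit_space_distances(unit))
scores = minimum_distance_scores(unit, signal)
flags = [s > threshold for s in scores]
```

In this toy data only the middle signal sample, which lies between the two clusters, is flagged. A method that measures distance from a single center would have trouble with exactly this kind of point.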
Unlike methods based on the MT method, it does not use the center of the distribution in the unit space, so it can be used even when the distribution has two or more centers or no meaningful center at all.
The weakness is that it calculates the distance for every pair of samples, so when the unit space contains too much data, the calculation can take a very long time or run out of memory. In that case, the unit space needs to be thinned out appropriately while keeping the range of the distribution unchanged.
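The text does not say how the thinning should be done, so the following is only one possible sketch under my own assumptions. It keeps every `step`-th sample and additionally keeps the samples that hold the minimum and maximum of each variable, so the range of each variable is unchanged. The function name `thin_keep_range` and the sample data are hypothetical.

```python
def thin_keep_range(samples, step):
    """Keep every `step`-th sample plus the per-variable extreme samples."""
    n_dims = len(samples[0])
    keep = set(range(0, len(samples), step))
    for d in range(n_dims):
        column = [s[d] for s in samples]
        # Always keep the samples that define the range of variable d.
        keep.add(column.index(min(column)))
        keep.add(column.index(max(column)))
    return [samples[i] for i in sorted(keep)]

# Hypothetical data: 100 two-dimensional samples.
samples = [(float(i % 7), float(i % 5)) for i in range(100)]
thinned = thin_keep_range(samples, step=10)
```

Note that thinning makes the remaining unit-space samples sparser, so their nearest-neighbor distances tend to grow. Any threshold should be derived from the thinned unit space rather than the original one.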
The advantage over LOF (a kind of neighbor method) is that when there are multiple outliers close to each other, they are not mistaken for normal values. I mentioned the calculation time as a weak point, but the method is still considerably faster than LOF.
The "minimum distance method" is a method I came up with, and I also gave it its name. It is a simple method, so it would not be surprising if the same thing already exists somewhere, but I have not found it so far. If it does exist, I will change the name to match.
The Analysis of Abnormality Quantification by R page has an example of the minimum distance method.
Principal component analysis is included in the data preprocessing so that the method produces the desired results even when the data has multicollinearity, and to reduce the amount of calculation. In addition, a procedure that is not normally used, called "standardization of the principal components", is applied so that important information contained in principal components with a low contribution rate is not overlooked.
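As a rough sketch of this preprocessing, the code below uses NumPy and reads "standardization of the principal components" as dividing each principal component score by the square root of its eigenvalue, so that every retained component has unit variance in the unit space. That reading, the function name, and the `tol` cutoff for dropping near-zero components are all my own assumptions rather than details from the text.

```python
import numpy as np

def standardized_principal_components(unit, signal, tol=1e-10):
    """Project both spaces onto the unit space's principal components,
    then scale every retained component to unit variance."""
    mean = unit.mean(axis=0)
    std = unit.std(axis=0, ddof=1)
    zu = (unit - mean) / std      # standardize with unit-space statistics
    zs = (signal - mean) / std    # the signal space uses the same statistics

    eigvals, eigvecs = np.linalg.eigh(np.cov(zu, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumption: components with (near-)zero variance, which arise from
    # exact multicollinearity, are dropped to avoid dividing by zero.
    keep = eigvals > tol
    scale = np.sqrt(eigvals[keep])
    return (zu @ eigvecs[:, keep]) / scale, (zs @ eigvecs[:, keep]) / scale

# Hypothetical data: two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
unit = np.hstack([x, 0.9 * x + 0.1 * rng.normal(size=(200, 1))])
signal = np.array([[0.0, 0.0], [1.0, -1.0]])

unit_pc, signal_pc = standardized_principal_components(unit, signal)
```

After this step the low-contribution component carries the same weight as the first one in any distance calculation. That is why a point like `[1.0, -1.0]`, which breaks the correlation between the two variables, ends up far from the unit space.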