
Cause and Effect Analysis of Outlier and Missing Value


If a data set contains outliers or missing values, the methods in Analysis with Outlier and Missing Value are useful. In many cases, however, we need to know the causes of those values to get high-quality results.

If we know the causes, we can plan the sampling so that those values do not occur.

When we want to study outliers and missing values as phenomena, we do not remove them from the data set. The removal methods are still useful, though, because they select these values and let us study them numerically.
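As a minimal sketch of using a removal rule only to select values, the example below flags readings outside the interquartile-range fences and leaves the data set unchanged. The IQR rule, the factor `k=1.5`, the function name `flag_outliers_iqr`, and the sample readings are all assumptions made for illustration; any other removal rule could be used the same way.

```python
import statistics

def flag_outliers_iqr(values, k=1.5):
    """Return one boolean per value: True if it lies outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical readings; the data set itself is never modified.
readings = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 12.4, 5.1]
flags = flag_outliers_iqr(readings)

# Keep position and value together so the flagged cases can be studied later.
flagged = [(i, v) for i, (v, is_out) in enumerate(zip(readings, flags)) if is_out]
print(flagged)
```

The flag list can then be carried along as an extra column, so the outliers stay available for the cause analysis.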

In general, statistics is not the main part of this analysis. The main part is analysis using meta knowledge.

Statistical Analysis

By Discriminant Analysis

If the causes can be understood from the relationship with other variables, the approach of Discriminant Analysis is useful.

Because of the mechanisms behind outliers and missing values, the MT method or a decision tree is often better than classical Discriminant Analysis.

As Category Data

By treating "outlier" or "missing value" as a category, Analysis Using Category Data can be used to analyze the causes.

Decision Tree and Association Analysis are useful for this.
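The idea of treating "missing" as a category can be sketched with the two basic quantities of association analysis, confidence and lift, for a rule of the form "category level, then missing value". This is a simplified stand-in for a full association or decision tree analysis. The records, the column layout, and the function name `rule_stats` are hypothetical.

```python
# Hypothetical records: (sensor, shift, value); None marks a missing value.
records = [
    ("A", "day", 5.1), ("A", "night", 4.9), ("A", "day", 5.0), ("A", "night", 5.2),
    ("B", "day", 5.0), ("B", "night", None), ("B", "day", None), ("B", "night", 4.8),
]

def rule_stats(records, column, level):
    """Confidence and lift of the rule 'records[column] == level -> value is missing'."""
    n = len(records)
    n_level = sum(1 for r in records if r[column] == level)
    n_missing = sum(1 for r in records if r[2] is None)
    n_both = sum(1 for r in records if r[column] == level and r[2] is None)
    if n_level == 0 or n_missing == 0:
        return 0.0, 0.0  # the rule cannot be evaluated on this data
    confidence = n_both / n_level          # P(missing | level)
    lift = confidence / (n_missing / n)    # P(missing | level) / P(missing)
    return confidence, lift

for column, name in ((0, "sensor"), (1, "shift")):
    for level in sorted({r[column] for r in records}):
        print(name, level, rule_stats(records, column, level))
```

A lift well above 1 for one level (here sensor "B") suggests where to look for the cause; a lift near 1 (here both shifts) suggests that the variable is unrelated.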

Outlier as Missing Value, Missing Value as Outlier

Missing Value as Outlier

When a value is missing, some people record an outlier-like value in its place.

For example, 99999 or 1000000 is used, so we can recognize these entries as strange values.
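Before analysis, such placeholder values can be converted back into explicit missing values. The sketch below assumes that 99999 and 1000000 are the only placeholder codes and that they can never be real measurements; the names `SENTINELS` and `sentinels_to_missing` are made up for this example.

```python
# Assumed placeholder codes, taken from the example in the text.
SENTINELS = {99999, 1000000}

def sentinels_to_missing(values, sentinels=SENTINELS):
    """Replace placeholder codes with None so they are handled as missing values."""
    return [None if v in sentinels else v for v in values]

raw = [5.1, 99999, 4.9, 1000000, 5.0]
cleaned = sentinels_to_missing(raw)
print(cleaned)  # [5.1, None, 4.9, None, 5.0]
```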

Outlier as Missing Value

Some machines or systems treat outliers as missing values.

Analysis of Outlier

For example, suppose there is temperature data from refrigerators, and most of the data is between 3°C and 7°C.

We can analyze refrigerator data because a refrigerator is an everyday appliance. But if we have to study the temperature of an unfamiliar machine, we do not know its normal temperature. And if we have to study some specialized index, we have no idea how to begin the analysis.

We need help from an expert on the machine, or we need to understand the mechanism of the machine ourselves.
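For the refrigerator example, where the normal range is known, selecting the outliers can be sketched as a simple range check. Using 3 and 7 as hard limits is an assumption based on the example above; for an unfamiliar machine, the limits would have to come from an expert or from the mechanism. The function name `out_of_range` and the readings are hypothetical.

```python
# Assumed normal range for a refrigerator, in degrees Celsius.
NORMAL_LOW_C, NORMAL_HIGH_C = 3.0, 7.0

def out_of_range(temps, low=NORMAL_LOW_C, high=NORMAL_HIGH_C):
    """Return (index, value) pairs for readings outside [low, high]; None is skipped."""
    return [
        (i, t) for i, t in enumerate(temps)
        if t is not None and not (low <= t <= high)
    ]

temps = [4.2, 5.0, 6.8, 9.5, 3.0, -1.0, None, 7.0]
print(out_of_range(temps))  # [(3, 9.5), (5, -1.0)]
```

The indexes returned can be matched with time stamps or other records (door openings, defrost cycles, and so on) to look for the cause.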

Analysis of Missing Value

In some cases, the database system records a missing value when it does not receive data from the measuring machine.

This analysis is similar to the case where a message such as "Invalid" appears in a field that should hold a numerical value.
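A minimal sketch of this case: each field is parsed as a number, and when parsing fails the original message is kept instead of being discarded, so that the messages can be counted and examined. The column contents and the function name `parse_field` are hypothetical.

```python
from collections import Counter

def parse_field(text):
    """Return (value, None) for a numeric field, or (None, message) otherwise."""
    try:
        return float(text), None
    except ValueError:
        # Keep the message: it may indicate why the value is missing.
        return None, text.strip()

# Hypothetical raw column; an empty string means nothing was received.
raw_column = ["5.1", "Invalid", "4.9", "", "Invalid", "5.0"]
parsed = [parse_field(t) for t in raw_column]

# Count each kind of non-numeric entry.
message_counts = Counter(msg for value, msg in parsed if value is None)
print(message_counts)
```

Note that `float` also accepts strings such as "nan" and "inf", so a real data set may need an additional check for those.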



