# Cause and Effect Analysis of Outliers and Missing Values

When a data set contains outliers or missing values, Analysis with Outlier and Missing Value is useful. In many cases, however, we also need to know the causes of those values to get high-quality outputs.

If we know the causes, we can plan sampling to avoid those values.

When we want to study the phenomena behind outliers and missing values, we do not remove them from the data set. However, the methods used for removal are still useful for selecting these values and studying them numerically.

Statistics is generally not useful in the main part of this analysis. The main part is analysis using meta knowledge.

## Statistical Analysis

### By Discriminant Analysis

If the causes can be understood through the relationship with other variables, the approach of Discriminant Analysis is useful.

Because of the mechanism behind outliers and missing values, the MT method or a decision tree is often better suited than the original Discriminant Analysis.

### As Category Data

By treating "outlier" or "missing value" as a category, Analysis Using Category Data can be used to analyze the causes.

Decision Tree and Association Analysis are useful.
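As a minimal sketch of the category idea, each record can be flagged as "missing or out of range" and the share of flagged records compared across another categorical variable. The function name `flag_rate_by_group`, the `shift` and `temp` fields, the sample records, and the 3 to 7 range are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def flag_rate_by_group(records, group_key, value_key, low, high):
    """Return {group: share of records whose value is missing or outside [low, high]}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for rec in records:
        value = rec.get(value_key)
        # A record is flagged when the value is missing or out of range.
        flagged = value is None or not (low <= value <= high)
        counts[rec[group_key]][0] += int(flagged)
        counts[rec[group_key]][1] += 1
    return {g: flagged / total for g, (flagged, total) in counts.items()}

# Hypothetical sample data, not taken from a real system.
records = [
    {"shift": "day", "temp": 4.2},
    {"shift": "day", "temp": 5.1},
    {"shift": "day", "temp": 31.5},
    {"shift": "day", "temp": 6.0},
    {"shift": "night", "temp": None},
    {"shift": "night", "temp": 1000.0},
    {"shift": "night", "temp": 4.8},
    {"shift": "night", "temp": None},
]

rates = flag_rate_by_group(records, "shift", "temp", 3.0, 7.0)
print(rates)
```

A large difference in the flagged share between groups only suggests where to look for a cause; it does not identify the cause by itself.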

## Outlier as Missing Value, Missing Value as Outlier

### Missing Value as Outlier

When a value is missing, some people record an outlier value in its place.

For example, 99999 or 1000000 is used, so that we can recognize the value as abnormal.
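Before analysis, such placeholder values can be converted back into explicit missing values. The sketch below assumes the placeholders are exactly 99999 and 1000000, as in the example above; the names `SENTINELS` and `sentinels_to_missing` are hypothetical, and a real data set needs its own list of placeholders.

```python
# Assumed placeholder values; confirm them against the data source before use.
SENTINELS = {99999, 1000000}

def sentinels_to_missing(values, sentinels=SENTINELS):
    """Replace placeholder values with None so they are not treated as measurements."""
    return [None if v in sentinels else v for v in values]

readings = [4.2, 99999, 5.0, 1000000, 6.1]
cleaned = sentinels_to_missing(readings)
print(cleaned)
```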

### Outlier as Missing Value

Some machines or systems handle outliers as missing values.

## Analysis of Outlier

For example, consider temperature data from refrigerators. Most of the data ranges from 3°C to 7°C.

• 10000°C: This temperature does not generally occur on Earth, so this case may be a system error.
• 1000°C: This may be the temperature of a fire.
• 31.5°C: This may be the room temperature; perhaps the refrigerator door was open. However, the true value may have been 3.15°C. The reason becomes clearer when we check the room temperature on that day or the distribution of the data.
• 10°C or 0°C: These cases occur with low probability. If the timing is known, we may be able to understand the reason.
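The reasoning in the cases above can be sketched as a small rule-based function that attaches a candidate explanation to each reading. Every threshold here (5000, 300, 15 to 40) and the name `classify_temperature` are assumptions made for illustration, and the returned labels are hypotheses to check, not conclusions.

```python
def classify_temperature(temp_c, normal=(3.0, 7.0)):
    """Return a candidate explanation for a refrigerator temperature reading in Celsius."""
    low, high = normal
    if low <= temp_c <= high:
        return "normal"
    # Below absolute zero or far above anything expected: suspect the system.
    if temp_c < -273.15 or temp_c > 5000:
        return "physically implausible: suspect system error"
    if temp_c > 300:
        return "extreme heat: possible fire"
    if 15 <= temp_c <= 40:
        # If one tenth of the reading falls in the normal range,
        # a misplaced decimal point is also a candidate explanation.
        if low <= temp_c / 10 <= high:
            return "room temperature or misplaced decimal point"
        return "room temperature: door may have been open"
    return "rare value: check the timing"

for reading in [10000, 1000, 31.5, 10, 0, 5]:
    print(reading, "->", classify_temperature(reading))
```

Rules like these are only possible because the normal range of a refrigerator is common knowledge.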

We can analyze the refrigerator because it is an everyday appliance. But if we have to study the temperature in an unfamiliar machine, we do not know its normal temperature. And if we have to study some specialized indexes, we have no idea how to analyze them.

We need help from an expert on the machine, or we need to understand its mechanism.

## Analysis of Missing Value

In some cases, the database system records a missing value when it does not receive data from the measurement machine.

This analysis is similar to the case where a message such as "Invalid" appears in a field that should contain a numerical value.
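One way to start such an analysis is to separate the numerical values from the messages and count how often each message appears. The sketch below assumes the raw column arrives as strings; the name `parse_reading`, the label `"empty"`, and the sample column are hypothetical.

```python
from collections import Counter

def parse_reading(raw):
    """Return (value, reason): a float and None, or None and the reason it is missing."""
    if raw is None or raw.strip() == "":
        return None, "empty"
    try:
        return float(raw), None
    except ValueError:
        # Keep the original message so it can be analyzed as a category.
        return None, raw.strip()

# Hypothetical raw column mixing numbers, blanks, and messages.
raw_column = ["4.5", "Invalid", "", "5.2", "Invalid", None]
parsed = [parse_reading(r) for r in raw_column]
reasons = Counter(reason for value, reason in parsed if value is None)
print(reasons)
```

Keeping the message text instead of discarding it leaves a clue about why each value is missing.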