There are several ways to handle data that contains outliers and missing values.
This page covers missing values.
An outlier can be treated as an ordinary value, but this often leads to poor results. One solution is to treat the outlier as a missing value instead.
In my experience, many statistics packages remove every row that contains a missing value without notifying the user.
Because this is the default behavior in much software, the harm it does is hard to notice.
Strength : Very easy
Weakness : Normal values in the removed rows are discarded too
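As a rough illustration of row removal, the sketch below drops every row that contains a missing value and reports how many rows survive. The helper names (`is_missing`, `drop_incomplete_rows`) and the sample rows are made up for this example, and missing values are assumed to be encoded as `None` or NaN.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_incomplete_rows(rows):
    """Keep only the rows that contain no missing value."""
    return [row for row in rows if not any(is_missing(v) for v in row)]

# Hypothetical sample data: two of the four rows have a missing value.
rows = [
    [1.0, 2.0, 3.0],
    [4.0, None, 6.0],
    [7.0, 8.0, float("nan")],
    [10.0, 11.0, 12.0],
]

complete = drop_incomplete_rows(rows)
# Reporting the count makes the loss visible instead of silent.
print(f"kept {len(complete)} of {len(rows)} rows")
```

Note that the normal values 4.0, 6.0, 7.0 and 8.0 are thrown away together with the missing ones, which is the weakness described above.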
This method fills each missing value with a newly generated value. We need to decide the rule for generating it, such as using the mean of the observed values.
Strength : Easy. The effect on results derived from the normal data is small.
Weakness : May introduce bias
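One possible rule is to fill with the mean of the observed values. The sketch below assumes that rule and assumes missing values are encoded as `None`; `fill_with_mean` is a hypothetical helper, not a function from any particular package.

```python
from statistics import mean

def fill_with_mean(column):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in column]

# Hypothetical column with two missing entries.
column = [2.0, None, 4.0, 6.0, None]
filled = fill_with_mean(column)
print(filled)
```

Every filled entry gets the same value, so the spread of the column shrinks. That is one way the bias mentioned above can appear.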
If we know the range of the data, a limit of the range or an outlier value can be used to fill in the missing values.
Strength : The results can reflect the cause of the missing values
Weakness : The filled-in values may have a large effect on the results
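A minimal sketch of this idea, assuming a measurement whose known upper limit is 100.0 and whose missing readings are believed to be over-range. Both the limit and the data are invented for illustration.

```python
def fill_with_limit(column, limit):
    """Replace each None with a fixed limit of the known data range."""
    return [limit if v is None else v for v in column]

# Assumption: the instrument cannot report values above 100.0,
# and a missing reading means the true value exceeded that limit.
UPPER_LIMIT = 100.0
readings = [42.0, None, 87.5, None]

filled = fill_with_limit(readings, UPPER_LIMIT)
print(filled)
```

Because the filled values sit at the edge of the range, they can pull averages and fitted models strongly, so it is worth checking how sensitive the results are to them.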
Fill each missing value with the average of nearby samples found by k-NN (k-nearest neighbors).
Strength : If the cause of the missing values is not special, this may be the best of the fill-in approaches.
Weakness : The amount of computation is large
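The k-NN fill can be sketched as follows. This is a simplified version under stated assumptions: missing values are `None`, distance is Euclidean over the columns observed in the incomplete row, and only fully complete rows are used as neighbours. `knn_impute` is a hypothetical helper written for this example.

```python
import math

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("no complete rows to use as neighbours")
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        # Compare only on the columns that are observed in this row.
        observed = [i for i, v in enumerate(row) if v is not None]

        def distance(other):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=distance)[:k]
        filled = [
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ]
        result.append(filled)
    return result

# Hypothetical data: the last row is missing its second value.
rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [9.0, 90.0],
    [2.5, None],
]
imputed = knn_impute(rows, k=2)
print(imputed[-1])
```

Each incomplete row is compared against every complete row, which is where the computational cost comes from as the data grows.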
The EM algorithm is an approach that uses all of the normal values and leaves out only the missing values.
Strength : The existing values are not altered
Weakness : The cause of the missing values is ignored, which may harm the results of the analysis.
Analysis methods that use categorical data (decision trees, association analysis, etc.) are useful here. We treat the missing value as a category of its own, "missing value".
DecisionTree in RapidMiner treats the missing value as the category "?".
Natto uses the category " " (blank) for the missing value.
Strength : The information carried by the missing data can be used
Weakness : Small differences among the normal values are not used in the analysis. (The values are handled at a coarser precision.)
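A small sketch of treating missing entries as their own category before a categorical analysis. It assumes missing values arrive as `None` or as an empty string, and the label text is an arbitrary choice. It does not reproduce how RapidMiner or Natto handle this internally.

```python
from collections import Counter

MISSING_LABEL = "missing value"  # arbitrary label chosen for this sketch

def label_missing(values, label=MISSING_LABEL):
    """Replace None and empty strings with an explicit category label."""
    return [label if v is None or v == "" else v for v in values]

# Hypothetical categorical column.
colors = ["red", None, "blue", "", "red"]
labelled = label_missing(colors)

# The missing entries now show up as a countable category.
counts = Counter(labelled)
print(counts)
```

Once the label is in place, a decision tree or association analysis can split or form rules on "missing value" like on any other category.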
Example R code is on the page "Analysis with missing values by R".