# Multicollinearity

Multicollinearity is a strong relationship among the X (explanatory) variables.
It is one of the common difficulties in data analysis.

## What is Multicollinearity?

"There is multicollinearity" means "There is a strong relationship among the X variables."

Mathematically, it means that those variables carry nearly the same information.

### Simple Multicollinearity

The simplest form of multicollinearity is the same as correlation between two variables.
If two X variables are strongly correlated, there is also strong multicollinearity.

We can check this relationship with a scatter plot and the correlation coefficient.

### General Multicollinearity

Correlation is not the only form of multicollinearity.

In my experience, I need to understand general multicollinearity when I use dummy variables.
Apart from that case, I do not need it in my business analysis.

If one of the X variables can be expressed as a linear function (regression function) of the other X variables, there is multicollinearity.
Correlation is the simplest case of this: one X variable is a linear function of a single other X variable.

"Tolerance" is the measure used to check for general multicollinearity.
If the tolerance of an X variable is small, there is multicollinearity.
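The tolerance of an X variable is usually defined as 1 − R², where R² comes from regressing that variable on all the other X variables. The sketch below computes it in plain Python with an ordinary least-squares fit (with intercept) solved by Gaussian elimination. The helper names (`solve`, `r_squared`, `tolerance`) and the data are hypothetical. In the data, `x3` is built as roughly `x1 + x2`, so the pairwise correlations stay moderate (about 0.09 to 0.77) while every tolerance is close to 0.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def tolerance(xs, j):
    """Tolerance of X variable j: 1 - R^2 of X_j regressed on all the other X variables."""
    others = [x for i, x in enumerate(xs) if i != j]
    return 1.0 - r_squared(xs[j], others)


# Hypothetical data: x3 is almost exactly x1 + x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [4, 1, 6, 2, 5, 3]
x3 = [5.1, 2.9, 9.2, 5.8, 10.1, 8.9]
xs = [x1, x2, x3]

for j, name in enumerate(["x1", "x2", "x3"]):
    print(name, round(tolerance(xs, j), 4))
```

No threshold for "small" is given here. In practice, the cut-off is a judgment call for the analyst.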

## Multicollinearity is disliked

If the data set has multicollinearity, we cannot get a good regression function: the estimated coefficients become unstable.
That is why many people dislike multicollinearity.
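The instability can be seen in a tiny example. The sketch below fits y ≈ b1·x1 + b2·x2 by least squares. For brevity it omits the intercept, which is an assumption made only to keep the 2×2 normal equations solvable by Cramer's rule. The function name `fit_two` and all data are hypothetical. Because `x2` differs from `x1` by only ±0.1, changing each Y value by just 0.2 swings the coefficients from about (1, 1) to about (3, −1).

```python
def fit_two(x1, x2, y):
    """Least-squares (b1, b2) for y ~ b1*x1 + b2*x2 (no intercept), via Cramer's rule."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # shrinks toward 0 as x1 and x2 become collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical X variables that differ by only +/-0.1
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.1, 1.9, 3.1, 3.9]

# Two Y series that differ by only 0.2 at each point
y_a = [2.1, 3.9, 6.1, 7.9]
y_b = [1.9, 4.1, 5.9, 8.1]

print(fit_two(x1, x2, y_a))  # approximately (1.0, 1.0)
print(fit_two(x1, x2, y_b))  # approximately (3.0, -1.0)
```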

## The Way to Use Multicollinearity in the Analysis of Cause-and-Effect

In my cause-and-effect analysis, I often meet this case:
"There are X variables strongly related to Y, and there is strong multicollinearity among those X variables."

After studying multiple regression analysis, we might go on to variable selection and remove many variables from the analysis.
But such selection often fails, because real data has many different background factors.

In my analysis, however, the data analysis itself is finished once I understand that the data set contains such multicollinear X variables.
After that, I consider what common cause lies behind the X variables that are strongly related to Y and strongly multicollinear with one another.

If that common cause is itself one of the X variables, we use path analysis for a numerical study.

If our goal is to build a formula from the data, multicollinearity is unwelcome.
But if we want to study cause and effect, as in data mining, multicollinearity is important information.
