Multicollinearity is the relationship among X variables. It is one of the difficulties of data analysis.
"There is multicollinearity" means "There is strong relationship among X variables."
It means that those variables are same mathematically.
The easy style of multicollinearity is same to correlation between two variables. If there is strong correlation, it also means that there is strong multicollinearity.
We can check the relationship by scatter plot and correlation coefficient.
Correlation is not the only content of multicollinearity.
In my experience, I need the knowledge of general multicollinearity when I use the dummy variable . I do not need general multicollinearity in my business analysis.
If one of the X variables can be formulated by linear function (regression function) of other X variables, it means that there is multicollinearity. Correlation is the easiest of this function.
"Tolerance" is the check point of general multicollinearity. If tolerance is small, there is multicollinearity.
If there is multicollinearity in the data set, we cannot get good function. So many people dislike the multicollinearity.
I meet often the case that "There is X variables related strongly to Y. And there are strong multicollinearity among those X variables" in my cause-and-effect analysis.
After the study of multi-regression analysis , we might to go the selection of variance and remove many variables in the analysis. But such selection often fails because real data has various backgrounds.
But, in my analysis, when I understand such data set of multicollinearity X variables, the data analysis is finished. After the analysis, I consider the common cause of the variables. "There is X variables related strongly to Y. And there is strong multicollinearity among those X variables"
If the cause is same to one of the X data, we use pass analysis for numerical study.
If we want to analyze to make formulation, we dislike multicollinearity. But, like Data Mining , if we want to study cause-and-effect, multicollinearity is important information.
NEXT Principal Component Regression AnalysisTweet