

Multicollinearity is a strong relationship among the X (explanatory) variables. It is one of the common difficulties in data analysis.

What is Multicollinearity?

"There is multicollinearity" means "there is a strong relationship among the X variables."

In other words, those variables carry nearly the same information mathematically.

Easy Multicollinearity

Checking multicollinearity with a scatter plot matrix

The easy form of multicollinearity is simply correlation between two variables. If two X variables are strongly correlated, there is also strong multicollinearity.

We can check this relationship with a scatter plot and the correlation coefficient.
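As a minimal sketch of this check, assuming NumPy and made-up variables x1, x2 and x3 (none of them come from this page), the following computes the correlation coefficients and lists the strongly correlated pairs. The 0.9 cut-off is an arbitrary assumption for illustration, not a fixed rule.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 200

# Hypothetical X variables: x2 is built to follow x1 closely, x3 is independent.
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)

X = np.column_stack([x1, x2, x3])

# Pairwise correlation coefficients; rowvar=False treats columns as variables.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# Flag pairs whose absolute correlation exceeds the assumed threshold.
threshold = 0.9
names = ["x1", "x2", "x3"]
strong_pairs = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(strong_pairs)
```

A scatter plot matrix of the same columns could be drawn with, for example, pandas.plotting.scatter_matrix, which shows the same pairs visually.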

General Multicollinearity

Pairwise correlation is not the only form of multicollinearity.

In my experience, I need knowledge of general multicollinearity when I use dummy variables. Apart from that case, I have not needed it in my business analysis.
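Dummy variables show why pairwise correlation is not enough. In the rough sketch below (assuming NumPy and a made-up factor with three levels), no pair of dummy columns is strongly correlated, yet one column is an exact linear function of the others together with the intercept.

```python
import numpy as np

# Hypothetical categorical factor with three levels, coded as three dummy columns.
levels = np.array([0, 1, 2] * 10)
d1 = (levels == 0).astype(float)
d2 = (levels == 1).astype(float)
d3 = (levels == 2).astype(float)
intercept = np.ones_like(d1)

# No pair of dummy columns is strongly correlated
# (each off-diagonal entry is -0.5 in this balanced example)...
corr = np.corrcoef(np.column_stack([d1, d2, d3]), rowvar=False)
print(np.round(corr, 2))

# ...yet d3 is an exact linear function of the other columns: d3 = 1 - d1 - d2.
print(np.allclose(d3, intercept - d1 - d2))

# The design matrix has 4 columns but only 3 independent ones.
X_full = np.column_stack([intercept, d1, d2, d3])
print(X_full.shape[1], np.linalg.matrix_rank(X_full))

# Dropping one dummy column removes the exact dependence.
X_reduced = np.column_stack([intercept, d1, d2])
print(X_reduced.shape[1], np.linalg.matrix_rank(X_reduced))
```

Dropping one dummy column per factor, as in X_reduced, is one common way to avoid this exact dependence.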

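If one of the X variables can be expressed as a linear function (a regression function) of the other X variables, there is multicollinearity. Correlation between two variables is the simplest case of such a function.

"Tolerance" is the indicator used to check for general multicollinearity. For each X variable, it is 1 minus the R-squared obtained when that variable is regressed on the other X variables. If the tolerance is small, there is multicollinearity.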

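As a sketch of how tolerance can be computed, assuming NumPy and made-up data, the helper below (the name tolerance is hypothetical) regresses each X variable on all the others and returns 1 minus R-squared. Here x3 is built as roughly x1 + x2, so its correlation with either one alone is only moderate, but the tolerance exposes the dependence.

```python
import numpy as np

def tolerance(X):
    """Return the tolerance (1 - R^2) of each column regressed on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = residual @ residual
        ss_tot = np.sum((target - target.mean()) ** 2)
        result[j] = ss_res / ss_tot  # equals 1 - R^2
    return result

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.1, size=n)  # nearly a linear function of x1, x2
x4 = rng.normal(size=n)                       # unrelated to the others

tol = tolerance(np.column_stack([x1, x2, x3, x4]))
print(np.round(tol, 3))
```

The tolerances of x1, x2 and x3 come out close to zero, while that of x4 stays close to 1. A frequently quoted rule of thumb, which is not from this page, treats a tolerance below about 0.1 as a warning sign.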
Multicollinearity is disliked

If there is multicollinearity in the data set, we cannot obtain a reliable regression function, because the estimated coefficients become unstable. That is why many people dislike multicollinearity.
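The following sketch illustrates the problem under assumed conditions: made-up data, a true model y = x1 + x2 + noise, and NumPy least squares. It refits the regression on many noisy samples of y and compares how much the fitted slopes vary when x2 is independent of x1 and when x2 is almost a copy of x1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, repeats = 100, 200

def coef_spread(x1, x2):
    """Standard deviation of the fitted slopes over repeated noisy samples of y."""
    design = np.column_stack([np.ones(n), x1, x2])
    fits = []
    for _ in range(repeats):
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)  # assumed true model
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        fits.append(coef[1:])  # keep the slopes for x1 and x2
    return np.std(fits, axis=0)

x1 = rng.normal(size=n)
x2_independent = rng.normal(size=n)
x2_collinear = x1 + rng.normal(scale=0.05, size=n)  # almost identical to x1

spread_independent = coef_spread(x1, x2_independent)
spread_collinear = coef_spread(x1, x2_collinear)
print("independent X:", np.round(spread_independent, 3))
print("collinear X:  ", np.round(spread_collinear, 3))
```

With the collinear pair, the slopes swing far more from sample to sample, so an individual coefficient says little about the effect of that variable.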

The Way to Use Multicollinearity in the Analysis of Cause-and-Effect

In my cause-and-effect analysis, I often meet the case where "there are X variables strongly related to Y, and there is strong multicollinearity among those X variables."

After studying multi-regression analysis, we might go on to variable selection and remove many variables from the analysis. But such selection often fails, because real data has various backgrounds.

In my analysis, however, once I understand that the data set has this structure (X variables strongly related to Y, with strong multicollinearity among them), the data analysis itself is finished. After the analysis, I consider the common cause behind those variables.

If the cause is the same as one of the X variables, we use path analysis for the numerical study.

If the goal of the analysis is to build a formula, multicollinearity is unwelcome. But if the goal is to study cause and effect, as in Data Mining, multicollinearity is important information.
