Evaluation of Variable Importance

When looking for hypotheses as described in Quality Way of Making Hypothesis in Cause and Effect Analysis, we sometimes want to ask, "Which variable has the most influence on the target variable? Which has the second most?" This kind of analysis is sometimes called "diagnosis" or "item diagnosis".

The importance of variables is also useful information for Selection of Variables.

This page is a child page of Multiple Regression Analysis, but the basic idea is the same for methods other than multiple regression analysis.

Evaluate the importance of variables in the regression equation

Evaluating the coefficient and the p-value of each variable, as described on the Multiple Regression Analysis page, is one type of variable importance evaluation.
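As a minimal sketch of the coefficient-based evaluation, the following example compares raw and standardized regression coefficients. The data are synthetic, and the helper functions `solve` and `ols` are small illustrative stand-ins written for this page, not part of any library. The point is that a raw coefficient depends on the unit of its variable, so coefficients are usually compared only after standardization. The p-value evaluation is not shown here because it needs the t-distribution.

```python
import random
import statistics

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

# Synthetic data (assumption for illustration): x1 varies on a large scale,
# x2 on a small scale, and x1 actually moves y more than x2 does.
rng = random.Random(0)
n = 200
x1 = [rng.gauss(0, 100) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [0.05 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

raw = ols([x1, x2], y)[1:]  # drop the intercept
sd_y = statistics.stdev(y)
standardized = [b * statistics.stdev(x) / sd_y for b, x in zip(raw, [x1, x2])]

print("raw coefficients:         ", raw)
print("standardized coefficients:", standardized)
```

In this synthetic case the raw coefficient of x2 is much larger than that of x1, while the standardized coefficients rank x1 above x2, which matches how the data were generated.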

Evaluate the importance of variables in a dataset

"Evaluate the importance of variables in the regression equation" is a method that can be applied to the regression equation after it is created. Before creating the regression equation, there are other ways to check whether this variable should be included in the regression equation and whether this variable has a large effect on the objective variable.

In both the ensemble learning method and the design of experiments method, various regression equations (models) that each use only some of the variables are created on a trial basis, and each one is evaluated. In this way, you can find variables that are actually important even though the regression equation using all the variables might have rated them as unimportant, for example because of multicollinearity with other variables.

Evaluation by ensemble learning

Ensemble Learning is often explained as a method of creating highly accurate models, but it is also used as a method of evaluating the importance of variables.

Among Decision Tree methods, random forest uses ensemble learning to assess variable importance.

It should be possible to evaluate the importance of variables by applying ensemble learning to multiple regression models, but I have never done so. I know of a tool for evaluating the importance of variables with random forest, as introduced on the Decision tree by R page, but I have never seen a tool that uses ensemble learning with multiple regression analysis.
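To give a rough idea of what ensemble learning with multiple regression could look like, here is a hypothetical sketch. It is not an established tool or a method taken from any library. It assumes one possible design: each trial draws a bootstrap sample and a random subset of variables, fits a regression, and scores it by R² on the rows left out of the bootstrap sample. The importance of a variable is then taken as the average score of the models that used it minus the average score of the models that did not. The data and the names `solve`, `ols`, and `r2` are illustrative assumptions.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

def r2(beta, columns, y):
    """R^2 of the predictions made with beta on the given rows."""
    pred = [beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns))
            for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): only x1 and x2 affect y; x3 and x4 are noise.
rng = random.Random(1)
n = 200
names = ["x1", "x2", "x3", "x4"]
data = {v: [rng.gauss(0, 1) for _ in range(n)] for v in names}
y = [3.0 * p + 1.5 * q + rng.gauss(0, 1) for p, q in zip(data["x1"], data["x2"])]

with_var = {v: [] for v in names}
without_var = {v: [] for v in names}
for _ in range(300):
    chosen = [v for v in names if rng.random() < 0.5]   # random variable subset
    boot = [rng.randrange(n) for _ in range(n)]         # bootstrap row indices
    in_bag = set(boot)
    oob = [i for i in range(n) if i not in in_bag]      # rows left out of the sample
    beta = ols([[data[v][i] for i in boot] for v in chosen], [y[i] for i in boot])
    score = r2(beta, [[data[v][i] for i in oob] for v in chosen], [y[i] for i in oob])
    for v in names:
        (with_var if v in chosen else without_var)[v].append(score)

importance = {v: sum(with_var[v]) / len(with_var[v])
                 - sum(without_var[v]) / len(without_var[v]) for v in names}
ranking = sorted(importance, key=importance.get, reverse=True)
for v in ranking:
    print(v, round(importance[v], 3))
```

Because the variable subsets are chosen at random, the importance values of weak variables are noisy. Increasing the number of trials reduces this noise.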

Evaluation by design of experiments

The use of Design of Experiments to assess variable importance is introduced in the MT System of Quality Engineering.

A two-level orthogonal table consists of the two values 0 and 1. Here, "0 = no" means that the explanatory variable is not included in the model formula, and "1 = yes" means that the explanatory variable is included in the model formula.

In the design of experiments, experiments are conducted one after another under various combinations of conditions. Similarly, here the goodness of the model formula is calculated one after another for various combinations of selected explanatory variables. After the calculations, you can draw a factor-effect diagram to find the explanatory variables that have the greatest impact.
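The procedure can be sketched as follows, under several assumptions that are not from the original text: the L8 two-level orthogonal table is used, the goodness of each model formula is measured by R² of an ordinary least-squares fit, the data are synthetic, and five explanatory variables are assigned to the first five columns of the table. The numbers in `effects` are the values that a factor-effect diagram would plot (average goodness at level 1 minus average goodness at level 0).

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
                 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# L8 two-level orthogonal table (8 runs x 7 columns) in 0/1 form,
# generated from three base columns and their XOR combinations.
design = []
for run in range(8):
    a, b, c = (run >> 2) & 1, (run >> 1) & 1, run & 1
    design.append([a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c])

# Synthetic data (assumption): only x1 and x2 affect y; x3 to x5 are noise.
rng = random.Random(0)
n = 200
names = ["x1", "x2", "x3", "x4", "x5"]
data = {v: [rng.gauss(0, 1) for _ in range(n)] for v in names}
y = [3.0 * p + 1.5 * q + rng.gauss(0, 1) for p, q in zip(data["x1"], data["x2"])]

# One model formula per run: variable j is included only when column j is 1.
scores = []
for run in design:
    used = [data[v] for j, v in enumerate(names) if run[j] == 1]
    scores.append(r_squared(used, y))

# Factor effect: mean goodness with the variable minus mean goodness without it.
effects = {}
for j, v in enumerate(names):
    on = [s for run, s in zip(design, scores) if run[j] == 1]
    off = [s for run, s in zip(design, scores) if run[j] == 0]
    effects[v] = sum(on) / len(on) - sum(off) / len(off)

ranking = sorted(effects, key=effects.get, reverse=True)
for v in ranking:
    print(v, round(effects[v], 3))
```

Only 8 model formulas are fitted here instead of all 32 possible combinations of five variables, which is the saving that the orthogonal table provides. With more explanatory variables, a larger two-level table would be needed.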