In the usual type of Multi-Variable Analysis (M-VA), the direct relationship between X and Y is studied with a mathematical model.
Some more advanced M-VA methods place intermediate layers (Z) between X and Y.
The process, merits and demerits of this approach are useful knowledge for Data Literacy, so I introduce them on this page.
If the relationship between X and Y is difficult to study with a simple model, intermediate layer models are one solution.
One cause of the difficulty is noise. In other cases the cause is that the information in the data is complicated.
The intermediate layer brings out the useful information.
The model including all of the Xs, Zs and Y is built in one step.
Neural Network methods are an example.
In the first step, the Zs are made. In the second step, the relationship between the Zs and Y is studied.
Unsupervised learning is used in the first step, and supervised learning is used in the second step.
Principal Component Regression Analysis (PCR) is one example. In the first step, Principal Component Analysis (PCA) is used. In the second step, Multiple Regression Analysis (MRA) is used. The principal components serve as the Zs.
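The two steps of PCR can be sketched as follows. This is only a minimal illustration, not a full implementation: it keeps just the first principal component (so the second step reduces to a simple regression rather than MRA), finds that component by power iteration, and uses small made-up data. The function names and the data are assumptions for this example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Step 1 (unsupervised): column means and unit direction of the first PC."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the Xs (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iterations):  # power iteration toward the top eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, direction):
    """Compute Z: the score of each row on the principal component."""
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))
            for r in rows]


def simple_regression(z, y):
    """Step 2 (supervised): least-squares line y = intercept + slope * z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return slope, my - slope * mz


def fit_pcr(rows, y):
    means, direction = first_principal_component(rows)
    slope, intercept = simple_regression(project(rows, means, direction), y)
    return means, direction, slope, intercept


def predict_pcr(model, rows):
    means, direction, slope, intercept = model
    return [intercept + slope * z for z in project(rows, means, direction)]


# Made-up data: x2 is exactly 2 * x1, so the two Xs carry one piece of information.
X = [[float(t), 2.0 * t] for t in range(10)]
y = [3.0 * t + 1.0 for t in range(10)]

model = fit_pcr(X, y)
predictions = predict_pcr(model, X)
```

In this data the two Xs are perfectly correlated, which would make a direct regression on both Xs ill-conditioned. Going through one Z avoids the problem.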
For complicated data, the Self-Organizing Map and kernel methods are useful.
Classification methods produce categorical data, which can also be used as Zs. It is often used in the form of dummy variables.
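As a small sketch of how categorical Zs become dummy variables: the threshold rule below is a hypothetical stand-in for any classification method, and the temperature values are made up. One category is dropped as the baseline, which is a common choice so that the dummy columns are not collinear with the intercept of the later regression.

```python
def classify_temperature(value):
    """Hypothetical classifier: turns a numeric X into a categorical Z."""
    if value < 50:
        return "low"
    if value < 80:
        return "mid"
    return "high"


def to_dummy_variables(labels):
    """Encode labels as 0/1 columns, dropping the first category as baseline."""
    categories = sorted(set(labels))
    kept = categories[1:]  # the first category becomes the all-zero baseline
    rows = [[1 if label == c else 0 for c in kept] for label in labels]
    return kept, rows


temperatures = [42.0, 55.5, 91.2, 78.0, 49.9]  # made-up values
labels = [classify_temperature(t) for t in temperatures]
columns, Z = to_dummy_variables(labels)
```

Here `columns` names the dummy variables and each row of `Z` can be used as explanatory variables in the second step.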
Statistics software helps us make the intermediate layer, but it is not very powerful for this purpose.
One reason is that it includes only statistical models, so it cannot make use of background knowledge about the data.
I often make intermediate layers using meta-knowledge of the data. Models from physics, chemistry and so on are used to make the Zs.
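A minimal sketch of this idea, under assumptions chosen only for illustration: the Xs are the voltage and current of a heater, the physical model "power = voltage x current" gives Z, and Y is a temperature that depends linearly on the power. The variable names and the numbers are hypothetical.

```python
def electrical_power(voltage, current):
    """Physical model used to build Z from two Xs (P = V * I)."""
    return voltage * current


def fit_line(z, y):
    """Least-squares line y = intercept + slope * z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return slope, my - slope * mz


# Made-up measurements.
voltages = [10.0, 20.0, 10.0, 20.0, 30.0]
currents = [1.0, 1.0, 2.0, 2.0, 1.0]
temperature = [7.0, 12.0, 12.0, 22.0, 17.0]

# Step 1: build Z from background knowledge instead of a statistical model.
power = [electrical_power(v, i) for v, i in zip(voltages, currents)]

# Step 2: study the relationship between Z and Y.
slope, intercept = fit_line(power, temperature)
```

A linear model on voltage and current alone cannot represent their product, but the relationship between Z and Y is a straight line here.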
Analysis using an intermediate layer has weak points.
The information treated as noise is lost when the Zs are made. So if important information is contained in that noise, we cannot analyze the relationship between the important information and Y.
For example, the loss occurs if the fact that a value is an outlier or is missing is itself useful information. This often happens in process analysis of abnormal conditions.
It is better if the relationship between X and Y can be explained directly, because a simple story is easy to understand.
And if there is an intermediate layer (Z), we need to plan actions for "(1) X and Z", "(2) Z and Y" and "(3) the combination of (1) and (2)".
Gap between Models of Statistics and Real
Statistical Way of Making Hypothesis