
Regression model data structure

Y = X + e
is the simplest formula in Regression Analysis.

The meaning of this formula is that "X is the cause of Y, and Y equals X plus an error."

This formula is used as a matter of course, but in practice it causes problems. This page collects what the author has learned through such troubles.
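As a minimal sketch of what Y = X + e looks like in data, the Python snippet below (using NumPy; the sample size, the range of X and the error scale are invented for illustration) generates X, adds an error e to get Y, and fits the more general line Y = bX + c by least squares. When the data really follow Y = X + e and X is exact, the fitted slope b should come out close to 1 and the intercept c close to 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.uniform(0, 10, n)       # X: the cause (exact values in this sketch)
e = rng.normal(0, 1, n)         # e: the error
y = x + e                       # Y = X + e

# Least-squares fit of the more general line Y = bX + c
b, c = np.polyfit(x, y, 1)
print(f"slope b = {b:.3f}, intercept c = {c:.3f}")  # b near 1, c near 0
```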

How to extract the structure of Y = X + e from the data

In real-world data, it is common for the measurements of both Y and X to be inaccurate. However, if the effect of such errors is small, you may be able to use LiNGAM to obtain a clean
Y = X + e
relational expression.

LiNGAM is one of the methods of the Quantity Way of Making Hypothesis: when the data have the structure of a regression model, it derives that structure from the data as a directed graph.
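The principle LiNGAM relies on can be illustrated with a crude two-variable check. This is not the LiNGAM algorithm itself (ICA-LiNGAM and DirectLiNGAM use more careful estimators and independence measures); it is a hypothetical simplification that only shows the idea: when the error e is non-Gaussian, regressing in the true causal direction leaves a residual independent of the regressor, while regressing in the wrong direction leaves a residual that is uncorrelated with the regressor but still dependent on it. The dependence score (correlation of squared, centred values) and the helper names `dependence`, `residual` and `guess_direction` are assumptions made for this sketch.

```python
import numpy as np

def dependence(a, b):
    # Crude dependence score: |corr(a^2, b^2)| of the centred variables.
    # A value near 0 does not prove independence, but a large value shows dependence.
    a = a - a.mean()
    b = b - b.mean()
    return abs(np.corrcoef(a ** 2, b ** 2)[0, 1])

def residual(target, regressor):
    # Residual of the least-squares fit target = slope * regressor + intercept.
    slope, intercept = np.polyfit(regressor, target, 1)
    return target - (slope * regressor + intercept)

def guess_direction(x, y):
    # Pick the direction whose regression leaves the residual that looks
    # more independent of its regressor.
    score_xy = dependence(residual(y, x), x)  # hypothesis X -> Y: Y = bX + e
    score_yx = dependence(residual(x, y), y)  # hypothesis Y -> X: X = bY + e
    return "X -> Y" if score_xy < score_yx else "Y -> X"

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(-1, 1, n)        # non-Gaussian X
y = x + rng.uniform(-1, 1, n)    # Y = X + e with a non-Gaussian (uniform) error
print(guess_direction(x, y))     # X -> Y
print(guess_direction(y, x))     # Y -> X (arguments swapped)
```

If both X and e were Gaussian, both scores would be close to zero and the two directions could not be told apart; this is why LiNGAM assumes non-Gaussian errors.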

If Y = X + e applies

When X can be manipulated

If X is a quantity you can manipulate and Y is the outcome, then Y = X + e may hold.

For example, suppose the scheduled exam time differs by subject (X), and Y is the actual exam time in each class. In this case there is a real causal relationship, and this model should apply.

If X is a true value

In contrast to the case described later under "If the data is not a true value", if the X data are true values (their error is 0), Y = X + e may apply.

If your data is biased

Looking only at the data from the last three months, you may find that Y = X + e holds.

However, in a factory, for example, if you look further back into the historical data, this expression may no longer apply.

In this example, the fact that the relationship differs depending on the time of year is itself a clue for elucidating the causal relationship. The causal relationship is not one that a simple formula can express.

This example is a bias in time, but the same kind of bias can also be spatial.
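The three-month trap can be reproduced with simulated data. In the sketch below, the monthly layout, the slope of 0.5 outside the last three months and all other numbers are invented for illustration: the fit on the last three months recovers Y = X + e, the fit on the whole year gives a clearly different slope, and fitting each quarter separately exposes the seasonal difference that hints at the real causal structure. `fitted_slope` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
month = np.repeat(np.arange(1, 13), 200)   # 12 months x 200 samples each
x = rng.uniform(0, 10, month.size)

# Hypothetical seasonal structure: Y = X + e holds only in months 10-12;
# in the other months the effect of X is halved (0.5 is an invented value).
slope = np.where(month >= 10, 1.0, 0.5)
y = slope * x + rng.normal(0, 0.5, month.size)

def fitted_slope(mask):
    # Least-squares slope of Y on X using only the rows selected by mask.
    b, _ = np.polyfit(x[mask], y[mask], 1)
    return b

recent = month >= 10                       # "the last three months"
whole_year = np.full(month.size, True)
b_recent = fitted_slope(recent)
b_year = fitted_slope(whole_year)
print(f"last 3 months: {b_recent:.2f}, whole year: {b_year:.2f}")

# Slope per quarter: the difference by time of year becomes visible.
for q in range(4):
    print(f"Q{q + 1}: {fitted_slope((month - 1) // 3 == q):.2f}")
```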

If Y = X + e does not apply

If the formula is completely different

Even when there is a causal relationship, the formula may not apply. In the case of the graph below, the value of Y is determined by whether X is above or below a certain boundary value.
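A relationship with a boundary can be sketched with simulated data. Below, Y jumps from about 1 to about 3 at X = 6 (a hypothetical boundary, with invented noise and sample size). A straight line fitted to such data leaves a large residual sum of squares, while a step model whose threshold is chosen by a brute-force search over the observed X values fits much better and locates the boundary. The step model and its search are an illustrative choice (essentially a one-split regression tree), not a method described on this page.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 500)
# Hypothetical threshold relationship: Y is about 1 below X = 6 and about 3 above it.
y = np.where(x < 6.0, 1.0, 3.0) + rng.normal(0, 0.2, x.size)

# Straight-line fit Y = bX + c and its residual sum of squares.
b, c = np.polyfit(x, y, 1)
sse_line = float(np.sum((y - (b * x + c)) ** 2))

def step_sse(t):
    # Residual sum of squares of a step model: one mean below t, another at or above t.
    below, above = y[x < t], y[x >= t]
    return np.sum((below - below.mean()) ** 2) + np.sum((above - above.mean()) ** 2)

# Try every observed X except the smallest as the threshold, so both sides are non-empty.
candidates = np.sort(x)[1:]
sse_by_t = np.array([step_sse(t) for t in candidates])
best_t = float(candidates[np.argmin(sse_by_t)])
sse_step = float(sse_by_t.min())
print(f"line SSE = {sse_line:.1f}, step SSE = {sse_step:.1f}, boundary ~ {best_t:.2f}")
```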

If the data is not a true value

I wrote about something similar in the effect of measurement errors on regression analysis: if both X and Y are measured values, both contain errors.

Even if the true values x and y satisfy the relationship

y = x

the data we actually have are X and Y, where

Y = y + ey
X = x + ex

Regressing Y on X then does not give a coefficient of "1" for X; it gives a value smaller than "1", because part of the spread in X comes from the error ex, which carries no information about Y. This is the effect of Measurement Errors in Regression Analysis.
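The size of this effect can be checked by simulation. For least squares with the true relationship y = x and an error ex that is independent of x and of ey, the fitted slope tends to Var(x) / (Var(x) + Var(ex)), which is below 1 whenever ex is present, while an error only in Y leaves the slope unbiased. In the sketch below, Var(x) = 1 and the error standard deviations of 0.5 are assumptions, so the slope on the measured X should come out near 1 / (1 + 0.25) = 0.8.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
sd_ex, sd_ey = 0.5, 0.5                 # hypothetical measurement-error sizes
x = rng.normal(0, 1, n)                 # true values x, Var(x) = 1
y = x                                   # true relationship y = x
X = x + rng.normal(0, sd_ex, n)         # measured X = x + ex
Y = y + rng.normal(0, sd_ey, n)         # measured Y = y + ey

slope_exact_x, _ = np.polyfit(x, Y, 1)  # error only in Y: slope stays near 1
slope_noisy_x, _ = np.polyfit(X, Y, 1)  # error in both: slope shrinks
predicted = 1.0 / (1.0 + sd_ex ** 2)    # Var(x) / (Var(x) + Var(ex)) = 0.8
print(f"exact x: {slope_exact_x:.3f}, noisy X: {slope_noisy_x:.3f}, predicted: {predicted:.3f}")
```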

Linking different source data
