Prediction intervals

Prediction using statistics is not so much treated in the field of statistics, but it has become a central method in the field of Machine Learning.

In machine learning, predicted values for point estimation are often used, but there are ways to predict them as intervals.

By the way, "predict" in statistics does not necessarily mean estimation of future data.

Prediction intervals

In the figure, the points marked "raw data" are the values of individual samples. This begs the question, "What will be the next sample?" This value is important if you want to make future predictions.

The prediction interval answers that question. A prediction interval is an interval in which an individual sample is considered to have a value with some probability. In the figure, I tried to draw a case with a probability of 95%.

How to use the prediction interval

If the new sample values fall into this interval, the expected result is expected. If it is outside the interval, it may be data in an abnormal state.

Prediction intervals can be used as a way to Model of Outlier and Model of Abnormal but not Outlier.

When making a judgment, a numerical value (threshold) is required as a guide. You can use the prediction interval as the basis for that number.

Features of the prediction interval

There are characteristics in the way in which the width of the prediction interval is determined.

• The smaller the standard deviation, the narrower the interval.
• The lower the probability of trying to guess the interval, the narrower the interval.
• The larger the number of samples, the narrower the interval.

When I try to use the prediction interval in practice, I think, "If it is so wide, I can't rely on it, so I can't use it for a while. . ." There is such a thing. To narrow the interval, consider these properties.

If you simply want to narrow the appearance of the interval, you can play around with the probabilities. If you are thinking about practical purposes, you can increase the sample (also called "increasing the n number"). You need to investigate the cause of the magnitude of the standard deviation (variability in the data) to reduce the standard deviation.

Software

R

Example of R can be found in Analysis of prediction interval by R.

R-EDA1

R-EDA1 can also do it.