Hypothesis Testing for the 21st Century

Introductory explanations of statistics often say no more than, "If you want to find out whether the means differ, use a test for the difference in means."

Slightly more advanced explanations point out that making decisions with the p-value has serious problems. Solutions and alternatives are also presented, but my impression is that they turn a simple problem into a difficult one.

Therefore, under the name "testing for the 21st century", this page summarizes how to use tests in a way that I myself can understand.

The basic idea is unchanged: use a test for the difference in means when you want to check whether the averages differ. What changes is that this test is positioned as a preliminary survey that comes before what you really want to investigate.

Why the testing methods developed by the 20th century were not a big problem during the 20th century

The test methods found in statistics textbooks were devised at a time when computers were not widely available and statistical calculations were done with paper and pencil. In that era people dealt only with so-called small data, so no problems arose.

With the spread of personal computers at the end of the 20th century, it became easy to run calculations on sample sizes above 30, and this is when the problem became serious. With enough data, even a very small difference in means yields a small p-value.
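To see why larger samples cause trouble, the sketch below holds the difference in means fixed and changes only the sample size. It is an illustration under assumptions that are not from the text above: the summary statistics are made up, the function name is hypothetical, and the p-value uses a normal (z) approximation rather than the exact t distribution, so the value for the smallest sample is only approximate.

```python
import math

def two_sample_p_value(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = abs(mean_a - mean_b) / se
    # P(|Z| >= z) for a standard normal Z.
    return math.erfc(z / math.sqrt(2))

# Hypothetical summary statistics: the means differ by 0.1, both SDs are 1.0.
for n in (10, 100, 1000, 10000):
    p = two_sample_p_value(100.0, 100.1, 1.0, 1.0, n, n)
    print(f"n = {n:>5} per group: p = {p:.3g}")
```

With these inputs the p-value drops below 0.05 somewhere between 100 and 1000 observations per group, even though the difference of 0.1 never changes.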

Procedure for Testing Mean Differences (21st Century Edition)

The test for the difference in means is designed to determine whether the means of two groups can be distinguished numerically. With small data, it did no particular harm to treat "numerically distinguishable" as equivalent to "the means can be said to differ". From the 21st century on, we need to stay aware of our original purpose.

Preliminary survey

The criterion of "p-value of 0.05" has been around for a long time, and I believe it remains a good guide in the 21st century. However, this p-value is not a criterion for deciding whether the means differ; it is a criterion for deciding whether to proceed to the next evaluation.

How to say "there is a difference in the average value"

The appropriate way to evaluate whether we can say that the means differ is the Test for Normal Distribution Differences.

What this method does is investigate how far apart the two groups are. Since the percentage of separation that counts as meaningful depends on the subject being studied, there is no absolute criterion for the p-value of this method.

The Test for Normal Distribution Differences is only the second step, because it assumes that the two means are already numerically distinguishable.
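The two-step flow can be sketched as follows. This is only an illustration of the ordering, not the Test for Normal Distribution Differences itself: step 1 uses a normal-approximation p-value, and step 2 swaps in a simple stand-in measure, the probability that a value drawn from the higher-mean group exceeds a value drawn from the lower-mean group, assuming both groups are normally distributed. The function names, the inputs, and the stand-in measure are all assumptions made for this example.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def two_step_check(mean_a, mean_b, sd_a, sd_b, n_a, n_b, alpha=0.05):
    # Step 1 (preliminary survey): are the means numerically distinguishable?
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    p = math.erfc(abs(mean_a - mean_b) / se / math.sqrt(2))
    if p >= alpha:
        # Not distinguishable, so the second step is not evaluated.
        return {"p_value": p, "distinguishable": False, "separation": None}
    # Step 2 (stand-in measure): how far apart are the two groups?
    # 0.5 means the groups overlap completely, 1.0 means full separation.
    z = abs(mean_a - mean_b) / math.sqrt(sd_a**2 + sd_b**2)
    return {"p_value": p, "distinguishable": True, "separation": normal_cdf(z)}

# Hypothetical inputs: a difference of 0.1 with SD 1.0 and 10000 per group.
result = two_step_check(100.0, 100.1, 1.0, 1.0, 10000, 10000)
print(result)
```

For these inputs the first step passes easily, yet the separation stays only slightly above 0.5. Whether that amount of separation matters depends on the subject, which is why the second step has no fixed threshold.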

Procedure for Testing Differences in Variation (21st Century Edition)

The procedure for testing differences in variability follows the same two steps. The test for the ratio of variances is used to check whether the variances can be distinguished numerically, and the Test for Differences in Normal Distribution Variability is used to evaluate the difference in variability.
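The first step for variability can be sketched with the classical F test for a ratio of two variances. The example assumes SciPy is installed and uses made-up standard deviations; the function name is hypothetical, and the second step is not covered here.

```python
from scipy import stats

def variance_ratio_p_value(sd_a, sd_b, n_a, n_b):
    """Two-sided p-value of the F test for a ratio of two variances."""
    f_stat = sd_a**2 / sd_b**2
    df_a, df_b = n_a - 1, n_b - 1
    # Tail probabilities on both sides of the observed ratio.
    upper = stats.f.sf(f_stat, df_a, df_b)
    lower = stats.f.cdf(f_stat, df_a, df_b)
    # Double the smaller tail and cap the result at 1.
    return min(1.0, 2.0 * min(upper, lower))

# Hypothetical standard deviations of 1.05 and 1.00 at two sample sizes.
for n in (10, 10000):
    p = variance_ratio_p_value(1.05, 1.00, n, n)
    print(f"n = {n:>5} per group: p = {p:.3g}")
```

As with the means, the same ratio of variances is not distinguishable at 10 observations per group but is at 10000, so this test also works best as a preliminary survey.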

Supplementary analysis from non-statistical perspectives

The above covers only the statistical methods. The data we work with also calls for other checks.

Checking with graphs

This is not limited to tests: whenever you use statistical methods, histograms and box plots let you check the data of the two groups visually. If you draw conclusions without combining what you see with the results of your calculations, you may end up with strange conclusions.
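As a minimal sketch of this check, the code below computes the numbers that a box plot draws and the bin counts behind a histogram, using only the Python standard library, and prints a rough text histogram for each group. The measurements and function names are hypothetical; in practice the same values would usually be passed to a plotting tool.

```python
import statistics

def five_number_summary(data):
    """Minimum, quartiles, and maximum: the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def histogram_counts(data, low, high, bins):
    """Counts per equal-width bin on [low, high]; the top edge is included."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        if low <= x <= high:
            counts[min(int((x - low) / width), bins - 1)] += 1
    return counts

# Hypothetical measurements for two groups.
group_a = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7]
group_b = [10.3, 10.1, 10.4, 10.2, 10.6, 10.3, 10.5]

# Use the same bin range for both groups so the shapes are comparable.
low, high = min(group_a + group_b), max(group_a + group_b)
for name, group in (("A", group_a), ("B", group_b)):
    print(name, five_number_summary(group))
    for count in histogram_counts(group, low, high, bins=3):
        print("  " + "#" * count)
```

Sharing one bin range across the two groups is a deliberate choice: separate ranges would hide how much the groups overlap.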

Check significant figures and resolution

As discussed under statistics with a minimum confidence interval (the impossibility of statistics), values that count as "no difference" in terms of significant figures or measurement resolution can still come out as "different" in a statistical calculation.
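A minimal sketch of this check compares the observed difference with the measurement resolution alongside the p-value. The numbers, the resolution of 0.01, and the function name are assumptions for illustration, and the p-value again uses a normal approximation.

```python
import math

def compare_with_resolution(mean_a, mean_b, sd_a, sd_b, n_a, n_b, resolution):
    """Report the statistical result next to the resolution check."""
    diff = abs(mean_a - mean_b)
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    # Two-sided p-value under a normal approximation.
    p = math.erfc(diff / se / math.sqrt(2))
    return {
        "difference": diff,
        "p_value": p,
        "statistically_different": p < 0.05,
        # A difference below the resolution cannot be read off the instrument.
        "above_resolution": diff >= resolution,
    }

# Hypothetical case: means differ by 0.003 but the instrument resolves 0.01.
result = compare_with_resolution(25.003, 25.000, 0.05, 0.05,
                                 100000, 100000, resolution=0.01)
print(result)
```

Here the calculation reports the groups as different while the difference is smaller than what the instrument can resolve, which is the situation described above.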
