There are two types of data: quantity data and category data. Different data types call for different analysis methods. Quantification methods, however, analyze category data with the methods for quantity data.
The reverse, analyzing quantity data with the methods for category data, is less common. But when we need fast, easily understood analysis of big data, this approach is powerful.
A decision tree is one such method: if X is quantity data, it is converted into category data.
Each range, "from ... to ...", is used as a category.
This approach is robust to outliers because it makes little use of the quantity information itself.
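To illustrate how a decision tree turns a quantity X into range categories, here is a minimal Python sketch of a single split (a "decision stump"). It is not a full decision-tree implementation; the function name `best_split` and the least-squared-error criterion are assumptions chosen for illustration.

```python
def best_split(xs, ys):
    """Return the threshold t on X that best separates Y into two groups.

    This mimics one split of a regression tree: X becomes two categories,
    "X < t" and "X >= t", and t is chosen to minimize the total squared
    error of Y around each group's mean. Returns None if X has fewer
    than two distinct values.
    """
    def sse(values):
        # Sum of squared deviations from the mean (0 for an empty group).
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best_t, best_cost = None, float("inf")
    for (x_prev, _), (x_next, _) in zip(pairs, pairs[1:]):
        if x_prev == x_next:
            continue  # equal X values cannot be separated
        t = (x_prev + x_next) / 2  # candidate boundary between two ranges
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


xs = [1, 2, 3, 10, 11, 12, 1000]  # 1000 is an outlier in X
ys = [5, 6, 5, 20, 21, 19, 20]
print(best_split(xs, ys))  # → 6.5
```

The outlier X = 1000 simply falls into the range "X >= 6.5". Replacing it with 13 gives the same boundary, because only the order of the X values decides which groups a split can form.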
In real data analysis, a single variable sometimes contains both quantity data and category data. Trying to analyze both types together gets us into trouble.
In this case, we can borrow the idea of the decision tree and change the quantity data into category data. Missing values are also changed into a category, "missing value."
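A mixed variable can be handled with a small conversion function. This Python sketch assumes that numeric values have already been parsed as numbers and that the range boundaries (here 0, 10, 20, 30) are chosen by the analyst; `to_category` and the label wording are hypothetical.

```python
import math


def to_category(value, edges):
    """Convert one raw value of a mixed variable into a category label.

    - missing values (None, NaN, empty string) -> "missing value"
    - numbers -> a range "from a to b" taken from the sorted edges
    - anything else is already category data and is kept as it is
    """
    if value is None or value == "" or (isinstance(value, float) and math.isnan(value)):
        return "missing value"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if value < edges[0]:
            return f"below {edges[0]}"
        for lo, hi in zip(edges, edges[1:]):
            if lo <= value < hi:
                return f"from {lo} to {hi}"
        return f"{edges[-1]} or more"
    return str(value)


edges = [0, 10, 20, 30]  # hypothetical range boundaries
raw = [3, 15, "not measured", None, 42, float("nan")]
print([to_category(v, edges) for v in raw])
```

After this step every value is category data, so the whole variable can be analyzed with one type of method.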
For example, if regression analysis gives us a formula such as
Y = A * X + B
then we can make precise judgments and take precise actions.
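For reference, when such a formula can be made, A and B are usually estimated by ordinary least squares. A minimal Python sketch; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Fit Y = A * X + B by ordinary least squares and return (A, B).

    A = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    B = mean_y - A * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # data lying exactly on Y = 2X + 1 → (2.0, 1.0)
```

Because every value enters the sums directly, a single extreme Y value can shift both A and B noticeably, which is one reason such a formula is fragile on real data.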
But in many real analyses, we cannot obtain such a clean formula, for many reasons.
On the other hand, in many cases I do not need such a formula. Information such as "If X is about ..., Y is about ..." is enough for the purpose of the analysis.
For such a purpose, converting quantity data into category data works smoothly.
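Once X is categorized, this kind of summary is easy to compute. A minimal Python sketch with hypothetical data and range boundaries; `summarize_by_range` is an illustrative name, not an existing tool.

```python
from collections import defaultdict


def summarize_by_range(xs, ys, edges):
    """Build an "If X is about ..., Y is about ..." table.

    X is converted into the ranges defined by the sorted edges, and the
    mean of Y is reported for each range. X values outside the edges
    are skipped in this sketch.
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        for lo, hi in zip(edges, edges[1:]):
            if lo <= x < hi:
                groups[(lo, hi)].append(y)
                break
    return {f"from {lo} to {hi}": sum(v) / len(v) for (lo, hi), v in sorted(groups.items())}


xs = [2, 5, 8, 12, 15, 18, 25, 28]
ys = [10, 12, 11, 20, 22, 21, 35, 33]
for label, mean_y in summarize_by_range(xs, ys, [0, 10, 20, 30]).items():
    print(f"If X is {label}, Y is about {mean_y:.1f}")
```

This prints one line per range, for example "If X is from 0 to 10, Y is about 11.0".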
The first reason is that the analysis is robust to the varied backgrounds of the data. This strength is especially useful for big data. (Robust Analysis)
The second reason is that the output of the analysis is easier to discuss. If the analysis uses the quantity data as it is, small details in the background of the data can have a large effect on the output, and the important information can be hidden behind these small effects.
Methods for category data can still use the "quantity" information, but only roughly.
If there is a lot of data but only a few variables (many rows but few columns), the conversion from quantity data to category data can be done by hand in Excel or similar tools.
After the conversion, if Y is quantity data and X is category data, ANOVA is one method of analysis.
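As an illustration, the one-way ANOVA F statistic compares how far the category means of Y are from each other with how much Y varies inside each category. A minimal Python sketch; `one_way_anova_f` is a hypothetical name and the data are made up.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for Y values grouped by X category.

    F = (between-group mean square) / (within-group mean square)
      = (SSB / (k - 1)) / (SSW / (N - k))
    where k is the number of categories and N the total number of values.
    """
    all_values = [y for g in groups for y in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


# Y values for three hypothetical X categories
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # → 27.0
```

A large F suggests that the mean of Y differs between categories. The p-value comes from the F distribution with (k - 1, N - k) degrees of freedom; `scipy.stats.f_oneway`, for example, returns both the statistic and the p-value.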
If there are many variables, a decision tree is one approach.
Association analysis is another approach. The free software "Natto" includes a function for converting quantity data into category data.
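Once every variable is categorized, each record becomes a set of items such as "X: from 0 to 10", and association analysis looks for rules like "if X is in this range, then Y is in that range". The Python sketch below computes the two basic rule measures, support and confidence, on hypothetical records; `rule_measures` is an illustrative name and does not reproduce Natto's functionality.

```python
def rule_measures(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = share of records containing both sides of the rule
    confidence = share of records containing the antecedent that also
                 contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_ante = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


records = [
    {"X: from 0 to 10", "Y: low"},
    {"X: from 0 to 10", "Y: low"},
    {"X: from 0 to 10", "Y: high"},
    {"X: from 10 to 20", "Y: high"},
]
print(rule_measures(records, {"X: from 0 to 10"}, {"Y: low"}))  # support 0.5, confidence 2/3
```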