Analysis Using Category Data

There are two types of data: quantity data and category data. Different data types generally require different analysis methods. Quantification methods, however, analyze category data with the methods of quantity data.

The opposite direction, analyzing quantity data with the methods of category data, is less common. But when we need fast, easy-to-understand analysis of big data, this approach is strong.

Decision Tree

When X is quantity data, a decision tree converts it into category data.

The range, "from ... to ...", is used as the category.

It is robust against outliers because it makes little use of the quantity information itself; only the range that a value falls into matters.
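As a rough illustration of how a tree turns X into ranges, the sketch below searches for a single split threshold on X that minimizes the squared error of Y on each side. This is only a one-split simplification, not a full decision tree; the function names `best_split` and `sum_sq` and the sample numbers are made up for this example.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return the threshold on x that minimizes the total squared error of y
    when the data is divided into 'x <= threshold' and 'x > threshold'."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_error = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sum_sq(left) + sum_sq(right)
        if error < best_error:
            best_error = error
            # Use the midpoint between neighboring x values as the boundary.
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold


# Hypothetical sample data: Y jumps when X passes somewhere between 4 and 10.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [5, 6, 5, 6, 20, 21, 20, 21]
threshold = best_split(xs, ys)

# Same data, but the largest X is replaced by an extreme outlier.
xs_outlier = [1, 2, 3, 4, 10, 11, 12, 1000]
threshold_outlier = best_split(xs_outlier, ys)

print(threshold, threshold_outlier)
```

In this sample, both calls give the same boundary, because the outlier only changes how far a point is from the others, not which side of the boundary it is on.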

The Case Where One Variable Includes Both Quantity and Category Data

In real data analysis, one variable sometimes includes both quantity data and category data. If we try to analyze both types together as they are, we run into trouble.

In this case, we can borrow the idea of the decision tree and change the quantity data into category data. Missing values are likewise changed into the category "missing value."
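A minimal sketch of this conversion follows. It assumes that missing values appear as `None`, empty strings, or NaN, that numeric strings should be treated as numbers, and that the range boundaries (`edges`) are chosen by the analyst. The function names and sample values are hypothetical.

```python
import math


def range_label(number, edges):
    """Return a 'from ... to ...' style label for a number, given sorted edges."""
    if number < edges[0]:
        return f"under {edges[0]}"
    for low, high in zip(edges, edges[1:]):
        if low <= number < high:
            return f"from {low} to {high}"
    return f"{edges[-1]} or more"


def to_category(value, edges):
    """Map one raw value to a category label.

    Missing values become "missing value", numbers become a range label,
    and any other text is kept as its own category.
    """
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return "missing value"
    try:
        number = float(value)
    except (TypeError, ValueError):
        return str(value)  # already category data, keep it as is
    if math.isnan(number):
        return "missing value"
    return range_label(number, edges)


# Hypothetical column that mixes numbers, a text code, and missing entries.
raw = [5, 12.5, "over range", None, 37, "", "18"]
edges = [10, 20]
categories = [to_category(v, edges) for v in raw]
print(categories)
```

After this step every value in the variable is a category, so the variable can be passed to any method of category data.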

Usefulness of Category Data Analysis

For example, if regression analysis gives us the formula
Y = A * X + B
we can make precise judgments and take precise actions.

But in many real analyses, we cannot obtain such a clean formula, for many reasons.

Moreover, in many cases I do not need such a formula. If I can get information like "If X is about ..., Y is about ...", that is enough for the purpose of the analysis.

For such purposes, changing quantity data into category data works smoothly.

The first reason is that the analysis is robust to the various backgrounds of the data. This strength is especially useful for big data. (Robust Analysis)

The second reason is that the output of the analysis is easier to discuss. If the analysis is done on the quantity data, some small background factor in the data can affect the output on a large scale, and important information can be hidden behind such small effects.
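The "If X is about ..., Y is about ..." kind of output can be sketched as follows: X is cut into equal-width ranges and the mean of Y is reported for each range. Equal-width ranges, the function name `summarize_by_range`, and the sample numbers are assumptions made for this illustration.

```python
from collections import defaultdict


def summarize_by_range(xs, ys, width):
    """Group Y by equal-width ranges of X and return the mean of Y per range."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        low = (x // width) * width  # lower bound of the range containing x
        groups[low].append(y)
    return {
        (low, low + width): sum(values) / len(values)
        for low, values in sorted(groups.items())
    }


# Hypothetical sample data.
xs = [1, 3, 12, 14, 25, 27]
ys = [10, 12, 30, 32, 50, 54]
summary = summarize_by_range(xs, ys, width=10)

for (low, high), mean_y in summary.items():
    print(f"If X is from {low} to {high}, Y is about {mean_y:.1f}")
```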

Methods of Category Data for Quantity Data

Methods of category data can still use the "quantity" information, though only roughly.

If there is a lot of data but only a few variables (many rows but few columns), the change from quantity data to category data can be done by hand with Excel or a similar tool.

After the change, if Y is quantity data and X is category data, ANOVA is one method for the analysis.
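As a sketch of that step, the F statistic of a one-way ANOVA can be computed directly from Y values grouped by the categories of X. The function name `one_way_anova_f` and the sample groups are hypothetical, and the sketch stops at the F statistic without computing a p-value.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict {category: [y values]}."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the values around their own group mean
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss_between += len(ys) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in ys)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical Y values, grouped by the range that X was changed into.
groups = {
    "from 0 to 10": [10, 12, 11],
    "from 10 to 20": [30, 32, 31],
    "from 20 to 30": [50, 54, 52],
}
f_value = one_way_anova_f(groups)
print(f_value)
```

A large F value means that the differences between the category means are large compared with the scatter inside each category.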

If there are many variables, a decision tree is one approach.

Association analysis is another approach. The free software "Natto" includes a function to change quantity data into category data.
