
Analyzing clustering results with a decision tree

There are two common concerns when performing cluster analysis.

One is that when performing non-hierarchical cluster analysis in three or more dimensions, there is no suitable graph for seeing the relationship between the grouping and the data. Cluster analysis using a self-organizing map is one solution to this problem, but that method is better suited to small sample sizes: computing a self-organizing map is inherently time-consuming, so a large number of samples can make it impractical. Likewise, with a small number of samples it is convenient to read the self-organizing map as a scatter plot of sample labels, but with many samples the graph becomes cluttered and the analysis stalls.

The other problem is that interpreting cluster analysis, even with a self-organizing map, is difficult because we do not know why the clustering came out the way it did. Since the cluster analysis algorithm builds clusters exploratorily, we can see what kind of clusters result, but not "why it becomes this cluster".

Decision trees are useful as a solution to both of these problems. Note that a decision tree does not let you examine individual samples in detail, so it complements, rather than replaces, cluster analysis using a self-organizing map.

Use of decision trees

The decision tree is run with the result of the cluster analysis (the cluster number of each sample) as the objective variable and the data used in the cluster analysis as the explanatory variables.

The resulting tree then shows how the clusters are divided.
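As a minimal sketch in R, assuming a numeric data frame named dat (a hypothetical name) and a hypothetical cluster count of 3, the procedure can be written with the standard kmeans function and the rpart package:

    library(rpart)

    set.seed(1)
    km <- kmeans(dat, centers = 3)      # non-hierarchical cluster analysis
    dat$cluster <- factor(km$cluster)   # cluster number as the objective variable

    # fit a classification tree: cluster labels explained by the original data
    tree <- rpart(cluster ~ ., data = dat, method = "class")
    print(tree)                         # the printed splits show how the clusters are divided
    plot(tree)
    text(tree)

Each branch of the printed tree is a rule on the explanatory variables, which is exactly the "why it becomes this cluster" that the cluster analysis alone does not provide.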

This usage takes advantage of two characteristics of the decision tree: the objective variable is a qualitative variable and may have multiple classes, and the analysis result is descriptive.

Software

Examples in R are on the page Cluster analysis by R.






