Top Page | Upper Page | Contents | About This Site | JAPANESE

Multi-Variable Analysis

For example, it is often used when you want to investigate some tendency from data with many items (variates) such as height, weight, age, gender, birthplace, strong subject, weak subject, sleeping time, etc. of each individual. It is a variable analysis.

Multivariate analysis is a way to get a rough idea of ??what your data looks like. In data science , it is an introductory technique when you want to investigate the relationship between multiple variables.

It is not a method of finding local characteristics such as "There are subjects that only people in a certain height range are good at" like data mining . Also, I am not good at analyzing phenomena that change over time ( time series analysis ).

Data mining by multivariate analysis can be done by dividing the height and time into several ranges and examining the characteristics of each range, or by performing " stratification " such as analyzing by gender. You may be able to get results like this.

Analysis of multivariate vs analysis of multiple bivariates

For example, in multiple regression analysis , multiple explanatory variables X ( features ) are aggregated by one formula, and the relationship between the aggregated value and the objective variable Y is investigated.

On the other hand, you can also use correlation analysis to find out the bivariate relationship between Y and each X.

The difference between these two approaches is that the effect may not be particularly different depending on the phenomenon you are dealing with. If you have a lot of variables, you can't always use multivariates.

The advantage of univariate-to-multivariate analysis is that you can find data features that you can never understand just by looking at the bivariate relationship, as in the improvement of discriminating ability by double measurement . It is also easy to handle when used in a system as machine learning . The disadvantage is that the quirks and characteristics of individual Xs are confusing.

The advantages and disadvantages of multiple bivariate analysis are the opposite of univariate vs. multivariate analysis.

When determining outliers and making causal inferences , use both approaches, considering their strengths and weaknesses.

Types of multivariate analysis

This site classifies methods for examining variable relationships and methods for using those relationships for prediction and simulation as methods for multivariate analysis.

Regression analysis : "Quantitative univariate" vs. "multivariate" analysis. Simple regression analysis , multiple regression analysis , path analysis , generalized linear mixed model , Gaussian process regression analysis, etc.
Label classification : "Qualitative univariate" vs. "multivariate" analysis Discriminant analysis , logistic regression analysis , support vector machine , MT method, etc.
Analysis of variable grouping : Look at the similarity of variables without distinguishing between objective variables and explanatory variables. Graphical Lasso , Principal Component Analysis , Log-Linear Analysis, etc.
Factor analysis : Search for unmeasured data. , independent component analysis, etc.

I don't think the MT method is generally described as multivariate analysis. This site is classified as multivariate analysis in consideration of its connection with other fields.

Multidimensional scaling and quantification theory is, but it may be classified as a kind of multivariate analysis, and the idea of this site, data mining because it is for the use of as, on this site, data mining in We are classifying.

Regression analysis and pattern recognition include decision trees , k-nearest neighbors , and neural networks , but these are categorized as data mining and artificial intelligence (AI) because they do not use variable relationships as clues .

Data for analysis starts from the point of creation

Multivariate analysis and data mining techniques are techniques for analyzing tabular data. With relational database data, it's fairly easy to prepare such data, but some data isn't.

In my experience, there is often a limit to what can be obtained from easily prepared tabular data.

On the other hand, when data literacy is used to link different databases and information sources, the resulting multivariate data can yield a large yield.

NEXT Homogeneity Analysis