Unit of Analysis for Text

The simplest form of text mining is to count the words and to analyze the co-occurrence among words.
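
A minimal sketch of this counting follows. The sample texts are made up for illustration, and two words are assumed to "co-occur" when they appear in the same text. Other definitions (same sentence, or within a window of a few words) are also common.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample texts, made up for illustration.
texts = [
    "the pump is broken",
    "the pump is noisy",
    "the valve is broken",
]

word_counts = Counter()   # how often each word appears overall
pair_counts = Counter()   # in how many texts each pair of words appears together

for text in texts:
    words = text.lower().split()
    word_counts.update(words)
    # Count each unordered pair of distinct words once per text.
    # Sorting makes ("broken", "pump") and ("pump", "broken") the same key.
    for pair in combinations(sorted(set(words)), 2):
        pair_counts[pair] += 1

print(word_counts["pump"])               # 2
print(pair_counts[("broken", "pump")])   # 1
print(pair_counts[("is", "the")])        # 3
```

Splitting on spaces is the simplest possible tokenizer. Real data usually needs punctuation handling, and languages without spaces between words need a morphological analyzer first.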

The next step is to analyze the relationships among texts.

The key to this approach is that each text is treated as one sample of the whole data set. Because of this, we can apply the methods of Analysis of Similarity of Samples to texts.
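
As one possible illustration, each text can be turned into a vector of word counts, and the similarity between two samples can then be measured. The sketch below uses cosine similarity, which is only one of many similarity measures and is chosen here as an assumption. The sample texts and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample texts; each text is one sample.
texts = [
    "the pump is broken",
    "the pump is noisy",
    "the valve is broken",
]

def word_vector(text):
    """Represent a text as a mapping from word to count."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

vectors = [word_vector(t) for t in texts]

# Similarity matrix: one row and one column per text.
similarity = [[cosine_similarity(a, b) for b in vectors] for a in vectors]

print(similarity[0][1])  # 0.75 (three of four words are shared)
print(similarity[1][2])  # 0.5  (two of four words are shared)
```

A similarity matrix like this can be passed to the usual methods for samples, such as cluster analysis or multidimensional scaling.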

Differences Between Units

A text is made up of sentences, and a sentence is made up of words. Either the "text" or the "sentence" can be used as the unit (the sample) of the analysis.

The output of the analysis differs depending on the unit.

MS Excel has a function that splits a text into sentences using the period (".") as the delimiter.
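
The same split can be sketched outside Excel. The example below is a rough equivalent, not the Excel function itself. It splits a hypothetical text on periods, then counts co-occurrence once with the text as the unit and once with the sentence as the unit, to show how the output changes.

```python
from collections import Counter
from itertools import combinations

def split_into_sentences(text):
    """Split on periods and drop empty pieces and surrounding spaces."""
    return [part.strip() for part in text.split(".") if part.strip()]

def cooccurrence(units):
    """Count, for each pair of words, the number of units containing both."""
    pairs = Counter()
    for unit in units:
        words = sorted(set(unit.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs

# Hypothetical sample text, made up for illustration.
text = "The pump is broken. The valve is noisy. We ordered parts."

sentences = split_into_sentences(text)
print(sentences)
# ['The pump is broken', 'The valve is noisy', 'We ordered parts']

# Unit = text: the whole text is one sample.
by_text = cooccurrence([text.replace(".", " ")])
# Unit = sentence: each sentence is one sample.
by_sentence = cooccurrence(sentences)

print(by_text[("pump", "valve")])      # 1: both words are in the same text
print(by_sentence[("pump", "valve")])  # 0: they are never in the same sentence
```

Splitting on every period is a simplification. It also breaks at abbreviations and decimal numbers such as "3.5", so real data may need extra rules.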

Potential of Text Mining

The figure shows an example of text mining data. Co-occurrence is analyzed from this matrix.
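
The layout of such a matrix can be sketched as follows, assuming that rows are texts, columns are words, and a cell is 1 when the word appears in the text. This layout and the sample texts are assumptions for illustration and may differ from the figure. The co-occurrence of two words is then the number of rows in which both columns are 1.

```python
# Hypothetical sample texts, made up for illustration.
texts = [
    "the pump is broken",
    "the pump is noisy",
    "the valve is broken",
]

tokenized = [set(t.lower().split()) for t in texts]

# Column labels: every distinct word, in alphabetical order.
vocabulary = sorted(set().union(*tokenized))

# Text-by-word matrix: 1 if the word appears in the text, else 0.
matrix = [[1 if word in words else 0 for word in vocabulary]
          for words in tokenized]

# Word-by-word co-occurrence matrix: for words i and j, the number of
# texts that contain both. The diagonal holds the number of texts
# that contain each word.
n = len(vocabulary)
cooccurrence = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
                for i in range(n)]

print(vocabulary)  # ['broken', 'is', 'noisy', 'pump', 'the', 'valve']
for row in matrix:
    print(row)

i, j = vocabulary.index("is"), vocabulary.index("the")
print(cooccurrence[i][j])  # 3
```

In matrix terms this is the product of the transposed matrix and the matrix itself, so a library for matrix calculation can do the same work in one step on larger data.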

In daily use, we do not read texts as groups of words. If we want methods for automatic translation or communication, more advanced technology is needed.

Even so, the technology of treating texts as groups of words has the potential to change the world.

