
How to proceed with data utilization

I often hear requests that start with "There is a lot of data. I want to utilize this," with data science and artificial intelligence (AI) expected to be the methods for doing so.

The common image seems to be the "utilization using a model" shown in the table below, but there are other ways to use data as well. Below, I have summarized those other ways.

What kind of data is "a lot of data"?

It seems that what counts as "a lot of data" is often misunderstood or overestimated. No matter how much data you have, it may turn out to be of no use.

What has it been used for until now?

In my experience, what is called "a lot of data" was usually collected for some reason. That means it is already being used somewhere, but the people who suggest "Let's use it!" often do not know this.

In such a case, starting from the question "What is the data for?" and deepening your understanding of what the data means may lead to ideas for new uses. In addition, the original use may be improved by reviewing the relationship between the reason the data is collected and the actual data.

Could it actually not be that much?

For example, suppose you have data for three occurrences of a phenomenon that takes three months. If the data was recorded at 1-hour intervals, it amounts to a fair number of rows, but the phenomenon occurred only 3 times. Depending on what you want to handle, you effectively have only three data points.

In the sample data that comes with data analysis software, if there are 10 rows of data, there is no problem in treating it as "n = 10". In sensor data, however, the number of rows and n may not match. The number of data records and the number of phenomena represented by the data are different things.

In this case, one approach is to treat the data as three occurrences and analyze the phenomenon itself scientifically rather than statistically. Another is to collect more data and increase the count to 5 or 10 occurrences.
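The gap between row count and n can be made concrete with a minimal sketch. The start dates, the 90-day duration, and the `build_records` helper are all hypothetical, chosen only to mimic three occurrences of a roughly three-month phenomenon recorded hourly.

```python
from datetime import datetime, timedelta


def build_records(starts, days=90, step_hours=1):
    """Return (event_id, timestamp) rows for each hypothetical occurrence."""
    rows = []
    for event_id, start in enumerate(starts, 1):
        for h in range(0, days * 24, step_hours):
            rows.append((event_id, start + timedelta(hours=h)))
    return rows


# Assumed start dates for three occurrences (illustrative only).
starts = [datetime(2021, 1, 1), datetime(2021, 5, 1), datetime(2021, 9, 1)]
records = build_records(starts)

n_rows = len(records)                                    # number of recorded rows
n_events = len({event_id for event_id, _ in records})    # number of phenomena

print(f"rows: {n_rows}, occurrences: {n_events}")
```

Under these assumptions the table has thousands of rows, yet any question about the phenomenon as a whole is answered with n equal to the number of occurrences.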

Is the model useful?

As noted on the data science work page, many explanations seem to equate data analysis with building machine learning or mathematical models.

For that reason, I suspect that when an "AI introduction project", "machine learning project", or "data analysis project" fails to move forward or produce results, it is often because the plan was fixed from the start as "we will proceed using a mathematical model such as machine learning."

If you stick to modeling, you will get stuck when things go wrong. However, if you widen your view beyond the model, you may still be able to utilize the data. One example is "look only at the data at a certain timing". Even when thinking this way, the key point is the difference between the number of data records and the number of phenomena represented by the data.
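As one possible reading of "view only the data at a certain timing", the sketch below picks a single reading per occurrence at a fixed elapsed time. The record layout, the synthetic values, and the `reading_at` helper are assumptions made for illustration, not something taken from this page.

```python
def reading_at(records, elapsed_hours):
    """Pick, for each occurrence, the single reading at a given elapsed time."""
    return {
        event_id: value
        for event_id, hour, value in records
        if hour == elapsed_hours
    }


# Hypothetical rows: (event_id, hours since the occurrence started, sensor value)
records = [(e, h, 20.0 + e + 0.5 * h) for e in (1, 2, 3) for h in range(48)]

# One value per occurrence, taken 24 hours after each occurrence began.
snapshot = reading_at(records, elapsed_hours=24)

print(len(records), "rows reduced to", len(snapshot), "values")
```

The result has as many values as there are occurrences, which keeps the distinction between rows and phenomena visible even without a model.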

When not starting from "there is data"

The above was the case of "I have data. What should I do?"

In the case of "I have a problem I want to solve or a goal I want to achieve. What should I do now?", the way to proceed with data utilization changes.

The Data Science page covers problem solving and goal achievement in detail. In this case, data utilization is one key step within the procedure for problem solving or goal achievement.

At this point, you may need new data rather than the kind of data you already have. Collecting new data does not necessarily require an expensive investment. For example, something as simple as "I measured the humidity in various weather conditions with a home hygrometer" can lead to the success of the project.

This also ties back to the question "Is the model useful?": the idea of "I don't have it now, but I need it" covers the quality and quantity of the data as well. Knowledge of statistics and design of experiments is useful when considering quality and quantity.
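To give a rough idea of how design of experiments helps with quantity, here is a minimal sketch of a full factorial measurement plan for something like the hygrometer example. The factors, their levels, and the number of repeats are all hypothetical choices, not recommendations from this page.

```python
from itertools import product

# Assumed factors and levels (illustrative only).
weather = ["sunny", "cloudy", "rainy"]
time_of_day = ["morning", "evening"]
repeats = 3  # assumed number of repeated measurements per condition

# Full factorial plan: every combination of levels, repeated `repeats` times.
plan = [
    {"weather": w, "time_of_day": t, "repeat": r}
    for w, t, r in product(weather, time_of_day, range(1, repeats + 1))
]

# Count how many measurements each condition will receive.
runs_per_condition = {}
for run in plan:
    key = (run["weather"], run["time_of_day"])
    runs_per_condition[key] = runs_per_condition.get(key, 0) + 1

print(len(plan), "planned measurements over", len(runs_per_condition), "conditions")
```

Writing the plan down before collecting anything makes it clear how many measurements are needed and whether every condition of interest is covered.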
