Data Analysis with Python

About Data Analysis with Python

Data Analysis with Python is a spin-off of Data Science for the Environment and Quality. Data Analysis with R and Data Analysis with Excel are its sister versions.

The motivation and editorial policy behind it are similar to those of Data Analysis with R. The main differences in the Python version are described below.

I started writing Data Science for the Environment and Quality in 2007, but at that time I did not know about Python. I thought R was the only free, programming-language-based software that could be used for data science.

For that reason, most of the sample code in Data Science for the Environment and Quality is written in R.

Now, however, Python has become a major part of data science, so I decided to collect the Python material here in Data Analysis with Python.

Differences from the R version

For exploratory data analysis, I wanted the Python version to cover much the same ground as Data Analysis with R, but only two pages come close: the page on visualizing an entire dataset in Python and the page on analyzing hidden variables in Python. The other pages have little content, if any.

There are two reasons the Python version has so little content. The first is my own lack of skill.

The second is that Python has a higher barrier to entry than R. The methods used on this site are all well known, so I believe I could translate them from R. However, adding packages is harder in Python than in R, and there are many environment-dependent errors, so I felt the sample code would be of little use unless the reader were already quite familiar with Python.

As of 2020, I consider the second reason a significant problem with Python, so I decided to limit how much I translate from the R version.

How to use the Python version

I use the Python version as a relatively easy way to try things out when working with data in a Python environment.

If you want to do exploratory or confirmatory data analysis in earnest, I think it is more efficient to prepare a dataset and analyze it in R, following Data Analysis with R.

Sample code policy

Input data location

The sample code assumes that the input data is in a folder named "PyTest" on the C drive, and it sets the working directory to that folder. If you create a folder with this name in advance and put the files you want to analyze in it, you can use the sample code without any changes.

By default, the working directory is typically the folder where the Python file is saved, but that can be confusing, so I use a fixed folder instead.
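A minimal sketch of this setup, assuming the folder is `C:/PyTest` as described above. The helper name `use_data_dir` is hypothetical and not part of the original sample code; it simply wraps `os.chdir` so that later relative file names resolve inside the data folder.

```python
import os
from pathlib import Path

# Assumed data folder from the policy above; change it if your data lives elsewhere.
DATA_DIR = Path("C:/PyTest")


def use_data_dir(data_dir: Path = DATA_DIR) -> Path:
    """Make data_dir the working directory and return the new working directory.

    use_data_dir is a hypothetical helper for illustration only.
    """
    if not data_dir.is_dir():
        # Fail early with a clear message instead of a later "file not found".
        raise FileNotFoundError(f"Data folder not found: {data_dir}")
    os.chdir(data_dir)
    return Path.cwd()
```

Calling `use_data_dir()` at the top of a script lets you then open files in the folder by name alone, for example `open("Data.csv")` (a hypothetical file name).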

Input data file format

The input data is assumed to be a CSV file. If you prepare it in Excel, the data should start in column A, and the top row should contain the variable names (column names).

A CSV file can be created in Excel by choosing CSV as the file type when saving.

If you are used to programming, working only with CSV files may feel limiting. In my experience, however, this format is the easiest starting point for people who analyze Excel data with statistical software such as Minitab or Statworks, so I use it here.
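A minimal sketch of reading such a file with pandas, assuming pandas is installed. An in-memory string stands in for the CSV file, and the column names and values are made up for illustration. `header=0` (also pandas' default) takes the first row as the column names, matching the format described above.

```python
import io

import pandas as pd


def load_table(source) -> pd.DataFrame:
    """Read a CSV whose first row holds the variable (column) names."""
    # header=0 uses the first row as column names; data starts on the next row.
    return pd.read_csv(source, header=0)


# Hypothetical stand-in for a file such as "Data.csv" saved from Excel.
sample = io.StringIO("Temperature,Pressure,Yield\n20,1.0,85\n25,1.2,88\n")
df = load_table(sample)
print(df.columns.tolist())  # ['Temperature', 'Pressure', 'Yield']
print(df.shape)             # (2, 3)
```

If Excel saves the file as "CSV UTF-8", it may start with a byte-order mark; passing `encoding="utf-8-sig"` to `pd.read_csv` keeps that mark out of the first column name.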