📜  boxplot pandas - Python (1)

📅  Last modified: 2023-12-03 15:29:39.768000             🧑  Author: Mango

Boxplot with Python's Pandas

A boxplot (box-and-whisker plot) is a type of graph that represents the distribution of numerical data through its quartiles. The box spans the first to the third quartile with a line at the median, the whiskers extend to the most extreme values that are not flagged as outliers, and the outliers themselves are drawn as individual points. In this tutorial, we will use Python's Pandas library to create boxplots.
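
To make the parts of a boxplot concrete, the following is a minimal sketch that computes them by hand with Pandas. The sample values are hypothetical, and the 1.5 × IQR rule used for the fences is the common default convention rather than something specific to this tutorial.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
values = pd.Series([4.3, 4.9, 5.0, 5.1, 5.4, 5.8, 6.1, 6.4, 6.9, 9.5])

q1 = values.quantile(0.25)      # first quartile (lower edge of the box)
median = values.quantile(0.50)  # line drawn inside the box
q3 = values.quantile(0.75)      # third quartile (upper edge of the box)
iqr = q3 - q1                   # interquartile range (height of the box)

# Fences sit 1.5 * IQR beyond the box; points outside them count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers end at the most extreme data points that are still inside the fences.
whisker_low = values[values >= lower_fence].min()
whisker_high = values[values <= upper_fence].max()

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers.tolist())
```

In this sample only the value 9.5 falls outside the fences, so the upper whisker stops at 6.9 instead of the maximum.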

Import Pandas library

Before getting started, let's import the Pandas library.

import pandas as pd

Load the dataset

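We will use the pd.read_csv() function to load a dataset called iris.csv into a Pandas DataFrame. The file is expected to be in the current working directory and to contain a numeric sepal_length column and a species column, since the rest of the tutorial refers to them.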

iris = pd.read_csv('iris.csv')
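
If iris.csv is not at hand, the loading step can be tried out with a small in-memory stand-in. This is only a sketch: the six rows and the column names (sepal_length, sepal_width, petal_length, petal_width, species) are assumptions about how the file is laid out, not something the tutorial confirms.

```python
import io

import pandas as pd

# A few rows standing in for iris.csv; the header names are an assumption.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
iris = pd.read_csv(io.StringIO(csv_text))

print(iris.head())    # first rows, to check that the header was parsed
print(iris.dtypes)    # the measurement columns should be numeric
```

Checking dtypes right after loading is worthwhile because boxplot() only works on numeric columns.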

Create the boxplot

To create the boxplot, we will call the boxplot() method on the Pandas DataFrame. The column parameter selects which column (or list of columns) to plot; if it is omitted, every numeric column gets its own box.

iris.boxplot(column='sepal_length')

This will create a boxplot for the sepal_length column.
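
The same call also accepts a list of columns. The sketch below uses a small hypothetical DataFrame in place of iris.csv and selects the non-interactive Agg backend so that it runs without a display; both choices are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements standing in for the iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# Passing a list draws one box per column on a shared axis.
ax = iris.boxplot(column=["sepal_length", "sepal_width"])
ax.set_title("Sepal measurements")

# Without grouping, boxplot() returns the Matplotlib Axes it drew on.
print(type(ax))
plt.close(ax.get_figure())
```

Keeping the returned Axes is useful because titles, labels, and limits can then be adjusted with ordinary Matplotlib calls.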

Customize the boxplot

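We can also customize the boxplot's appearance by specifying various parameters. The by and grid parameters belong to Pandas itself, while showfliers, whis, and sym are passed through to Matplotlib's boxplot function:

  • by: group the data by a specified column, drawing one box per group
  • grid: show or hide the grid lines
  • showfliers: show or hide the outliers
  • whis: set how far the whiskers may extend, as a multiple of the interquartile range (1.5 by default)
  • sym: set the color and marker symbol for the outliers

iris.boxplot(column='sepal_length', by='species', grid=False, showfliers=False, whis=1.5, sym='k.')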

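This will create a boxplot grouped by the species column, without a grid and without outliers, with whiskers that reach at most 1.5 times the interquartile range beyond the box. The sym='k.' setting requests black point markers for the outliers, but it has no visible effect here because showfliers=False hides them.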
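
To see the effect of sym, the outliers have to stay visible. The following sketch repeats the call with showfliers=True on a small hypothetical dataset in which one value (7.5) is deliberately placed far from the rest; the data and the Agg backend are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements; 7.5 is placed in the setosa group as an outlier.
df = pd.DataFrame({
    "sepal_length": [4.9, 5.0, 5.1, 5.0, 5.2, 7.5,
                     6.0, 6.1, 5.9, 6.2, 6.0, 6.3],
    "species": ["setosa"] * 6 + ["versicolor"] * 6,
})

ax = df.boxplot(
    column="sepal_length",
    by="species",
    grid=False,
    showfliers=True,  # keep the outliers visible so that sym has an effect
    whis=1.5,         # whiskers reach at most 1.5 * IQR beyond the box
    sym="k.",         # black point markers for the outliers
)
ax.set_ylabel("Sepal Length (cm)")

# Grouped plots get an automatic "Boxplot grouped by ..." super-title; clear it.
ax.get_figure().suptitle("")
```

Changing whis to a larger value such as 3.0 would widen the fences, so fewer points would be drawn as outliers.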

Save the boxplot

We can save the boxplot as an image file using Matplotlib's savefig() function, which requires importing matplotlib.pyplot. The file format is inferred from the extension of the filename we pass in.

import matplotlib.pyplot as plt # needed for plt.savefig()

ax = iris.boxplot(column='sepal_length', by='species', grid=False, showfliers=False, whis=1.5, sym='k.')
ax.set_ylabel('Sepal Length (cm)') # add a label to the y-axis
plt.savefig('boxplot.png', dpi=300, bbox_inches='tight', pad_inches=0.2) # save the plot as a PNG file

This will save the boxplot as a PNG file called boxplot.png. We specified a DPI of 300, trimmed the white space around the plot using bbox_inches='tight', and added some padding using pad_inches=0.2.
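
An alternative to plt.savefig() is to call savefig() on the figure that owns the returned Axes, which does not depend on which figure is currently active. The sketch below assumes a small hypothetical dataset, the Agg backend, and a temporary output directory; it also writes an SVG copy to illustrate that the format follows the file extension.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements standing in for the iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.0, 7.0, 6.4, 6.9],
    "species": ["setosa"] * 3 + ["versicolor"] * 3,
})

ax = df.boxplot(column="sepal_length", by="species", grid=False)
ax.set_ylabel("Sepal Length (cm)")

fig = ax.get_figure()          # the figure that contains the boxplot
out_dir = tempfile.mkdtemp()   # temporary directory, used only for this example
png_path = os.path.join(out_dir, "boxplot.png")
svg_path = os.path.join(out_dir, "boxplot.svg")

# The output format is chosen from the file extension.
fig.savefig(png_path, dpi=300, bbox_inches="tight", pad_inches=0.2)
fig.savefig(svg_path, bbox_inches="tight")

plt.close(fig)  # release the figure once it has been written
print(png_path, svg_path)
```

The dpi setting matters for raster formats such as PNG; a vector format such as SVG scales without loss, so it is left out of the second call.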

Conclusion

In this tutorial, we learned how to create a boxplot using Python's Pandas library. We saw how to customize the boxplot's appearance and save it as an image file. Boxplots are a powerful visualization tool that can help us understand the distribution of numerical data.