Pandas DataFrame.hist()

Last modified: 2023-12-03 15:03:28.205000. Author: Mango


The hist() method of a pandas DataFrame provides an easy way to visualize the distribution of the data it holds. It draws one histogram per numeric column, using Matplotlib under the hood, and each bar shows the count of observations that fall within the corresponding bin.
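To make "the count of observations that fall within each bin" concrete, here is a minimal sketch that computes those counts directly with numpy.histogram, the routine Matplotlib uses for binning. The sample data, the seed, and the variable names are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, for illustration only
values = rng.normal(size=1000)      # hypothetical sample column

# Split the range of the data into 10 equal-width bins and count
# how many observations land in each one.
counts, edges = np.histogram(values, bins=10)

# Each bin is half-open [left, right), except the last one,
# which also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

A histogram is simply a bar chart of these counts: 10 bins are described by 11 edges, and the counts add up to the number of observations.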

Syntax
DataFrame.hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, backend=None, legend=False, **kwargs)
Parameters
  • column: The column name(s) of the DataFrame to be plotted. If not specified, all numerical columns will be plotted.

  • by: Group the data by a categorical column and display the histograms for each group.

  • grid: Display the grid on the plot or not. Default: True.

  • xlabelsize: The font size of the x-axis tick labels. Default: None (left unchanged).

  • xrot: The rotation of the x-axis tick labels, in degrees. Default: None.

  • ylabelsize: The font size of the y-axis tick labels. Default: None (left unchanged).

  • yrot: The rotation of the y-axis tick labels, in degrees. Default: None.

  • ax: The Matplotlib Axes object (matplotlib.axes.Axes) to draw the plot onto. Default: None (a new figure is created).

  • sharex: Share the x-axis among subplots or not. Default: False.

  • sharey: Share the y-axis among subplots or not. Default: False.

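  • figsize: The size of the figure as a tuple of (width, height) in inches. Default: None, which falls back to the figure.figsize value in matplotlib.rcParams ((6.4, 4.8) unless it has been changed).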

  • layout: A tuple of (rows, columns) describing the subplot grid. Default: None (the layout is chosen automatically).

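  • bins: The number of histogram bins to use, or a sequence of bin edges that includes the left edge of the first bin and the right edge of the last bin. Default: 10.

  • backend: The plotting backend to use instead of the one set in the plotting.backend option. Default: None.

  • legend: Show the legend or not. Default: False.

  • **kwargs: Other keyword arguments passed on to matplotlib.pyplot.hist().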
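As a rough illustration of how several of these parameters combine, the sketch below restricts the plot to three columns, supplies explicit bin edges, lays the subplots out in one row, and forwards edgecolor to Matplotlib through **kwargs. The data, the seed, and the chosen values are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, for illustration only
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=["A", "B", "C", "D"])

axes = df.hist(
    column=["A", "B", "C"],          # plot only a subset of the columns
    bins=[-4, -2, -1, 0, 1, 2, 4],   # explicit bin edges instead of a bin count
    layout=(1, 3),                   # one row, three columns of subplots
    figsize=(12, 3),
    sharey=True,                     # same y-axis scale in every subplot
    grid=False,
    xlabelsize=8,                    # smaller x-axis tick labels
    xrot=45,                         # rotate x-axis tick labels by 45 degrees
    edgecolor="black",               # forwarded to Matplotlib via **kwargs
)

# Each subplot is titled with the name of the column it shows.
print([ax.get_title() for ax in axes.ravel()])
```

Sharing the y-axis makes bar heights directly comparable across columns, at the cost of squeezing columns with few observations per bin.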

Returns
  • matplotlib.axes.Axes or numpy.ndarray of them: The Axes on which the histograms are drawn, one per plotted column or group. The bin counts themselves are not returned.
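In current pandas releases, DataFrame.hist() hands back the Matplotlib Axes it drew on, which is what makes further customization possible. The following sketch, with made-up data and label text, inspects the returned object and labels every subplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

rng = np.random.default_rng(0)  # fixed seed, for illustration only
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=["A", "B", "C", "D"])

axes = df.hist(bins=20, figsize=(10, 8))

# With four numeric columns the result is a 2-D NumPy array of Axes objects.
print(type(axes), axes.shape)

# ravel() flattens the grid so that every subplot can be customized in one loop.
for ax in axes.ravel():
    ax.set_xlabel("value")   # illustrative label text
    ax.set_ylabel("count")

# The parent figure is reachable from any of the Axes.
fig = axes.ravel()[0].get_figure()
fig.suptitle("Column distributions")
```

If the bin counts are needed as numbers rather than as a picture, numpy.histogram is the more direct tool.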

Examples

Let's create a DataFrame first:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(1000, 4), columns=['A', 'B', 'C', 'D'])

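Now we can visualize the distribution of each column with a single call (when running as a script rather than in a notebook, call matplotlib.pyplot.show() afterwards to display the figure):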

df.hist(bins=20, figsize=(10,8))

[Figure: DataFrame Histogram]

We can also plot histograms for specific columns only, either by selecting them first, as below, or by passing their names through the column parameter (column=['A', 'B']):

df[['A', 'B']].hist(bins=20, figsize=(10,8))

[Figure: Specific Column Histograms]

If we have a categorical column in our DataFrame, we can plot the histograms for each category using the by parameter:

df['E'] = np.random.choice(['X', 'Y'], size=(1000,))
df.hist(column='A', by='E', bins=20, figsize=(10,8))

[Figure: Category-wise Histograms]
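Per-category panels are easier to compare when they use the same axis ranges. As a sketch under assumed data, the variant below combines by with sharex, sharey, and layout to stack the two categories vertically on common scales.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, for illustration only
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=["A", "B", "C", "D"])
df["E"] = rng.choice(["X", "Y"], size=500)   # hypothetical categorical column

axes = df.hist(
    column="A",
    by="E",            # one panel per category in E
    bins=20,
    sharex=True,       # identical x-axis range in both panels
    sharey=True,       # identical y-axis range in both panels
    layout=(2, 1),     # stack the panels vertically
    figsize=(6, 8),
)

# Flatten the result so the code does not depend on the array's exact shape.
panels = np.asarray(axes).ravel()
print(len(panels))
```

Stacking the panels with a shared x-axis lines the bins up vertically, so shifts between categories are visible at a glance.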

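hist() is also available on grouped data. Calling it on the result of groupby() applies DataFrame.hist() to each group in turn, so each group is drawn in its own figure: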

g = df.groupby('E')
g.hist(column='A', bins=20, figsize=(10,8))

[Figure: Grouped Data Histograms]
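When the goal is to compare groups directly, it can be more convenient to overlay them on a single Axes. The sketch below is one possible way to do that with the ax parameter. The shared bin edges, the transparency value, and the data are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, for illustration only
df = pd.DataFrame(rng.normal(size=(500, 4)), columns=["A", "B", "C", "D"])
df["E"] = rng.choice(["X", "Y"], size=500)   # hypothetical categorical column

fig, ax = plt.subplots(figsize=(8, 5))

# Use the same bin edges for every group so that the bars line up.
bin_edges = np.linspace(df["A"].min(), df["A"].max(), 21)

for name, group in df.groupby("E"):
    # Draw each group onto the same Axes; alpha keeps overlapping bars visible.
    group.hist(column="A", bins=bin_edges, ax=ax, alpha=0.5, label=name)

ax.set_title("A by E")
ax.legend()
```

Passing an existing Axes through ax stops pandas from creating a new figure on each call, which is what allows the two histograms to share one plot.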

Conclusion

In this tutorial, we have learned how the hist() method of a pandas DataFrame visualizes the distribution of data. We have seen how to plot histograms for all numeric columns, for specific columns, for each category with the by parameter, and for grouped data.