📜  Pandas DataFrame.Groupby(1)

📅  Last modified: 2023-12-03 15:18:13.650000             🧑  Author: Mango

Pandas DataFrame Groupby

The groupby() method of a Pandas DataFrame splits the data into groups based on one or more criteria, applies an operation to each group, and combines the results. It is one of the most commonly used features in Pandas because it makes data aggregation and analysis straightforward.

Syntax
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False)
Parameters
  • by: Specifies the keys to group the DataFrame by. This can be a single column name, a list of column names, a mapping, a function applied to the index values, or a list combining these. Defaults to None; either by or level must be supplied.

  • axis: Specifies the axis to group along. 0 (default) groups rows and 1 groups columns. axis=1 is deprecated since pandas 2.1; transposing the DataFrame and grouping its rows is the suggested replacement.

  • level: Specifies the level (name or number) of the MultiIndex to group by.

  • as_index: For aggregated output, specifies whether the group keys become the index of the result (True) or remain regular columns (False). Default is True.

  • sort: Specifies whether to sort the resulting group keys. Default is True.

  • group_keys: Specifies whether the group keys are added to the index of the result when calling apply(). Default is True.

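  • squeeze: Specified whether to reduce the dimensionality of the returned object if possible. Default is False. This parameter was deprecated in pandas 1.1 and removed in pandas 2.0, so it should be omitted in current code.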

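  • observed: Applies only when a grouping key is categorical. If True, only category values that actually occur in the data produce groups; if False, every category is shown, including unobserved ones. Default is False; since pandas 2.1 relying on this default is deprecated, and it is announced to change to True.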
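The effect of as_index, sort, and observed is easier to see on a small example. The following is a minimal sketch, assuming a hypothetical DataFrame with the columns category and value; the data is invented for illustration only.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "category": ["b", "a", "b", "a"],
    "value": [1, 2, 3, 4],
})

# Default: the group keys become the (sorted) index of the result.
by_index = df.groupby("category")["value"].sum()

# as_index=False keeps the key as an ordinary column.
as_columns = df.groupby("category", as_index=False)["value"].sum()

# sort=False keeps the groups in order of first appearance.
unsorted = df.groupby("category", sort=False)["value"].sum()

# observed only matters for categorical keys: "c" never occurs in the data.
cat_df = df.assign(
    category=pd.Categorical(df["category"], categories=["a", "b", "c"])
)
all_cats = cat_df.groupby("category", observed=False)["value"].sum()
seen_only = cat_df.groupby("category", observed=True)["value"].sum()

print(by_index, as_columns, unsorted, all_cats, seen_only, sep="\n\n")
```

With observed=False the unused category "c" still appears in the result (with a sum of 0), while observed=True drops it.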

Returns

A DataFrameGroupBy object, which holds the information about the groups defined by the specified keys. Results are only produced once an operation such as an aggregation is applied to it.
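As a rough illustration of what the returned object offers, the sketch below inspects a grouped DataFrame before any aggregation is applied. The DataFrame and its column names are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "category": ["a", "b", "a", "c"],
    "value": [10, 20, 30, 40],
})

grouped = df.groupby("category")    # nothing is aggregated at this point

n_groups = grouped.ngroups          # number of distinct groups
sizes = grouped.size()              # number of rows per group, as a Series
group_a = grouped.get_group("a")    # the sub-DataFrame for a single key

print(type(grouped).__name__)
print(n_groups)
print(sizes)
print(group_a)
```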

Grouping Operations

Once a DataFrame is grouped using groupby(), we can perform various operations on the grouped data.

Aggregation

We can perform aggregation operations on the groups using functions like sum(), mean(), max(), min(), etc.

# Grouping by a single column and calculating the sum of another column
df.groupby('category')['value'].sum()

# Grouping by multiple columns and calculating the mean value of another column
df.groupby(['col1', 'col2'])['value'].mean()
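To make the snippets above concrete, here is a self-contained sketch on invented data. It also shows agg() with named aggregation, which is one way (not the only one) to compute several statistics per group in a single call; the output column names total, average, and largest are arbitrary choices for this example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [1, 3, 2, 4, 6],
})

# One statistic per group.
totals = df.groupby("category")["value"].sum()

# Several statistics at once; each keyword names one output column
# and maps it to a (source column, aggregation) pair.
summary = df.groupby("category").agg(
    total=("value", "sum"),
    average=("value", "mean"),
    largest=("value", "max"),
)

print(totals)
print(summary)
```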
Transformation

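Using transform(), we can apply a function to each group separately. Unlike an aggregation, the result has the same index as the original data, so it can be assigned back as a new column.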

# Grouping by a column and applying the transformation function on another column
df['mean_value'] = df.groupby('category')['value'].transform('mean')
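A typical use of this pattern is to compare each row with its own group. The sketch below, on invented data, computes each row's deviation from its group mean; the column name deviation is an assumption for this example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 30.0],
})

# transform("mean") returns one value per original row, aligned on the index.
df["mean_value"] = df.groupby("category")["value"].transform("mean")

# Because the shapes match, row-wise arithmetic with the group mean works directly.
df["deviation"] = df["value"] - df["mean_value"]

print(df)
```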
Filtering

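We can keep or drop entire groups based on a condition using the filter() function. The function passed to filter() receives each group as a DataFrame and must return a single True or False for that group.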

# Grouping by a column and returning groups where the sum of value column is greater than 10
df.groupby('category').filter(lambda x: x['value'].sum() > 10)
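The following sketch runs the same filter on invented data and shows that the rows of the surviving groups come back with their original index. The boolean-mask variant built with transform() is an alternative formulation added here for comparison, not something the original snippet uses.

```python
import pandas as pd

# Hypothetical sample data: group sums are a=7, b=20, c=11.
df = pd.DataFrame({
    "category": ["a", "a", "b", "c", "c"],
    "value": [3, 4, 20, 5, 6],
})

# Keep only the rows belonging to groups whose total exceeds 10.
kept = df.groupby("category").filter(lambda g: g["value"].sum() > 10)

# Alternative: build a row-level boolean mask from the group totals.
mask = df.groupby("category")["value"].transform("sum") > 10
kept_via_mask = df[mask]

print(kept)
```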
Iteration

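We can iterate over the object returned by groupby(). Each iteration yields a pair consisting of the group key and the sub-DataFrame containing that group's rows.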

# Iterating over the groups and printing each group
for group, data in df.groupby('category'):
    print(f"Group: {group}")
    print(data)
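Since iteration yields (key, sub-DataFrame) pairs, the groups can also be collected into a dictionary for later lookup. This is a minimal sketch on invented data; the names pieces and row_counts are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "category": ["a", "b", "a"],
    "value": [1, 2, 3],
})

# Map each group key to its sub-DataFrame.
pieces = {key: sub_df for key, sub_df in df.groupby("category")}

# Any per-group computation can then be done with ordinary Python code.
row_counts = {key: len(sub_df) for key, sub_df in pieces.items()}

print(row_counts)
```

For simple statistics, the built-in aggregation methods are usually shorter than an explicit loop; iteration is mainly useful when each group needs custom handling.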
Reshaping

Grouped results can be reshaped from long to wide form, for example by pivoting one of the group keys into columns with functions such as pivot_table(), stack(), and unstack().

# Creating a pivot table from a grouped DataFrame
pivot_df = df.groupby(['col1', 'col2'])['value'].mean().reset_index().pivot_table(index='col1', columns='col2', values='value')
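The reshaping step can also be written with unstack(), or by letting pivot_table() do the aggregation itself. The sketch below compares both on invented data, assuming the same column names col1, col2, and value as the snippet above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "col1": ["x", "x", "y", "y", "y"],
    "col2": ["p", "q", "p", "p", "q"],
    "value": [1.0, 2.0, 3.0, 5.0, 4.0],
})

# Group means indexed by (col1, col2) ...
means = df.groupby(["col1", "col2"])["value"].mean()

# ... then move the col2 level of the index into the columns.
wide = means.unstack("col2")

# pivot_table aggregates and reshapes in a single call.
direct = df.pivot_table(index="col1", columns="col2", values="value", aggfunc="mean")

print(wide)
print(direct)
```

Both wide and direct have one row per col1 value and one column per col2 value, holding the group means.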

These are just a few examples of the operations that can be performed with groupby(), which covers aggregating, transforming, filtering, iterating over, and reshaping grouped data.