
📅  Last modified: 2023-12-03 14:45:02.503000             🧑  Author: Mango

Pandas MultiIndex DataFrames

A pandas MultiIndex gives a DataFrame more than one level of row (or column) labels, so hierarchical data can be stored, selected, and aggregated in a single two-dimensional table. It is especially useful for time-series data, financial data, or any data with a natural hierarchy, such as quarters that contain months.

Creating a MultiIndex DataFrame

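We can create a MultiIndex DataFrame by building a MultiIndex object, for example with pd.MultiIndex.from_tuples, and passing it to the index parameter of the DataFrame constructor. Each tuple holds one label per index level. Here is an example: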

import pandas as pd

# Create a DataFrame with MultiIndex
df = pd.DataFrame(
    data={"sales": [100, 200, 150, 300, 250, 350],
          "expenses": [50, 80, 70, 100, 90, 120]},
    index=pd.MultiIndex.from_tuples(
        [("Q1", "January"), ("Q1", "February"), ("Q1", "March"),
         ("Q2", "April"), ("Q2", "May"), ("Q2", "June")],
        names=["Quarter", "Month"]
    )
)

In this example, we created a DataFrame with two index levels, Quarter and Month. The sales and expenses columns hold the values for each (Quarter, Month) pair.
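
When the data starts out as a flat table, the same two-level index can usually be built from ordinary columns instead. The following is a minimal sketch, assuming the quarter and month labels arrive as columns; the names flat and df_from_columns are illustrative and not part of the original example.

```python
import pandas as pd

# Flat table: the would-be index levels start out as ordinary columns.
flat = pd.DataFrame({
    "Quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "Month": ["January", "February", "March", "April", "May", "June"],
    "sales": [100, 200, 150, 300, 250, 350],
    "expenses": [50, 80, 70, 100, 90, 120],
})

# Promote both columns to index levels; the list order sets the level order.
df_from_columns = flat.set_index(["Quarter", "Month"])

print(df_from_columns.index.nlevels)      # number of index levels
print(list(df_from_columns.index.names))  # names of the levels
```

set_index keeps the row order of the flat table, so the result should match the DataFrame built with from_tuples above.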

Indexing a MultiIndex DataFrame

To access a specific element in a MultiIndex DataFrame, we can use the .loc accessor and pass a tuple with one value per index level, followed by the column label. Here's an example:

# Get the sales for Q1, January
sales_q1_jan = df.loc[("Q1", "January"), "sales"]

print(sales_q1_jan)

Output:

100
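
The same tuple keys can also return whole rows. The sketch below rebuilds the example DataFrame so it runs on its own; the variable names january_row and picked are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Q1", "January"), ("Q1", "February"), ("Q1", "March"),
     ("Q2", "April"), ("Q2", "May"), ("Q2", "June")],
    names=["Quarter", "Month"],
)
df = pd.DataFrame(
    {"sales": [100, 200, 150, 300, 250, 350],
     "expenses": [50, 80, 70, 100, 90, 120]},
    index=index,
)

# A full key with ":" for the columns returns the whole row as a Series.
january_row = df.loc[("Q1", "January"), :]

# A list of full keys returns a DataFrame containing just those rows.
picked = df.loc[[("Q1", "March"), ("Q2", "May")], :]

print(january_row)
print(picked)
```

Writing the column selector explicitly (here ":") avoids any ambiguity between a tuple of index levels and a (row, column) pair.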

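We can also pass only the outer level's value to the .loc accessor to select every row under that label. This is called partial indexing: the selected level is dropped from the result, and the remaining level becomes the index. Here's an example: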

# Get the sales for Q1
sales_q1 = df.loc["Q1", "sales"]

print(sales_q1)

Output:

Month
January     100
February    200
March       150
Name: sales, dtype: int64

In this example, we returned all the sales values for Q1 as a Series indexed by Month.
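
Partial indexing with .loc starts from the outer level. To select by the inner level instead, one option is DataFrame.xs, and another is pd.IndexSlice. The sketch below shows both under the same example data; sorting the index first is a precaution that the pandas documentation recommends for slicers, and the names may and first_months are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Q1", "January"), ("Q1", "February"), ("Q1", "March"),
     ("Q2", "April"), ("Q2", "May"), ("Q2", "June")],
    names=["Quarter", "Month"],
)
df = pd.DataFrame(
    {"sales": [100, 200, 150, 300, 250, 350],
     "expenses": [50, 80, 70, 100, 90, 120]},
    index=index,
)

# Cross-section on the inner level: every row whose Month is "May".
# The Month level is dropped, so the result is indexed by Quarter.
may = df.xs("May", level="Month")

# sort_index() orders the labels lexicographically (months become alphabetical).
df_sorted = df.sort_index()

# IndexSlice: all quarters (":"), but only the listed months.
first_months = df_sorted.loc[pd.IndexSlice[:, ["January", "April"]], :]

print(may)
print(first_months)
```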

Aggregating Data in MultiIndex DataFrames

We can aggregate data in a MultiIndex DataFrame using the .groupby() method, which accepts the name of an index level as the grouping key. Here's an example:

# Get the total sales and expenses for each quarter
quarterly_totals = df.groupby("Quarter").sum()

print(quarterly_totals)

Output:

         sales  expenses
Quarter                 
Q1         450       200
Q2         900       310

In this example, we used the .groupby() method to group the data by Quarter and then calculated the sum of sales and expenses for each group.
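
Grouping by an index level also works with several aggregations at once. The following is a minimal sketch using groupby(level=...) with named aggregation; the output column names total_sales, mean_sales, and max_expenses are chosen here for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Q1", "January"), ("Q1", "February"), ("Q1", "March"),
     ("Q2", "April"), ("Q2", "May"), ("Q2", "June")],
    names=["Quarter", "Month"],
)
df = pd.DataFrame(
    {"sales": [100, 200, 150, 300, 250, 350],
     "expenses": [50, 80, 70, 100, 90, 120]},
    index=index,
)

# level="Quarter" states explicitly that the key is an index level.
# Each keyword maps an output column to (source column, aggregation).
quarterly_stats = df.groupby(level="Quarter").agg(
    total_sales=("sales", "sum"),
    mean_sales=("sales", "mean"),
    max_expenses=("expenses", "max"),
)

print(quarterly_stats)
```

Passing level= rather than a bare string makes the intent clear when a column and an index level could share a name.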

Conclusion

Pandas MultiIndex DataFrames provide an organized and structured way to work with hierarchical data. We can select single values or whole groups of rows with the .loc accessor, and aggregate by any index level with the .groupby() method.