Python Pandas DataFrame

Last modified: 2023-12-03 15:18:57.752000             Author: Mango

A Pandas DataFrame is a two-dimensional, size-mutable, labeled data structure whose columns can hold different types. It is a core component of the Pandas library for data manipulation and analysis in Python.

Creating a DataFrame

There are several ways to create a Pandas DataFrame. One common way is to build it from a dictionary that maps column names to lists of values.

import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
        'age': [25, 30, 35, 40],
        'state': ['NY', 'CA', 'TX', 'FL'],
        'score': [80, 90, 75, 85]}

df = pd.DataFrame(data)

This creates a DataFrame with the columns name, age, state, and score and their corresponding values.
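The dictionary form is column-oriented. As a minimal sketch of a second route, the same kind of table can be built row by row from a list of dictionaries. The names `rows` and `df_rows` are made up for this illustration and only two rows are shown.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names.
rows = [
    {'name': 'Alice', 'age': 25, 'state': 'NY', 'score': 80},
    {'name': 'Bob', 'age': 30, 'state': 'CA', 'score': 90},
]

df_rows = pd.DataFrame(rows)

print(df_rows.shape)          # (number of rows, number of columns)
print(list(df_rows.columns))  # column labels, in insertion order
print(df_rows.dtypes)         # each column keeps its own dtype
```

The row-oriented form tends to fit data that arrives one record at a time, such as parsed JSON objects.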

Selecting Data

To select data from a DataFrame, you can use square brackets to pick columns and the loc (label-based) and iloc (position-based) indexers to pick rows.
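The difference between loc and iloc is easiest to see when the index is not the default 0, 1, 2, and so on. The following is a small sketch under that assumption; the frame `people` and its string index are hypothetical and not part of the example data used elsewhere on this page.

```python
import pandas as pd

# Hypothetical frame whose index holds string labels rather than 0..n-1.
people = pd.DataFrame(
    {'age': [25, 30, 35], 'score': [80, 90, 75]},
    index=['alice', 'bob', 'charlie'],
)

# loc looks rows up by index label.
by_label = people.loc['bob']

# iloc looks rows up by integer position (0-based).
by_position = people.iloc[1]

# Both return the same row here, as a Series keyed by column name.
print(by_label.equals(by_position))

# Slices differ: loc includes the end label, iloc excludes the end position.
print(people.loc['alice':'bob'].shape[0])   # rows 'alice' and 'bob'
print(people.iloc[0:1].shape[0])            # row at position 0 only
```

With the default integer index, labels and positions coincide, which is why df.loc[0] and df.iloc[0] return the same row there.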

# select the row labeled 0 (the first row with the default index; df.iloc[0] selects it by position)
df.loc[0]

# select the name column
df['name']

# select the age and score columns
df[['age', 'score']]

# select the rows where age is greater than 30
df.loc[df['age'] > 30]

# select the rows where state is either NY or CA
df.loc[df['state'].isin(['NY', 'CA'])]

Modifying Data

You can modify data in a Pandas DataFrame by assigning new values to specific cells or columns.
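Assignments are not limited to one cell at a time. As a hedged sketch, the snippet below updates every row that matches a condition and then derives a new column. The frame `df_mod`, the 5-point bonus, and the `passed` threshold of 85 are assumptions chosen only for illustration.

```python
import pandas as pd

df_mod = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                       'age': [25, 30, 35, 40],
                       'score': [80, 90, 75, 85]})

# Update several cells at once: add 5 points for every row where age > 30.
older = df_mod['age'] > 30
df_mod.loc[older, 'score'] = df_mod.loc[older, 'score'] + 5

# Derive a new column from an existing one; assign returns a new DataFrame
# and leaves df_mod unchanged.
df_flagged = df_mod.assign(passed=df_mod['score'] >= 85)

print(df_flagged)
```

Passing the boolean mask and the column name to loc in a single step writes to the original frame directly, which avoids the pitfalls of chained indexing such as df[mask]['score'] = ... .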

# modify the score of the first row to 85
df.loc[0, 'score'] = 85

# add a new column called grade
df['grade'] = ['A', 'A', 'B', 'B']

# remove the state column
df = df.drop(columns='state')

Aggregating Data

Pandas provides several methods for aggregating data in a DataFrame.
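Several statistics can also be computed in one call. The sketch below uses a standalone frame, `df_agg`, whose values mirror this page's example after the modifications made earlier. The output column names `mean_age` and `max_score` are made up for the illustration.

```python
import pandas as pd

df_agg = pd.DataFrame({'age': [25, 30, 35, 40],
                       'score': [85, 90, 75, 85],
                       'grade': ['A', 'A', 'B', 'B']})

# Several statistics for one column in a single call.
score_stats = df_agg['score'].agg(['min', 'max', 'mean'])

# Named aggregation: each keyword names an output column and maps it to
# a (source column, function) pair.
by_grade = df_agg.groupby('grade').agg(
    mean_age=('age', 'mean'),
    max_score=('score', 'max'),
)

print(score_stats)
print(by_grade)
```

Named aggregation keeps the result columns flat and self-describing, which can be easier to work with than a dictionary of functions when the same source column is summarized more than once.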

# calculate the mean age
df['age'].mean()

# calculate the max score
df['score'].max()

# group by grade and calculate the mean age and mean score for each grade
df.groupby('grade').agg({'age': 'mean', 'score': 'mean'})

Conclusion

The DataFrame is the central data structure in Pandas for data manipulation and analysis in Python. The examples above cover creating one from a dictionary, selecting rows and columns, modifying values and columns, and aggregating with methods such as mean, max, and groupby. It is widely used in data science, machine learning, and other fields that require efficient data processing.