pip pandas - Python

Last modified: 2023-12-03 14:45:31             Author: Mango

Pandas is an open-source Python library for data manipulation and analysis. With pandas, you can import and export data in a variety of formats, clean and transform it, and perform complex data analysis tasks.

Installation

Pandas can be installed with pip, the Python package installer. Open a terminal or command prompt and run:

pip install pandas

Importing Pandas

To use pandas in your Python program, you need to import it using the import statement:

import pandas as pd

In this case, we use the alias "pd" to refer to pandas. This is a common convention in the Python community.
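
As a quick check that the installation and import worked, the short sketch below prints the installed version. It assumes only that pandas is installed in the active Python environment.

```python
import pandas as pd

# The version string shows which pandas release this interpreter is using.
print(pd.__version__)
```

If this raises ModuleNotFoundError, pandas is most likely installed for a different Python interpreter than the one running the script.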

Working with DataFrames

One of the core features of pandas is the DataFrame. A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table.
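
To make "columns of potentially different types" concrete, here is a minimal sketch. The Height column and its values are hypothetical, added only so the frame holds a float column alongside text and integers.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Tom'],    # text column
    'Age': [23, 21, 25],                # integer column
    'Height': [1.80, 1.65, 1.75],       # float column (hypothetical values)
})

print(df.shape)          # number of rows and columns
print(list(df.columns))  # column labels
print(df.dtypes)         # one data type per column
```

Each column keeps its own data type, much as each column in a SQL table has its own declared type.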

Creating a DataFrame

You can create a DataFrame in several ways. The most common approaches are to build it from a 2-dimensional array or list, from a dictionary of one-dimensional arrays or lists, or from a CSV file.

import pandas as pd

# create a DataFrame from a 2-dimensional list (one inner list per row)
data = [['John', 23], ['Mary', 21], ['Tom', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# create a DataFrame from a dictionary
data = {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]}
df = pd.DataFrame(data)

# create a DataFrame from a CSV file
df = pd.read_csv('data.csv')
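
The read_csv call above needs a data.csv file on disk. As a self-contained sketch of the same idea, the example below writes a DataFrame to CSV text with to_csv and reads it back through an in-memory buffer. Using io.StringIO in place of a real file is an assumption made here so the example runs anywhere.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]})

# With no path given, to_csv returns the CSV as a string.
# index=False leaves the row labels out of the output.
csv_text = df.to_csv(index=False)

# read_csv accepts a file path or a file-like object; StringIO stands in for data.csv.
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv)
```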

Accessing Data

There are many ways to access data in a DataFrame. You can access a column by its name or position, and a row by its index label (with loc) or by its integer position (with iloc).

import pandas as pd

# create a sample DataFrame
data = {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]}
df = pd.DataFrame(data)

# access a column by its name
print(df['Name'])

# access a column by its position
print(df.iloc[:, 0])

# access a row by its index label (the default labels are 0, 1, 2, ...)
print(df.loc[0])

# access a row by its position
print(df.iloc[0])
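
With the default index, loc[0] and iloc[0] return the same row, which hides the difference between the two. The sketch below uses custom row labels ('a', 'b', 'c', chosen here purely for illustration) so that labels and positions no longer coincide.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]}

# Hypothetical row labels, so that label and position differ.
df = pd.DataFrame(data, index=['a', 'b', 'c'])

row_by_label = df.loc['b']        # label-based lookup
row_by_position = df.iloc[1]      # position-based lookup of the same row
age_of_tom = df.loc['c', 'Age']   # single cell: row label, then column name
subset = df[['Name', 'Age']]      # a list of column names returns a DataFrame

print(row_by_label)
print(row_by_position)
print(age_of_tom)
print(subset)
```

On this frame, df.loc[0] would raise a KeyError, because 0 is a position and not one of the row labels.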

Manipulating Data

Pandas provides many built-in methods for manipulating data, such as filtering, sorting, grouping, and aggregating. Here are some examples:

import pandas as pd

# create a sample DataFrame
data = {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]}
df = pd.DataFrame(data)

# filter rows by a condition
df2 = df[df['Age'] > 22]

# sort rows by a column
df3 = df.sort_values('Age', ascending=False)

# group rows by a column and calculate the mean of another column
# (the result is a Series; every name is unique here, so each group holds one row)
df4 = df.groupby('Name')['Age'].mean()

# group rows by a column and calculate several statistics per group
df5 = df.groupby('Name').agg({'Age': ['min', 'max', 'mean']})
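
Because every name in the sample data is unique, each group above contains exactly one row. The sketch below adds a hypothetical City column and a fourth, made-up row so that the groups hold more than one value and the aggregation becomes visible.

```python
import pandas as pd

# 'City' and the 'Anna' row are hypothetical, added so groups contain several rows.
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Tom', 'Anna'],
    'City': ['Paris', 'Rome', 'Paris', 'Rome'],
    'Age': [23, 21, 25, 31],
})

# Mean age per city (a Series indexed by city).
mean_age = df.groupby('City')['Age'].mean()

# Several statistics per city (a DataFrame with one column per statistic).
stats = df.groupby('City')['Age'].agg(['min', 'max', 'mean'])

print(mean_age)
print(stats)
```

Here the mean age is 24.0 for Paris (23 and 25) and 26.0 for Rome (21 and 31).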

Conclusion

Pandas is a powerful and flexible data analysis library for Python. With it, you can load, access, filter, sort, group, and aggregate tabular data in a few lines of code, which makes data analysis tasks faster and more efficient.