
📅  Last modified: 2023-12-03 14:45:02.542000             🧑  Author: Mango

Pandas distinct - Python

Pandas is a popular Python library used for data manipulation and analysis. Pandas has no function that is literally named distinct; in this article, we will explore drop_duplicates, the method that fills the role of SQL's DISTINCT.

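What is the distinct operation in Pandas?

Pandas does not provide a distinct function. The equivalent is the drop_duplicates method, which returns the unique values of a Series or the unique rows of a DataFrame, judged on all columns or on a chosen subset of columns. It is similar to the DISTINCT keyword in SQL queries.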
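
To make the SQL analogy concrete, here is a minimal sketch. The `orders` table and its values are hypothetical, invented only for illustration. It also uses two related Series methods, `unique` and `nunique`, which are often handy when only the values or their count are needed.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
orders = pd.DataFrame({
    'customer': ['Ann', 'Bob', 'Ann', 'Ann'],
    'city': ['Oslo', 'Rome', 'Oslo', 'Bergen'],
})

# Roughly: SELECT DISTINCT customer, city FROM orders
distinct_rows = orders.drop_duplicates()

# Roughly: SELECT DISTINCT customer FROM orders
distinct_customers = orders['customer'].unique()

# Roughly: SELECT COUNT(DISTINCT customer) FROM orders
n_customers = orders['customer'].nunique()

print(distinct_rows)
print(distinct_customers)
print(n_customers)
```

Note that drop_duplicates keeps the original row labels of the rows it retains, whereas unique returns a plain array of values without an index.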

Syntax

The syntax of drop_duplicates is:

DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)

Parameters

The method takes the following parameters, all of which are optional:

  • subset: A column label or list of column labels used to identify duplicates. By default, all columns are used.

  • keep: Specifies which occurrence of a duplicate should be kept. Possible values are 'first' (the default), 'last', and False (which removes every occurrence of a duplicated value).

  • inplace: If True, the original DataFrame is modified and the method returns None. If False (the default), a new DataFrame with the duplicates removed is returned.

  • ignore_index: If True, the result is relabeled 0, 1, …, n - 1 instead of keeping the original row labels. Available since pandas 1.0.
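
The effect of these parameters is easiest to see side by side. The following is a small sketch on made-up data (the names and cities are illustrative only), comparing the values of keep and showing ignore_index, which assumes pandas 1.0 or later.

```python
import pandas as pd

# Illustrative data: 'John' appears twice in the 'Name' column.
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'John', 'Peter'],
    'City': ['New York', 'London', 'Boston', 'Paris'],
})

# keep='first' (default): the first 'John' row survives.
first = df.drop_duplicates(subset=['Name'])

# keep='last': the last 'John' row survives.
last = df.drop_duplicates(subset=['Name'], keep='last')

# keep=False: every row with a duplicated 'Name' is dropped.
no_dupes = df.drop_duplicates(subset=['Name'], keep=False)

# ignore_index=True: the result is relabeled 0, 1, 2.
renumbered = df.drop_duplicates(subset=['Name'], ignore_index=True)

print(first, last, no_dupes, renumbered, sep='\n\n')
```

Because inplace is left at its default of False, df itself still has all four rows after these calls.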

Examples

Let's see some examples of using drop_duplicates in Pandas.

Example 1: Finding unique values in a single column
import pandas as pd

# create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John', 'David', 'Mary']})

# find unique values in the 'Name' column
unique_names = df['Name'].drop_duplicates()

print(unique_names)

Output:

0     John
1     Mary
2    Peter
4    David
Name: Name, dtype: object

Example 2: Finding unique values across multiple columns
import pandas as pd

# create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'David'],
                   'City': ['New York', 'London', 'Paris', 'London']})

# find unique rows across the 'Name' and 'City' columns
# ('London' repeats in 'City', but no full row repeats, so nothing is dropped)
unique_values = df.drop_duplicates()

print(unique_values)

Output:

    Name      City
0   John  New York
1   Mary    London
2  Peter     Paris
3  David    London
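
Since the data above contains no repeated rows, the result is identical to the input. As a complementary sketch on made-up data where one (Name, City) pair does repeat, the related `duplicated` method can be used to see which rows would be dropped before actually dropping them.

```python
import pandas as pd

# Illustrative data: the pair ('John', 'New York') appears twice.
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'John', 'Mary'],
    'City': ['New York', 'London', 'New York', 'Paris'],
})

# True for each row that repeats an earlier (Name, City) pair.
mask = df.duplicated(subset=['Name', 'City'])

# Keep one row per distinct (Name, City) pair.
unique_pairs = df.drop_duplicates(subset=['Name', 'City'])

print(mask)
print(unique_pairs)
```

Here 'Mary' appears twice but with different cities, so both of her rows are kept; only the second ('John', 'New York') row is removed.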

Example 3: Removing duplicates
import pandas as pd

# create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John', 'David', 'Mary']})

# drop rows whose 'Name' repeats an earlier row, modifying df in place
df.drop_duplicates(subset=['Name'], inplace=True)

print(df)

Output:

    Name
0   John
1   Mary
2  Peter
4  David
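
One detail of Example 3 is worth a short sketch: with inplace=True the method returns None, so its result should not be assigned. The variant below, using the same data, contrasts this with the non-inplace form followed by reset_index, which is one common way to get a clean 0-based index.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John', 'David', 'Mary']})

# Without inplace, df is untouched and a new DataFrame is returned.
deduped = df.drop_duplicates(subset=['Name']).reset_index(drop=True)

# With inplace=True, df is modified and the call itself returns None.
result = df.drop_duplicates(subset=['Name'], inplace=True)

print(deduped)
print(result)
print(df)
```

Writing `df = df.drop_duplicates(..., inplace=True)` is therefore a mistake, because it replaces df with None.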

Conclusion

The drop_duplicates method is the Pandas counterpart to SQL's DISTINCT and a useful tool for finding unique values or rows in a DataFrame. It can be used to filter out duplicates or to prepare data for other analyses on unique values.