Last modified: 2023-12-03 14:47:27 · Author: Mango

Shuffle Rows in a Pandas DataFrame

Pandas is a popular data manipulation library for Python, used extensively in data science and data analysis. A common preprocessing step is shuffling the rows of a DataFrame. In this guide, we will discuss how to shuffle rows in a Pandas DataFrame using the sample method.

What is Shuffling Rows in a Dataframe?

Shuffling rows in a dataframe refers to randomly reordering the rows of a dataframe. This operation is commonly used in data preprocessing to randomize the order of the data before splitting it into training and test sets or for any other operation where randomness is required.
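As a rough illustration of the train/test use case, the sketch below shuffles a small frame and then slices it 80/20. It relies on the sample method covered in the next section; the toy data, the variable names, and the 80/20 ratio are assumptions made for the example, not part of the original guide.

```python
import pandas as pd

# Hypothetical toy data: ten rows with a feature and a label.
data = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Shuffle all rows; random_state is fixed only so the run is repeatable.
shuffled = data.sample(frac=1, random_state=0)

# First 80% of the shuffled rows for training, the remainder for testing.
split_at = int(len(shuffled) * 0.8)
train = shuffled.iloc[:split_at]
test = shuffled.iloc[split_at:]

print(len(train), len(test))
```

Shuffling before slicing matters when the source data is sorted (for example by date or by class), since a plain positional split would otherwise put systematically different rows in each set.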

How to Shuffle Rows in a Pandas DataFrame?

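To shuffle rows in a Pandas DataFrame, we can use the sample method. It returns a random sample of rows (or columns), drawn without replacement by default; requesting every row with frac=1 therefore yields a shuffled copy of the DataFrame. The most commonly used parameters are shown below (the full signature also accepts weights and, in recent versions, ignore_index):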

df.sample(n=None, frac=None, replace=False, random_state=None, axis=None)
  • n: number of rows to be sampled
  • frac: proportion of rows to be sampled
  • replace: if True, allows sampling of the same row multiple times
  • random_state: seed for the random number generator
  • axis: axis along which to sample (0 for rows, 1 for columns)
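The parameters listed above can be tried out on a small frame. This is a minimal sketch with a made-up five-row DataFrame (df_demo is a name chosen for the example); it shows that a fixed random_state reproduces the same order, that n and frac select by count and by proportion, and that replace=True allows drawing more rows than the frame contains.

```python
import pandas as pd

df_demo = pd.DataFrame({"x": [1, 2, 3, 4, 5]})

# Same seed, same order: useful when a shuffle has to be reproducible.
first = df_demo.sample(frac=1, random_state=42)
second = df_demo.sample(frac=1, random_state=42)
print(first.index.equals(second.index))

# n picks a fixed number of rows; frac picks a proportion of them.
two_rows = df_demo.sample(n=2, random_state=42)
forty_percent = df_demo.sample(frac=0.4, random_state=42)
print(len(two_rows), len(forty_percent))

# With replace=True the same row may be drawn more than once,
# so the sample can be larger than the original frame.
oversampled = df_demo.sample(n=8, replace=True, random_state=42)
print(len(oversampled))
```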

If both n and frac are None, sample returns a single row (n defaults to 1). Passing both n and frac raises a ValueError, so specify only one of them. To shuffle the whole dataframe, pass frac=1.
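How n and frac interact is easy to check directly. In the pandas versions I am aware of, omitting both returns one row and supplying both raises a ValueError; the small script below is a sketch that verifies this on a made-up frame (df_demo and both_error are names chosen for the example).

```python
import pandas as pd

df_demo = pd.DataFrame({"x": [1, 2, 3, 4, 5]})

# With neither n nor frac given, a single row comes back.
one_row = df_demo.sample(random_state=0)
print(len(one_row))

# Passing both n and frac is rejected rather than silently resolved.
both_error = None
try:
    df_demo.sample(n=2, frac=0.5)
except ValueError as exc:
    both_error = exc
print(type(both_error).__name__)
```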

Here's an example of how to shuffle rows in a Pandas dataframe:

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'age': [20, 30, 25, 22, 28],
                   'gender': ['female', 'male', 'male', 'male', 'female']})

# shuffle the rows of the dataframe
shuffled_df = df.sample(frac=1)

print(shuffled_df)

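The shuffled_df variable now holds all rows of df in random order. Each row keeps its original index label, and because no random_state is given, the order changes on every run.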
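Since sample keeps the original index labels, the shuffled frame's index will be out of order. If a fresh 0-based index is wanted, one option is to chain reset_index(drop=True), as in this small sketch (the three-row frame and the fixed seed are assumptions made for the example):

```python
import pandas as pd

df_small = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"],
                         "age": [20, 30, 25]})

# Shuffle, then discard the old index so rows are numbered 0..n-1 again.
# drop=True prevents the old index from being added as a new column.
shuffled_reset = df_small.sample(frac=1, random_state=1).reset_index(drop=True)

print(shuffled_reset)
```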

Conclusion

Shuffling rows in a Pandas DataFrame is a common step in data preprocessing. With the sample method and frac=1, the rows of a DataFrame can be shuffled in a single line of code.