
Last modified: 2023-12-03 14:47:27 | Author: Mango

Shuffle DataFrame in Python

Shuffling a DataFrame means putting its rows in a random order while keeping each row's values together. This is useful when you want to draw a random sample or split a dataset into train and test sets. In this tutorial, we will explore different ways of shuffling a DataFrame in Python.

Using the shuffle function from the random module

The random module in Python has a shuffle function that randomly reorders a list in place. We can use this function to shuffle a list of the DataFrame's index labels and then use the loc accessor to extract the rows in the shuffled order.

import random
import pandas as pd

# create a small DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# shuffle the index
index = list(df.index)
random.shuffle(index)

# extract rows in shuffled order
shuffled_df = df.loc[index]

print(shuffled_df)

The order is random, so the output changes from run to run. One possible result:

   A  B  C
2  3  6  9
1  2  5  8
0  1  4  7

Note that we first converted the DataFrame index to a list and shuffled it using the shuffle function. Then we used the shuffled index to select the rows from the original DataFrame.
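
If the shuffle needs to be repeatable, one option is a dedicated random.Random instance with a fixed seed instead of the module-level function. The sketch below is only an illustration: the helper name shuffle_with_seed and the seed value 42 are assumptions, not part of the original example.

```python
import random

import pandas as pd


def shuffle_with_seed(df, seed):
    """Return the rows of df in a seeded random order (hypothetical helper)."""
    labels = list(df.index)
    # A private Random instance leaves the global random state untouched.
    random.Random(seed).shuffle(labels)
    # .loc selects by label, so this relies on the index labels being unique.
    return df.loc[labels]


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

first = shuffle_with_seed(df, seed=42)
second = shuffle_with_seed(df, seed=42)

# The same seed gives the same row order on every call.
print(first.equals(second))  # True
```

Because the rows are selected by label, this approach assumes the index has no duplicate labels; with duplicates, loc would return every matching row for each label.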

Using the sample function

Another way to shuffle a DataFrame is to use its sample method, which returns a random sample of rows. Setting the frac parameter to 1.0 returns all the rows of the DataFrame in a random order.

shuffled_df = df.sample(frac=1.0)

print(shuffled_df)

Again the order is random; one possible result:

   A  B  C
2  3  6  9
1  2  5  8
0  1  4  7
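
Two optional additions are often useful with sample. The sketch below passes random_state to make the shuffle repeatable and calls reset_index(drop=True) to renumber the rows afterwards; the seed value 0 is an arbitrary assumption.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# random_state fixes the seed so the same order comes back on every run.
shuffled_df = df.sample(frac=1.0, random_state=0)

# reset_index(drop=True) discards the old labels and renumbers rows from 0.
renumbered_df = shuffled_df.reset_index(drop=True)

print(renumbered_df.index.tolist())  # [0, 1, 2]
```

Without drop=True, reset_index would keep the old labels as an extra column named index.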

Using the numpy.random.permutation function

The numpy.random.permutation function can also be used to shuffle a DataFrame. Given an array-like such as the DataFrame's index, it returns a shuffled copy and leaves the original unchanged. The shuffled labels can then be passed to loc.

import numpy as np

# shuffle the index
index = np.random.permutation(df.index)

# extract rows in shuffled order
shuffled_df = df.loc[index]

print(shuffled_df)

As before, the order varies between runs; one possible result:

   A  B  C
2  3  6  9
1  2  5  8
0  1  4  7
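
A variation on the same idea is to shuffle row positions rather than index labels. The sketch below uses NumPy's Generator interface (np.random.default_rng) with an arbitrary seed of 0 and selects rows with iloc; both choices are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Seeded generator (seed 0 is an arbitrary choice for this sketch).
rng = np.random.default_rng(0)

# permutation(n) returns the integers 0..n-1 in a random order.
positions = rng.permutation(len(df))

# iloc selects by position, so duplicate or non-integer labels are fine.
shuffled_df = df.iloc[positions]

print(sorted(positions.tolist()))  # [0, 1, 2]
```

Selecting by position avoids the unique-label requirement that comes with loc.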

Conclusion

In this tutorial, we explored different ways of shuffling a DataFrame in Python. We used the shuffle function from the random module, the sample function, and the numpy.random.permutation function. Shuffling a DataFrame is useful when we want to randomly sample or create train and test sets from a dataset.
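
As a rough illustration of the train and test use case, the sketch below shuffles a DataFrame and then slices it. The ten-row example data, the 80/20 ratio, and the seed are all assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})

# Shuffle all rows; random_state=0 is an arbitrary seed for repeatability.
shuffled_df = df.sample(frac=1.0, random_state=0)

# Assumed 80/20 split: the first 80% of shuffled rows form the training set.
split_at = int(len(shuffled_df) * 0.8)
train_df = shuffled_df.iloc[:split_at]
test_df = shuffled_df.iloc[split_at:]

print(len(train_df), len(test_df))  # 8 2
```

Since the rows are shuffled before slicing, each row lands in exactly one of the two sets.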