📜  spark df shape - Python (1)

📅  Last modified: 2023-12-03 15:20:11.606000             🧑  Author: Mango

Spark DataFrame Shape - Python

Introduction

Spark DataFrames are a popular choice for handling structured and semi-structured big data. A basic first step when working with any dataset is finding its shape, that is, its number of rows and columns. In Python, df.shape is available on DataFrames from the pandas API on Spark (pyspark.pandas, included since Spark 3.2). A plain PySpark DataFrame (pyspark.sql.DataFrame) has no shape attribute; there, the same information comes from df.count() and len(df.columns).

Syntax

Here's the basic syntax for using df.shape:

df.shape

Parameters

shape is a property rather than a method, so it is accessed without parentheses and takes no parameters.

Returns

df.shape returns a tuple of two integers: the number of rows, followed by the number of columns. On a pandas-on-Spark DataFrame, obtaining the row count runs a Spark job over the data, so it can be slow on large datasets.

Example

Let's suppose we have a pandas-on-Spark DataFrame df with the following data:

| name   | age | gender |
|--------|-----|--------|
| Alice  | 25  | F      |
| Bob    | 30  | M      |
| Claire | 35  | F      |
| David  | 40  | M      |

We can use df.shape to obtain the shape of the DataFrame:

shape = df.shape
print(shape)

Output:

(4, 3)

This tells us that the DataFrame df has 4 rows and 3 columns.
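A plain PySpark DataFrame (pyspark.sql.DataFrame) does not expose a shape attribute, so the same tuple has to be assembled from df.count() and df.columns. The following is a minimal sketch of that idea, not an official API: spark_shape is a hypothetical helper name, and FakeFrame is a stand-in class (an assumption made only so the snippet runs without a Spark installation) that mimics the count() and columns members of a real DataFrame.

```python
from typing import Tuple


def spark_shape(df) -> Tuple[int, int]:
    """Return (rows, columns) for an object with count() and columns.

    Hypothetical helper, not part of PySpark itself.
    """
    # On a real PySpark DataFrame, count() is an action that scans the data.
    # columns is a plain Python list of column names taken from the schema.
    return df.count(), len(df.columns)


class FakeFrame:
    """Stand-in for pyspark.sql.DataFrame, used only to make this sketch runnable."""

    def __init__(self, rows, columns):
        self._rows = rows
        self.columns = columns

    def count(self):
        return len(self._rows)


# Same data as the example table above.
people = FakeFrame(
    rows=[("Alice", 25, "F"), ("Bob", 30, "M"), ("Claire", 35, "F"), ("David", 40, "M")],
    columns=["name", "age", "gender"],
)

print(spark_shape(people))  # (4, 3)
```

With a real SparkSession, the stand-in would be replaced by something like spark.createDataFrame(rows, ["name", "age", "gender"]), and spark_shape(df) would be called in the same way. Because count() scans the data, it is worth caching the result rather than calling it repeatedly on a large DataFrame.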

Conclusion

In this article, we discussed df.shape, a convenient way to obtain the shape of a DataFrame in the pandas API on Spark. It returns a tuple of two integers: the number of rows and the number of columns. A plain PySpark DataFrame has no shape attribute, but df.count() and len(df.columns) give the same information. Knowing the shape of a DataFrame is a useful first step before performing other operations on it.