📜  mongodb group by have - Python (1)

📅  Last modified: 2023-12-03 15:03:01.177000  🧑  Author: Mango

MongoDB group by in Python

MongoDB is a document-oriented NoSQL database that stores data as documents in collections rather than as rows in tables. Its aggregation framework can group documents by a chosen key and compute summary values for each group, much like GROUP BY in SQL. In this article, we will explore how to perform a group by in MongoDB using Python.

Connecting to MongoDB

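Before running any operations on MongoDB, we need to establish a connection to the database. The pymongo package provides the MongoClient class for this. Note that MongoClient connects lazily: the constructor returns immediately, and connection problems only surface on the first real operation.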

import pymongo

# Establishing a connection to MongoDB
client = pymongo.MongoClient("mongodb://localhost:27017/")

Creating a Sample Dataset

For demonstration purposes, we will create a sample dataset of orders containing the following fields:

  • order_id (int): unique identifier for each order
  • customer_id (int): unique identifier for each customer
  • amount (float): order cost in USD
  • status (str): order status (e.g., "pending", "shipped", "delivered")

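We can insert the sample data into a MongoDB collection named orders using the following code. Because the documents carry no explicit _id, running the script more than once inserts them again, which would inflate the totals computed later.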

# Creating a sample dataset
orders = [
    {"order_id": 1, "customer_id": 101, "amount": 10.5, "status": "pending"},
    {"order_id": 2, "customer_id": 102, "amount": 15.0, "status": "shipped"},
    {"order_id": 3, "customer_id": 101, "amount": 7.25, "status": "shipped"},
    {"order_id": 4, "customer_id": 103, "amount": 20.0, "status": "delivered"},
    {"order_id": 5, "customer_id": 102, "amount": 12.75, "status": "delivered"},
    {"order_id": 6, "customer_id": 101, "amount": 8.0, "status": "shipped"},
    {"order_id": 7, "customer_id": 103, "amount": 11.5, "status": "pending"}
]

# Inserting orders into MongoDB
db = client["mydatabase"]
orders_collection = db["orders"]
orders_collection.insert_many(orders)

Performing Group By

To perform a group by operation in MongoDB, we use the $group stage of the aggregation pipeline. The $group stage groups documents by the expression given as _id (a single field, or a document combining several fields) and applies accumulator operators such as $sum or $avg to each group.
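Conceptually, $group buckets documents by the value of the _id expression and folds an accumulator over each bucket. The following is a minimal plain-Python sketch of that idea for a $sum accumulator. It is an illustration only, not how MongoDB implements the stage, and the helper name group_sum is hypothetical.

```python
from collections import defaultdict


def group_sum(docs, key_field, value_field):
    """Emulate a $group stage that sums value_field per distinct key_field.

    Hypothetical helper for illustration; it runs in Python, not in MongoDB.
    """
    totals = defaultdict(float)
    for doc in docs:
        # Each distinct key value becomes one output group.
        totals[doc[key_field]] += doc[value_field]
    # Mirror the shape of $group output: one document per group.
    return [{"_id": key, "total": total} for key, total in totals.items()]


# Three orders taken from the sample dataset above
sample = [
    {"order_id": 1, "status": "pending", "amount": 10.5},
    {"order_id": 2, "status": "shipped", "amount": 15.0},
    {"order_id": 3, "status": "shipped", "amount": 7.25},
]

for row in group_sum(sample, "status", "amount"):
    print(row)
```

The two shipped orders collapse into a single group whose total is 22.25, while the lone pending order forms its own group.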

In our sample dataset, let's say we want to group the orders by their status and calculate the total amount for each status. We can achieve this using the following code:

# Performing group by on MongoDB
pipeline = [
    {
        "$group": {
            "_id": "$status",
            "total_amount": {"$sum": "$amount"}
        }
    }
]
result = orders_collection.aggregate(pipeline)

# Printing the results
for row in result:
    print(row)

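The above code creates a pipeline with a single $group stage that groups the orders by their status field and sums amount within each group. Each result document holds the status in _id and the sum in total_amount. The aggregate method returns a cursor, which we iterate over to print each document. With the sample data inserted once, the totals are 22.0 for pending, 30.25 for shipped, and 32.75 for delivered. The order of the groups is not guaranteed unless the pipeline also contains a $sort stage.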
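The SQL pattern GROUP BY ... HAVING maps to a $group stage followed by a $match stage that filters on the computed field. Below is a minimal sketch, assuming we only want statuses whose total exceeds some threshold. The helper name build_having_pipeline and the threshold of 25 are illustrative choices, not part of the original example. The pipeline is plain Python data, so it can be built and inspected without a running server.

```python
def build_having_pipeline(min_total):
    """Return a pipeline: group by status, then keep groups above min_total.

    Hypothetical helper for illustration.
    """
    return [
        # GROUP BY status, SUM(amount)
        {"$group": {"_id": "$status", "total_amount": {"$sum": "$amount"}}},
        # HAVING total_amount > min_total
        {"$match": {"total_amount": {"$gt": min_total}}},
        # Sort for a stable output order, largest total first
        {"$sort": {"total_amount": -1}},
    ]


having_pipeline = build_having_pipeline(25)
for stage in having_pipeline:
    print(stage)
```

Passing having_pipeline to orders_collection.aggregate(...) with the sample data inserted once should return only the delivered and shipped groups, since the pending total of 22.0 falls below the threshold.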

Conclusion

In this article, we explored how to perform group by in MongoDB using Python. We learned how to establish a connection to MongoDB, create a sample dataset, and perform a group by operation using the $group stage in its aggregation pipeline.