Database
NoSQL
MongoDB
Aggregation
$group (aggregation)

$group (Aggregation) in MongoDB

Overview

The $group stage in MongoDB's Aggregation Pipeline is used to group documents by specified field(s) and perform aggregate calculations like sum, average, count, minimum, and maximum on the grouped data. This stage is essential for data aggregation and analysis, allowing you to reshape documents into summarized forms.

Key Features

  • Grouping: $group collects documents by the specified key or keys and groups them into a single document per group.
  • Aggregation Functions: Supports various aggregation expressions such as $sum, $avg, $min, $max, $push, $addToSet, and $first, which can be applied to fields within the grouped documents.
  • Custom Fields: You can create new fields within the grouped output based on computed values from the documents in each group.

Common Use Cases

  1. Summarizing Data: Calculate totals, averages, or other statistical measures for grouped data, such as total sales by product or average grades by student.

  2. Counting Documents: Count the number of documents that match specific criteria, such as counting orders by status or user registrations per month.

  3. Categorical Analysis: Group data by categorical fields, such as grouping sales data by region, product category, or customer type.

  4. Data Transformation: Reshape documents by grouping and performing computations, creating summarized outputs for reporting or further processing.

How It Works

  • Group Key: The _id field in the $group stage specifies the grouping key, which determines how documents are grouped together. This can be a single field, a combination of fields, or even a computed value.
  • Aggregation Expressions: You can apply aggregation expressions to calculate values for the grouped documents, such as $sum to add up values or $avg to compute the average.
  • Complex Grouping: $group can use nested fields, expressions, or computed fields for grouping, allowing for sophisticated and customized aggregations.

Important Considerations

  • Performance Impact: $group can be computationally intensive, especially on large collections. Using indexes and filtering data early in the pipeline with $match can improve performance.
  • Memory Usage: Large groupings can consume significant memory. In MongoDB, there is a memory limit for the in-memory sort stage, so consider using the allowDiskUse option if memory usage is a concern.
  • Unique Grouping Keys: If the grouping key is unique for each document, the $group stage will generate as many groups as there are documents, which may lead to performance degradation.

Summary

The $group stage is a powerful tool in MongoDB's Aggregation Pipeline, enabling the transformation and summarization of data by grouping documents and applying aggregation expressions. Whether you're computing totals, averages, or other aggregate measures, $group provides the functionality needed for comprehensive data analysis and reporting, making it essential for any data-driven application in MongoDB.