What is the Aggregation Pipeline?

The aggregation pipeline is MongoDB's framework for data transformation and analysis. It processes documents through a sequence of stages, where each stage transforms the data and passes results to the next stage. Think of it as a series of data processing steps chained together -- similar to piping commands in a Unix shell.

Basic Pipeline Structure

An aggregation pipeline is an array of stage objects passed to aggregate():

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$category", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
])
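To make the stage-by-stage flow concrete, here is a minimal plain-JavaScript sketch of what the pipeline above computes, run against a small in-memory array instead of a MongoDB collection. The sample documents and the helper names (`matchStage`, `groupStage`, `sortStage`, `runPipeline`) are assumptions for illustration only; they are not part of MongoDB's API.

```javascript
// Hypothetical sample data standing in for the orders collection.
const orders = [
  { status: "completed", category: "books", amount: 20 },
  { status: "completed", category: "games", amount: 50 },
  { status: "pending", category: "books", amount: 99 },
  { status: "completed", category: "books", amount: 15 },
];

// Each stage is a function from an array of documents to a new array.
const matchStage = (docs) => docs.filter((d) => d.status === "completed");

const groupStage = (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.category, (totals.get(d.category) ?? 0) + d.amount);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

const sortStage = (docs) => [...docs].sort((a, b) => b.total - a.total);

// Run the stages in order, feeding each stage's output to the next.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(orders, [matchStage, groupStage, sortStage]);
console.log(result);
// [ { _id: "games", total: 50 }, { _id: "books", total: 35 } ]
```

The pending order never reaches the grouping step, which is the same hand-off that happens between $match and $group on the server.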

$match - Filtering Documents

The $match stage filters documents using the same query syntax as find(). Place it as early as possible: later stages then process fewer documents, and a $match at the start of the pipeline can use an index.

db.orders.aggregate([
  {
    $match: {
      createdAt: { $gte: new Date("2025-01-01") },
      status: "completed"
    }
  }
])
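The effect of stage order can be illustrated with a small plain-JavaScript sketch that counts how many documents reach a costly step when filtering happens first versus last. The data and the `expensiveStep` helper are hypothetical stand-ins, not MongoDB internals.

```javascript
// Hypothetical data: 1,000 orders, every tenth one completed.
const orders = Array.from({ length: 1000 }, (_, i) => ({
  status: i % 10 === 0 ? "completed" : "pending",
  amount: i,
}));

let processed = 0;

// Stand-in for a costly later stage; counts every document it touches.
const expensiveStep = (docs) =>
  docs.map((d) => {
    processed += 1;
    return { ...d, amountWithTax: d.amount * 1.2 };
  });

const isCompleted = (d) => d.status === "completed";

// Filter first: the costly step sees only matching documents.
processed = 0;
const filterFirst = expensiveStep(orders.filter(isCompleted));
const touchedFilterFirst = processed;

// Filter last: the costly step sees every document.
processed = 0;
const filterLast = expensiveStep(orders).filter(isCompleted);
const touchedFilterLast = processed;

console.log(touchedFilterFirst, touchedFilterLast); // 100 1000
```

Both orderings return the same 100 documents; only the amount of work differs.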

$group - Grouping and Aggregating

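The $group stage groups documents by the expression given in _id and applies accumulators such as $sum, $avg, $min, $max, and $count to each group. The $count accumulator requires MongoDB 5.0 or later; on older versions, { $sum: 1 } gives the same result: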

// Total revenue and average order value per customer
db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $group: {
      _id: "$customerId",
      totalRevenue: { $sum: "$amount" },
      averageOrder: { $avg: "$amount" },
      orderCount: { $count: {} },
      lastOrder: { $max: "$createdAt" }
    }
  }
])
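As a rough mental model, each accumulator keeps a running value per group key and updates it for every document in that group. The plain-JavaScript sketch below mimics the customer summary above on hypothetical in-memory data; `summarizeByCustomer` is an illustrative helper, not a MongoDB function.

```javascript
// Hypothetical completed orders.
const orders = [
  { customerId: "c1", amount: 40, createdAt: new Date("2025-01-05") },
  { customerId: "c2", amount: 10, createdAt: new Date("2025-02-01") },
  { customerId: "c1", amount: 60, createdAt: new Date("2025-03-10") },
];

function summarizeByCustomer(docs) {
  const groups = new Map();
  for (const d of docs) {
    // Start a fresh accumulator state the first time a key is seen.
    const g = groups.get(d.customerId) ?? {
      _id: d.customerId,
      totalRevenue: 0,
      orderCount: 0,
      lastOrder: d.createdAt,
    };
    g.totalRevenue += d.amount; // like $sum
    g.orderCount += 1; // like $count
    if (d.createdAt > g.lastOrder) g.lastOrder = d.createdAt; // like $max
    groups.set(d.customerId, g);
  }
  // The average is derived from the running sum and count, like $avg.
  return [...groups.values()].map((g) => ({
    ...g,
    averageOrder: g.totalRevenue / g.orderCount,
  }));
}

const summary = summarizeByCustomer(orders);
console.log(summary);
```

Here customer c1 ends up with a total of 100 across 2 orders, an average of 50, and a last order date of 2025-03-10.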

$project - Reshaping Documents

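Use $project to include or exclude fields, or to compute new ones. Apart from _id, a single $project cannot mix inclusion (1) and exclusion (0). The $dateDiff operator used below requires MongoDB 5.0 or later: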

db.users.aggregate([
  {
    $project: {
      _id: 0,
      fullName: { $concat: ["$firstName", " ", "$lastName"] },
      email: 1,
      memberSince: {
        $dateDiff: {
          startDate: "$createdAt",
          endDate: new Date(),
          unit: "day"
        }
      }
    }
  }
])
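The same reshaping can be pictured as a map over documents. The sketch below is a plain-JavaScript approximation using a hypothetical user document. `dayDiffUtc` is an illustrative helper that counts UTC day boundaries crossed, which is, as far as the documentation describes it, how `$dateDiff` measures whole units when no timezone is given.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Count UTC midnights crossed between two dates (illustrative helper).
const dayDiffUtc = (start, end) =>
  Math.floor(end.getTime() / MS_PER_DAY) -
  Math.floor(start.getTime() / MS_PER_DAY);

// Hypothetical input document.
const user = {
  _id: 1,
  firstName: "Ada",
  lastName: "Lovelace",
  email: "ada@example.com",
  passwordHash: "not-projected",
  createdAt: new Date("2025-01-01T23:00:00Z"),
};

// Keep email, drop _id and unlisted fields, and add two computed fields.
const projectUser = (u, now) => ({
  fullName: `${u.firstName} ${u.lastName}`,
  email: u.email,
  memberSince: dayDiffUtc(u.createdAt, now),
});

const projected = projectUser(user, new Date("2025-01-03T01:00:00Z"));
console.log(projected);
// { fullName: "Ada Lovelace", email: "ada@example.com", memberSince: 2 }
```

Only 26 hours separate the two timestamps, yet two midnights are crossed, so the day count is 2 rather than 1.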

$sort and $limit

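$sort orders the documents and $limit caps how many are passed on. Placed after $group, they rank the grouped results rather than the raw documents: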

// Top 5 customers by total spending
db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $group: {
      _id: "$customerId",
      totalSpent: { $sum: "$amount" }
    }
  },
  { $sort: { totalSpent: -1 } },
  { $limit: 5 }
])

Complete Example

A full pipeline combining multiple stages for a monthly sales report:

db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $group: {
      _id: {
        year: { $year: "$createdAt" },
        month: { $month: "$createdAt" }
      },
      revenue: { $sum: "$amount" },
      orders: { $count: {} }
    }
  },
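  // Sort on the numeric year and month before building the label;
  // sorting the unpadded string would place "2025-9" after "2025-10".
  { $sort: { "_id.year": -1, "_id.month": -1 } },
  {
    $project: {
      _id: 0,
      period: {
        $concat: [
          { $toString: "$_id.year" }, "-",
          { $toString: "$_id.month" }
        ]
      },
      revenue: 1,
      orders: 1,
      avgOrderValue: { $divide: ["$revenue", "$orders"] }
    }
  }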
])
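One detail worth checking whenever results are ordered by a label built with $concat: strings compare character by character, so an unpadded month such as 2025-9 sorts after 2025-10. The plain-JavaScript sketch below shows the effect and one possible remedy, zero-padding the month. `toPeriod` and `sortDesc` are hypothetical helpers.

```javascript
// Hypothetical year/month pairs, as produced by grouping on $year and $month.
const months = [
  { year: 2025, month: 9 },
  { year: 2025, month: 10 },
  { year: 2025, month: 11 },
];

// Build a period label, optionally zero-padding the month.
const toPeriod = ({ year, month }, pad) =>
  `${year}-${pad ? String(month).padStart(2, "0") : month}`;

// Sort labels in descending order using plain string comparison.
const sortDesc = (labels) => [...labels].sort().reverse();

const unpadded = sortDesc(months.map((m) => toPeriod(m, false)));
const padded = sortDesc(months.map((m) => toPeriod(m, true)));

console.log(unpadded); // [ "2025-9", "2025-11", "2025-10" ]
console.log(padded); // [ "2025-11", "2025-10", "2025-09" ]
```

Sorting on the numeric year and month fields avoids the issue entirely, since numbers compare by value rather than by character.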

Key Takeaways

The aggregation pipeline processes documents through ordered stages.
Place $match early to filter documents before heavy processing.
$group with accumulators handles grouping and calculations.
$project reshapes documents and computes derived fields.
