What is the Aggregation Pipeline?
The aggregation pipeline is MongoDB's framework for data transformation and analysis. It processes documents through a sequence of stages: each stage transforms the documents it receives and passes its output to the next stage. Think of it as a chain of processing steps, similar to piping commands together in a Unix shell.
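As a rough mental model only (this is not how MongoDB executes pipelines), a pipeline behaves like an ordered list of functions, each taking an array of documents and returning a new array. The helpers below (match, groupSum, sortBy, runPipeline) are hypothetical plain-JavaScript names chosen for illustration, and the sample orders are made up:

```javascript
// Mental model only: each "stage" is a function from an array of documents
// to a new array, and the pipeline feeds each stage's output into the next.
// The helper names here are hypothetical, not part of any MongoDB API.

const match = (predicate) => (docs) => docs.filter(predicate);

// Group by a key function and sum a numeric field per group.
const groupSum = (keyFn, field) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    totals.set(key, (totals.get(key) ?? 0) + doc[field]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

// Sort by a numeric field; direction is 1 (ascending) or -1 (descending).
const sortBy = (field, direction) => (docs) =>
  [...docs].sort((a, b) => (a[field] - b[field]) * direction);

// Apply the stages in order, like a | b | c in a Unix shell.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const orders = [
  { category: "books", amount: 20, status: "completed" },
  { category: "games", amount: 50, status: "completed" },
  { category: "books", amount: 15, status: "completed" },
  { category: "games", amount: 99, status: "cancelled" },
];

const result = runPipeline(orders, [
  match((o) => o.status === "completed"),
  groupSum((o) => o.category, "amount"),
  sortBy("total", -1),
]);

console.log(result);
```

The cancelled order never reaches the grouping step, so "games" totals 50 and "books" totals 35, and the sort puts "games" first.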
Basic Pipeline Structure
An aggregation pipeline is an array of stage objects passed to aggregate():
db.orders.aggregate([
  { $match: { status: "completed" } },                            // keep completed orders
  { $group: { _id: "$category", total: { $sum: "$amount" } } },   // sum amounts per category
  { $sort: { total: -1 } }                                        // highest total first
])
$match - Filtering Documents
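Filtering early pays off because every later stage only sees the documents that survive the filter. The plain-JavaScript sketch below (not MongoDB code; expensiveTransform and the sample data are hypothetical) makes the difference countable by tracking how many documents a costly step processes when the filter runs first versus last:

```javascript
// Sketch: compare how many documents a costly stage processes when the
// filter runs before it versus after it. Names and data are illustrative.

let transformCalls = 0;

// Stand-in for an expensive stage (e.g. a heavy computation per document).
function expensiveTransform(doc) {
  transformCalls += 1;
  return { ...doc, amountWithTax: doc.amount * 1.2 };
}

const isCompleted = (doc) => doc.status === "completed";

// 1,000 orders, of which every tenth one is completed (100 in total).
const orders = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  amount: 10,
  status: i % 10 === 0 ? "completed" : "pending",
}));

// Filter first: the expensive stage only sees matching documents.
transformCalls = 0;
const filterFirst = orders.filter(isCompleted).map(expensiveTransform);
const callsFilterFirst = transformCalls;

// Filter last: the expensive stage runs on every document.
transformCalls = 0;
const filterLast = orders.map(expensiveTransform).filter(isCompleted);
const callsFilterLast = transformCalls;

console.log(filterFirst.length, filterLast.length); // 100 100
console.log(callsFilterFirst, callsFilterLast);     // 100 1000
```

Both orderings return the same 100 documents, but filtering first does one tenth of the work in the expensive step.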
The $match stage filters documents using the same query syntax as find(). Place it as early in the pipeline as possible to reduce the number of documents that later stages must process; when $match is the first stage, it can also use indexes:
// Completed orders created on or after 2025-01-01 (UTC)
db.orders.aggregate([
  {
    $match: {
      createdAt: { $gte: new Date("2025-01-01") },
      status: "completed"
    }
  }
])
$group - Grouping and Aggregating
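Conceptually, $group partitions documents by their _id expression and folds each partition with its accumulators. The plain-JavaScript sketch below mimics the customer example's totalRevenue, averageOrder, orderCount, and lastOrder fields; groupOrders and the sample orders are hypothetical, and the sketch ignores details such as how MongoDB treats missing or non-numeric values:

```javascript
// Sketch of $group semantics: one output document per distinct key, with
// running accumulator state per group. groupOrders is a hypothetical helper.

function groupOrders(orders) {
  const groups = new Map(); // customerId -> accumulator state

  for (const order of orders) {
    let acc = groups.get(order.customerId);
    if (!acc) {
      acc = { sum: 0, count: 0, lastOrder: null };
      groups.set(order.customerId, acc);
    }
    acc.sum += order.amount;                 // like $sum: "$amount"
    acc.count += 1;                          // like $count: {}
    if (acc.lastOrder === null || order.createdAt > acc.lastOrder) {
      acc.lastOrder = order.createdAt;       // like $max: "$createdAt"
    }
  }

  // Finalize: $avg is the sum divided by the number of values seen.
  // Note: MongoDB does not guarantee $group output order; this sketch
  // simply returns groups in first-seen order.
  return [...groups].map(([customerId, acc]) => ({
    _id: customerId,
    totalRevenue: acc.sum,
    averageOrder: acc.sum / acc.count,
    orderCount: acc.count,
    lastOrder: acc.lastOrder,
  }));
}

const result = groupOrders([
  { customerId: "c1", amount: 30, createdAt: new Date("2025-01-05") },
  { customerId: "c2", amount: 10, createdAt: new Date("2025-02-01") },
  { customerId: "c1", amount: 50, createdAt: new Date("2025-03-10") },
]);

console.log(result);
```

Customer "c1" ends up with two orders totaling 80 (average 40) and a latest order on 2025-03-10; "c2" has a single order of 10.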
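The $group stage groups documents by the expression in _id, producing one output document per distinct key, and computes fields with accumulators such as $sum, $avg, $min, $max, and $count. (The $count accumulator requires MongoDB 5.0 or later; on older versions use { $sum: 1 }.)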
// Total revenue and average order value per customer
db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $group: {
      _id: "$customerId",                  // group key: one output document per customer
      totalRevenue: { $sum: "$amount" },
      averageOrder: { $avg: "$amount" },
      orderCount: { $count: {} },
      lastOrder: { $max: "$createdAt" }    // most recent order date
    }
  }
])
$project - Reshaping Documents
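As a rough model of what the $project example computes, here is a plain-JavaScript sketch of the per-document transformation. projectUser and its helpers are hypothetical; the sketch mirrors two documented behaviors: $concat returns null when any argument is null or missing, and $dateDiff with unit "day" counts calendar-day boundaries crossed (UTC by default) rather than elapsed 24-hour periods:

```javascript
// Sketch of the $project example, applied to one document.
// projectUser, concat, and daysBetween are hypothetical helpers, not MongoDB APIs.

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Mirrors $concat: null if any input is null or missing.
function concat(...parts) {
  return parts.some((p) => p === null || p === undefined) ? null : parts.join("");
}

// Mirrors $dateDiff with unit "day" in UTC: counts midnight boundaries crossed.
function daysBetween(start, end) {
  return Math.floor(end.getTime() / MS_PER_DAY) - Math.floor(start.getTime() / MS_PER_DAY);
}

function projectUser(user, now) {
  const out = {
    // _id: 0 means _id is simply not copied into the output.
    fullName: concat(user.firstName, " ", user.lastName),
  };
  // email: 1 copies the field; a missing field is left out, not set to null.
  if (user.email !== undefined) out.email = user.email;
  out.memberSince = daysBetween(user.createdAt, now);
  return out;
}

const now = new Date("2025-03-01T00:30:00Z");
const projected = projectUser(
  {
    _id: 1,
    firstName: "Ada",
    lastName: "Lovelace",
    email: "ada@example.com",
    createdAt: new Date("2025-02-28T23:30:00Z"),
  },
  now,
);

console.log(projected);
```

Only one hour separates the two timestamps, but they fall on different UTC days, so memberSince is 1.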
Use $project to include, exclude, or compute new fields:
db.users.aggregate([
  {
    $project: {
      _id: 0,                                                   // exclude _id
      fullName: { $concat: ["$firstName", " ", "$lastName"] },  // computed field
      email: 1,                                                 // include as-is
      memberSince: {                                            // days since sign-up ($dateDiff needs MongoDB 5.0+)
        $dateDiff: {
          startDate: "$createdAt",
          endDate: new Date(),
          unit: "day"
        }
      }
    }
  }
])
$sort and $limit
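When a $limit immediately follows a $sort, MongoDB's optimizer can coalesce the two so the sort only has to track the top n documents instead of ordering the entire input. The plain-JavaScript sketch below illustrates that idea with a bounded buffer; topN and the sample data are hypothetical, and MongoDB's internal implementation may differ:

```javascript
// Sketch of a bounded top-N: keep at most n documents while scanning,
// instead of sorting the whole input. topN is a hypothetical helper.

function topN(docs, field, n) {
  const best = []; // kept sorted by field, descending, length <= n
  for (const doc of docs) {
    // Find where this document belongs in the descending buffer.
    let i = best.length;
    while (i > 0 && best[i - 1][field] < doc[field]) i--;
    if (i < n) {
      best.splice(i, 0, doc);
      if (best.length > n) best.pop(); // drop the smallest
    }
  }
  return best;
}

const customers = [
  { _id: "a", totalSpent: 120 },
  { _id: "b", totalSpent: 300 },
  { _id: "c", totalSpent: 75 },
  { _id: "d", totalSpent: 510 },
  { _id: "e", totalSpent: 220 },
  { _id: "f", totalSpent: 90 },
  { _id: "g", totalSpent: 400 },
];

// Full sort then slice, like { $sort: { totalSpent: -1 } } + { $limit: 5 }.
const viaFullSort = [...customers]
  .sort((x, y) => y.totalSpent - x.totalSpent)
  .slice(0, 5);

const viaTopN = topN(customers, "totalSpent", 5);

console.log(viaTopN.map((c) => c._id)); // [ 'd', 'g', 'b', 'e', 'a' ]
```

Both approaches return the same five documents; the bounded version never holds more than six at once.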
$sort orders the documents, and $limit caps how many are passed to the next stage or returned:
// Top 5 customers by total spending
db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $group: {
      _id: "$customerId",
      totalSpent: { $sum: "$amount" }
    }
  },
  { $sort: { totalSpent: -1 } },   // largest spenders first
  { $limit: 5 }                    // keep only the first five
])
Complete Example
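A monthly report typically sorts on a "year-month" string, and string comparison works character by character. The plain-JavaScript sketch below (not MongoDB code; formatPeriod and the sample months are hypothetical) shows why month numbers need zero-padding for such a sort to be chronological:

```javascript
// String sorting compares character by character, so "2025-9" > "2025-12"
// because "9" > "1". Zero-padding the month fixes the ordering.

const months = [
  { year: 2025, month: 9 },
  { year: 2025, month: 12 },
  { year: 2025, month: 10 },
];

// Unpadded: what concatenating the numbers as strings would produce.
const unpadded = months.map(({ year, month }) => `${year}-${month}`);

// Zero-padded "YYYY-MM", the shape produced by a "%Y-%m" date format.
const formatPeriod = ({ year, month }) =>
  `${year}-${String(month).padStart(2, "0")}`;
const padded = months.map(formatPeriod);

// Descending string sort, like { $sort: { period: -1 } }.
const desc = (a, b) => (a < b ? 1 : a > b ? -1 : 0);

const unpaddedSorted = [...unpadded].sort(desc);
const paddedSorted = [...padded].sort(desc);

console.log(unpaddedSorted); // [ '2025-9', '2025-12', '2025-10' ]
console.log(paddedSorted);   // [ '2025-12', '2025-10', '2025-09' ]
```

Without padding, September lands ahead of December in a "most recent first" sort.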
A full pipeline combining multiple stages for a monthly sales report:
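db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $group: {
      _id: {
        year: { $year: "$createdAt" },
        month: { $month: "$createdAt" }
      },
      revenue: { $sum: "$amount" },
      orders: { $count: {} }
    }
  },
  {
    $project: {
      _id: 0,
      // Zero-padded "YYYY-MM" (e.g. "2025-03") so the string sort below is chronological;
      // concatenating $toString of the month would give "2025-3", which sorts after "2025-12"
      period: {
        $dateToString: {
          format: "%Y-%m",
          date: { $dateFromParts: { year: "$_id.year", month: "$_id.month" } }
        }
      },
      revenue: 1,
      orders: 1,
      avgOrderValue: { $divide: ["$revenue", "$orders"] }
    }
  },
  { $sort: { period: -1 } }   // most recent month first
])
Key Takeaways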
- The aggregation pipeline processes documents through ordered stages.
- Place $match early to filter documents before heavy processing.
- $group with accumulators handles grouping and calculations.
- $project reshapes documents and computes derived fields.