Advanced MongoDB

Advanced Aggregation: $lookup, $unwind, $facet

Perform joins, flatten arrays, and run parallel pipelines for complex data analysis.

9 min read Tutorial

Beyond Basic Aggregation

While $match, $group, and $project handle most use cases, advanced stages like $lookup, $unwind, and $facet unlock powerful data processing capabilities -- from cross-collection joins to parallel sub-pipelines.

$lookup - Joining Collections

The $lookup stage performs a left outer join between two collections, similar to a SQL LEFT JOIN:

// Join orders with customer details
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",        // collection to join
      localField: "customerId", // field in orders
      foreignField: "_id",      // field in customers
      as: "customer"            // output array field
    }
  },
  { $unwind: "$customer" },     // flatten the array to a single object
  {
    $project: {
      orderTotal: 1,
      "customer.name": 1,
      "customer.email": 1
    }
  }
])

$lookup with Pipeline

For more control, use the pipeline form of $lookup to add filtering and transformation within the join:

db.customers.aggregate([
  {
    $lookup: {
      from: "orders",
      let: { custId: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$customerId", "$$custId"] } } },
        { $match: { status: "completed" } },
        { $sort: { createdAt: -1 } },
        { $limit: 5 }
      ],
      as: "recentOrders"
    }
  }
])
// Each customer now has their 5 most recent completed orders

$unwind - Deconstructing Arrays

The $unwind stage deconstructs an array field, creating one document per array element. This is essential after $lookup or for analyzing individual array entries:

// Count how many times each tag is used across articles
db.articles.aggregate([
  { $unwind: "$tags" },
  { $group: { _id: "$tags", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

// Preserve documents with empty/missing arrays
db.articles.aggregate([
  {
    $unwind: {
      path: "$tags",
      preserveNullAndEmptyArrays: true
    }
  }
])

$facet - Parallel Pipelines

The $facet stage runs multiple aggregation pipelines in parallel on the same set of input documents and returns their results as separate fields:

db.products.aggregate([
  { $match: { inStock: true } },
  {
    $facet: {
      // Pipeline 1: Price statistics
      priceStats: [
        {
          $group: {
            _id: null,
            avgPrice: { $avg: "$price" },
            minPrice: { $min: "$price" },
            maxPrice: { $max: "$price" }
          }
        }
      ],
      // Pipeline 2: Count by category
      byCategory: [
        { $group: { _id: "$category", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ],
      // Pipeline 3: Top 5 most expensive
      topProducts: [
        { $sort: { price: -1 } },
        { $limit: 5 },
        { $project: { name: 1, price: 1 } }
      ]
    }
  }
])

This returns a single document with three arrays: priceStats, byCategory, and topProducts. This is particularly useful for building dashboard data in a single query.

Combining Stages

A real-world pipeline often combines all three stages. Here is an example that builds an order summary with customer details:

db.orders.aggregate([
  { $match: { createdAt: { $gte: new Date("2025-01-01") } } },
  { $unwind: "$items" },
  {
    $group: {
      _id: "$customerId",
      totalSpent: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
      itemCount: { $sum: "$items.qty" }
    }
  },
  {
    $lookup: {
      from: "customers",
      localField: "_id",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },
  { $sort: { totalSpent: -1 } },
  { $limit: 10 },
  {
    $project: {
      _id: 0,
      customerName: "$customer.name",
      totalSpent: 1,
      itemCount: 1
    }
  }
])

Key Takeaways

  • $lookup performs left outer joins between collections.
  • The pipeline form of $lookup allows filtering and sorting within the join.
  • $unwind flattens arrays into individual documents for per-element processing.
  • $facet runs multiple pipelines in parallel on the same data set.

Try this query in UnifySQL

Write, optimize, and collaborate on MongoDB queries with AI assistance.

Start Free