
MongoDB Schema Design Patterns

Learn embedding vs referencing, polymorphic patterns, and the bucket pattern for time-series data.


Schema Design in MongoDB

Unlike relational databases, MongoDB does not enforce a fixed schema by default: documents in the same collection can have different fields. This flexibility is powerful but requires thoughtful design. The key decision in MongoDB schema design is whether to embed related data within a single document or reference it across collections. The right choice depends on your application's access patterns, data relationships, and performance requirements.

Embedding vs Referencing

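Embedding places related data inside the same document. Referencing stores the related document's _id in a field (MongoDB does not enforce this link the way a relational database enforces a foreign key) and requires a second query or a $lookup stage to read the related data: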

// EMBEDDED: Address lives inside the user document
{
  _id: ObjectId("..."),
  name: "Alice",
  address: {
    street: "123 Main St",
    city: "San Francisco",
    state: "CA"
  }
}

// REFERENCED: Address in a separate collection
// ("u1" and "a1" are shortened placeholder ids; a real ObjectId takes a 24-character hex string)
// users collection
{ _id: "u1", name: "Alice", addressId: "a1" }

// addresses collection
{ _id: "a1", street: "123 Main St", city: "San Francisco", state: "CA" }

Embed when: The data is read together, the relationship is one-to-one or one-to-few, and the embedded data will not grow without bound (a single document is capped at 16 MB).

Reference when: The data is large, updated frequently and independently of its parent, shared across many documents, or part of a many-to-many relationship.
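To make the read-side cost of referencing concrete, here is a minimal sketch of the join that the referenced layout needs. The helper name `userWithAddressPipeline` is hypothetical, and the collection and field names (`addresses`, `addressId`) are taken from the example above. The function only builds the pipeline, so it runs without a database.

```javascript
// Hypothetical helper: builds an aggregation pipeline that resolves
// a user's referenced address from the addresses collection.
function userWithAddressPipeline(userId) {
  return [
    // Select the user first so the join touches a single document.
    { $match: { _id: userId } },
    {
      $lookup: {
        from: "addresses",        // collection to join (from the example above)
        localField: "addressId",  // reference stored on the user
        foreignField: "_id",      // key in the addresses collection
        as: "address",
      },
    },
    // $lookup always yields an array; unwind it into a single object.
    // Users without a matching address are kept rather than dropped.
    { $unwind: { path: "$address", preserveNullAndEmptyArrays: true } },
  ];
}

// Assumed usage in mongosh or a driver:
//   db.users.aggregate(userWithAddressPipeline("u1"))
```

With the embedded layout, the same read is a single find on the users collection with no join stage.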

The Subset Pattern

When an embedded array could grow very large, keep a subset of the most relevant items in the main document and store the full history in a separate collection:

// Product with only the 10 most recent reviews embedded
{
  _id: ObjectId("..."),
  name: "Wireless Mouse",
  price: 29.99,
  recentReviews: [
    { user: "Alice", rating: 5, text: "Great!", date: ISODate("2025-12-01") },
    { user: "Bob", rating: 4, text: "Good value", date: ISODate("2025-11-28") }
    // ... up to 10 reviews
  ],
  totalReviews: 347
}

// Full reviews in a separate collection
// db.reviews: { productId: ObjectId("..."), user: "...", rating: 5, ... }
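One way to keep the embedded subset capped is to let the update itself trim the array. The sketch below builds such an update with the `$push` modifiers `$each`, `$sort`, and `$slice`. The helper name `recentReviewUpdate` and the constant `RECENT_LIMIT` are hypothetical, and the field names follow the product document above.

```javascript
const RECENT_LIMIT = 10; // matches the "10 most recent reviews" in the example

// Hypothetical helper: builds the update applied to the product document
// whenever a new review arrives.
function recentReviewUpdate(review, limit = RECENT_LIMIT) {
  return {
    $push: {
      recentReviews: {
        $each: [review],      // append the new review
        $sort: { date: -1 },  // newest first
        $slice: limit,        // keep only the first `limit` entries
      },
    },
    $inc: { totalReviews: 1 }, // keep the counter in step with the full collection
  };
}

// Assumed usage (two separate writes):
//   db.reviews.insertOne({ productId, ...review })
//   db.products.updateOne({ _id: productId }, recentReviewUpdate(review))
```

The two writes in the usage comment are not atomic as a pair. If the counter and the subset must never drift from the reviews collection, wrapping them in a transaction is one option.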

The Polymorphic Pattern

Store documents with different shapes in the same collection, distinguished by a type field. This simplifies queries that span multiple entity types:

// Notifications collection with different types
{ type: "email", to: "alice@example.com", subject: "Welcome", sentAt: ISODate("...") }
{ type: "sms", phone: "+1234567890", message: "Your code: 1234", sentAt: ISODate("...") }
{ type: "push", deviceToken: "abc123", title: "New message", sentAt: ISODate("...") }

// Query all notifications regardless of type
db.notifications.find({ sentAt: { $gte: new Date("2025-01-01") } })
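Because documents in a polymorphic collection have different fields, application code usually branches on the type field after reading them. A minimal sketch, assuming the three notification shapes above (the function name `describeNotification` is hypothetical):

```javascript
// Hypothetical helper: turns any notification document into a one-line summary
// by dispatching on its type field.
function describeNotification(doc) {
  switch (doc.type) {
    case "email":
      return `email to ${doc.to}: ${doc.subject}`;
    case "sms":
      return `sms to ${doc.phone}: ${doc.message}`;
    case "push":
      return `push to ${doc.deviceToken}: ${doc.title}`;
    default:
      // Fail loudly on shapes this code does not know about.
      throw new Error(`Unknown notification type: ${doc.type}`);
  }
}

// Assumed usage with the query above:
//   db.notifications.find({ sentAt: { $gte: new Date("2025-01-01") } })
//     .forEach((doc) => print(describeNotification(doc)))
```

Throwing on an unknown type is a design choice for this sketch: it surfaces new document shapes early instead of silently skipping them.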

The Bucket Pattern

For time-series or high-frequency data, group measurements into "buckets" to reduce document count and improve query performance:

// Instead of one document per sensor reading...
// Group readings into hourly buckets
{
  sensorId: "temp-001",
  date: ISODate("2025-06-15T14:00:00Z"),
  readings: [
    { minute: 0, value: 22.5 },
    { minute: 1, value: 22.6 },
    { minute: 2, value: 22.4 }
    // ... up to 60 readings per hour
  ],
  count: 60,
  sum: 1350.0,
  avg: 22.5,
  min: 22.1,
  max: 23.0
}

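With one reading per minute, hourly buckets cut the document count by a factor of 60 (one document per sensor per hour instead of one per reading). The pre-aggregated count, sum, avg, min, and max fields also answer common statistical queries without scanning the readings array.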
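Buckets like the one above are typically filled with an upsert, so the first reading of an hour creates the document and later readings update it. The sketch below builds that operation. The helper name `bucketUpsert` is hypothetical, and it makes one assumption that differs from the example document: avg is not stored, because it can be derived as sum / count when the bucket is read.

```javascript
// Hypothetical helper: builds the filter, update, and options for adding
// one reading to its hourly bucket.
function bucketUpsert(sensorId, timestamp, value) {
  // Truncate the timestamp to the start of its UTC hour to find the bucket.
  const hourStart = new Date(timestamp.getTime());
  hourStart.setUTCMinutes(0, 0, 0);

  return {
    filter: { sensorId, date: hourStart },
    update: {
      $push: { readings: { minute: timestamp.getUTCMinutes(), value } },
      $inc: { count: 1, sum: value }, // running totals
      $min: { min: value },           // lowers min only if value is smaller
      $max: { max: value },           // raises max only if value is larger
    },
    options: { upsert: true },        // create the bucket on the first reading
  };
}

// Assumed usage (the collection name is illustrative):
//   const op = bucketUpsert("temp-001", new Date(), 22.4);
//   db.sensorBuckets.updateOne(op.filter, op.update, op.options);
```

An index on sensorId and date would let the upsert find its bucket directly. The sketch does not create one.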

The Extended Reference Pattern

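When referencing, copy a few frequently accessed fields from the referenced document into the referencing one to avoid extra lookups. The trade-off is that the copies must be kept in sync when the source changes, so this works best for fields that rarely change: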

// Order with extended reference to customer
{
  _id: ObjectId("..."),
  items: [...],
  total: 149.99,
  customer: {
    _id: ObjectId("c1"),
    name: "Alice Johnson",   // duplicated for fast access
    email: "alice@example.com" // duplicated for fast access
  }
}
// Full customer data still lives in the customers collection
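The duplicated fields have to be written in two places: when the order is created, and whenever the source customer changes. A minimal sketch of both steps follows. The helper names `toCustomerRef` and `customerEmailChange` are hypothetical, and the field names follow the order document above.

```javascript
// Hypothetical helper: picks only the fields the order needs from a full
// customer document.
function toCustomerRef(customer) {
  return { _id: customer._id, name: customer.name, email: customer.email };
}

// Hypothetical helper: builds the filter and update that propagate an email
// change to every order holding a copy of it.
function customerEmailChange(customerId, newEmail) {
  return {
    filter: { "customer._id": customerId },
    update: { $set: { "customer.email": newEmail } },
  };
}

// Assumed usage:
//   const order = { items, total, customer: toCustomerRef(customerDoc) };
//   const op = customerEmailChange(customerDoc._id, "new@example.com");
//   db.orders.updateMany(op.filter, op.update);
```

Whether past orders should be rewritten at all is an application decision. Some systems deliberately keep the values as they were at purchase time, in which case only the customers collection is updated.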

Key Takeaways

  • Design your schema around your application's query patterns, not the data structure alone.
  • Embed data that is read together and does not grow unboundedly.
  • Reference data that is large, shared, or frequently updated independently.
  • Use patterns like subset, bucket, and extended reference to optimize for specific workloads.
