MongoDB Schema Design
Design efficient and scalable document structures
📐 What is Schema Design?
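MongoDB schema design is the process of deciding how to structure documents and collections so that your application's queries run efficiently and scale. Unlike relational databases, MongoDB does not enforce a fixed schema by default: related data can be embedded inside a document or stored in a separate collection and referenced by ID, depending on how it is accessed.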
// Embedded document example
{
name: "John Doe",
address: { street: "123 Main St", city: "NYC" }
}
Result:
Single document with nested address data
Key Design Patterns
Embedding
Store related data together
{ user: { name: "Alice", email: "..." } }
Referencing
Link documents with IDs
{ userId: ObjectId("...") }
Denormalization
Duplicate data for speed
{ authorName: "John", authorId: "..." }
Bucketing
Group time-series data
{ sensorId: "A1", hour: "2024-01-01T10", readings: [...] }
🔹 Embedding vs Referencing
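Choose between embedding related data inside a document and referencing documents in a separate collection. Embedding returns everything in a single query and lets one write update it atomically; referencing keeps a single copy of shared data (which simplifies consistency), keeps documents small, and lets related data grow and change independently.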
// EMBEDDING - One-to-Few relationship
// Good for: Data accessed together, infrequent updates
{
_id: 1,
name: "John Doe",
addresses: [
{ street: "123 Main St", city: "NYC", type: "home" },
{ street: "456 Work Ave", city: "NYC", type: "work" }
]
}
// REFERENCING - One-to-Many relationship
// Good for: Large related data, frequent updates
// Users collection
{
_id: 1,
name: "John Doe"
}
// Orders collection
{
_id: 101,
userId: 1,
items: [...],
total: 299.99
}
// Query with reference (two round trips)
const user = db.users.findOne({ _id: 1 });
const orders = db.orders.find({ userId: user._id }).toArray();
When to Use:
Embed: 1-to-few, data read together | Reference: 1-to-many, independent updates
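The two-step lookup in the referencing example is an application-side join. The sketch below reproduces it without a database: plain arrays stand in for the users and orders collections, and the helper names (findUser, findOrdersForUser, userWithOrders) and the extra sample orders are hypothetical.

```javascript
// Hypothetical in-memory stand-ins for the users and orders collections.
const users = [{ _id: 1, name: "John Doe" }];
const orders = [
  { _id: 101, userId: 1, items: ["keyboard"], total: 299.99 },
  { _id: 102, userId: 1, items: ["mouse"], total: 49.5 },
  { _id: 103, userId: 2, items: ["monitor"], total: 189.0 },
];

// Query 1: fetch the parent, like db.users.findOne({ _id: id }).
function findUser(id) {
  return users.find((u) => u._id === id) ?? null;
}

// Query 2: fetch children by their reference field,
// like db.orders.find({ userId: userId }).
function findOrdersForUser(userId) {
  return orders.filter((o) => o.userId === userId);
}

// Assemble both reads into the shape an embedded design would store directly.
function userWithOrders(id) {
  const user = findUser(id);
  if (user === null) return null;
  return { ...user, orders: findOrdersForUser(user._id) };
}

console.log(userWithOrders(1).orders.map((o) => o._id)); // [ 101, 102 ]
```

An embedded design answers the same question with one read. On the server, the $lookup aggregation stage can perform this join in a single round trip instead of two queries.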
🔹 One-to-One Relationships
Model one-to-one relationships by embedding the related document directly. This approach provides the best performance for data that's always accessed together and has a clear ownership relationship.
// Embed one-to-one data
// User with profile (always accessed together)
{
_id: ObjectId("..."),
username: "johndoe",
email: "john@example.com",
profile: {
firstName: "John",
lastName: "Doe",
bio: "Software developer",
avatar: "avatar.jpg",
birthDate: ISODate("1990-01-15")
},
settings: {
theme: "dark",
notifications: true,
language: "en"
}
}
// Alternative: Reference for large or rarely accessed data
// User document (IDs such as ObjectId("user123") are readable placeholders; real ObjectIds are 24 hex characters)
{
_id: ObjectId("user123"),
username: "johndoe",
profileId: ObjectId("profile456")
}
// Profile document (separate if very large)
{
_id: ObjectId("profile456"),
userId: ObjectId("user123"),
detailedBio: "Very long biography...",
portfolio: [/* large array */]
}
Best Practice:
Embed for small, frequently accessed data. Reference for large or rarely used data.
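Moving from the embedded shape to the referenced one is a mechanical split. This is a minimal sketch with a hypothetical helper, splitProfile, that moves the named large fields into a separate profile document and links the two documents in both directions, as in the example above. Plain strings stand in for ObjectIds, and the portfolio values are made up.

```javascript
// Hypothetical helper: split large, rarely read fields out of a user document.
// Returns the slimmed user document and a new profile document.
function splitProfile(user, profileId, largeFields) {
  const core = {};
  const profile = { _id: profileId, userId: user._id }; // back-reference to the user
  for (const [key, value] of Object.entries(user)) {
    if (largeFields.includes(key)) {
      profile[key] = value; // moved to the profile document
    } else {
      core[key] = value; // stays in the user document
    }
  }
  core.profileId = profileId; // forward reference to the profile
  return { core, profile };
}

const { core, profile } = splitProfile(
  {
    _id: "user123",
    username: "johndoe",
    detailedBio: "Very long biography...",
    portfolio: ["project-a", "project-b"],
  },
  "profile456",
  ["detailedBio", "portfolio"]
);
console.log(core); // { _id: 'user123', username: 'johndoe', profileId: 'profile456' }
```

Keeping the reference on both sides is a choice: it lets either document find the other in one query, at the cost of two fields to maintain.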
🔹 One-to-Many Relationships
Handle one-to-many relationships based on the "many" side's size. Embed for small arrays, reference for large collections. Consider the access patterns and update frequency when choosing.
// ONE-TO-FEW: Embed (< 100 items)
// Blog post with comments
{
_id: ObjectId("post123"),
title: "MongoDB Schema Design",
content: "...",
comments: [
{ user: "Alice", text: "Great post!", date: ISODate() },
{ user: "Bob", text: "Very helpful", date: ISODate() }
]
}
// ONE-TO-MANY: Reference (100-1000s items)
// Author with books
// Authors collection
{
_id: ObjectId("author123"),
name: "Jane Smith",
bio: "Bestselling author"
}
// Books collection
{
_id: ObjectId("book456"),
title: "MongoDB Mastery",
authorId: ObjectId("author123"),
isbn: "978-1234567890"
}
// ONE-TO-SQUILLIONS: Reference with parent ref
// Product with reviews (millions)
// Products collection
{
_id: ObjectId("prod123"),
name: "Laptop",
reviewCount: 15420
}
// Reviews collection (parent reference)
{
_id: ObjectId("rev789"),
productId: ObjectId("prod123"),
rating: 5,
text: "Excellent!"
}
Guidelines:
Few (< 100): Embed | Many (100-1000s): Reference | Millions: Reference with parent ID
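For the one-to-squillions case, the parent never holds an array of its children: each review carries productId, and the product keeps only a cached reviewCount. The in-memory sketch below shows that write and read path. The helpers addReview and reviewsFor and the sample data are hypothetical. In the shell, the equivalent writes would be an insertOne into reviews plus an updateOne on products with { $inc: { reviewCount: 1 } }.

```javascript
// Hypothetical in-memory stand-ins for the products and reviews collections.
const products = new Map([["prod123", { _id: "prod123", name: "Laptop", reviewCount: 0 }]]);
const reviews = [];

// Write path: store the review with a parent reference, then bump the cached count.
// The product document stays the same size however many reviews exist.
function addReview(productId, rating, text) {
  const product = products.get(productId);
  if (!product) throw new Error(`unknown product ${productId}`);
  reviews.push({ _id: `rev${reviews.length + 1}`, productId, rating, text });
  product.reviewCount += 1; // like { $inc: { reviewCount: 1 } }
}

// Read path: find children by parent ID, like db.reviews.find({ productId }).
// In MongoDB, an index on productId keeps this query efficient.
function reviewsFor(productId) {
  return reviews.filter((r) => r.productId === productId);
}

addReview("prod123", 5, "Excellent!");
addReview("prod123", 4, "Solid machine");
console.log(products.get("prod123").reviewCount); // 2
```

Because the insert and the counter update are two separate writes, the count can drift if one of them fails; a multi-document transaction can make them atomic when exact counts matter.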
🔹 Many-to-Many Relationships
Model many-to-many relationships using arrays of references or a separate junction collection. Choose based on which side you query more frequently and the relationship's complexity.
// APPROACH 1: Array of References (Simple)
// Students collection
{
_id: ObjectId("student123"),
name: "Alice",
courseIds: [
ObjectId("course1"),
ObjectId("course2"),
ObjectId("course3")
]
}
// Courses collection
{
_id: ObjectId("course1"),
name: "MongoDB Basics",
studentIds: [
ObjectId("student123"),
ObjectId("student456")
]
}
// APPROACH 2: Junction Collection (Complex)
// Students collection
{
_id: ObjectId("student123"),
name: "Alice"
}
// Courses collection
{
_id: ObjectId("course1"),
name: "MongoDB Basics"
}
// Enrollments collection (with metadata)
{
_id: ObjectId("enroll789"),
studentId: ObjectId("student123"),
courseId: ObjectId("course1"),
enrollDate: ISODate("2024-01-15"),
grade: "A",
status: "completed"
}
// Query: find a student's enrollments (each one holds a courseId)
db.enrollments.find({ studentId: ObjectId("student123") })
When to Use:
Simple: Array of IDs | Complex: Junction collection with metadata
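Resolving a student's courses through the junction collection takes two reads: the enrollments, then the courses they point to. Here is a database-free sketch; the helper names and the sample data beyond the example above (the second course and the extra enrollments) are hypothetical, and strings stand in for ObjectIds.

```javascript
// Hypothetical in-memory stand-ins for the courses and enrollments collections.
const courses = [
  { _id: "course1", name: "MongoDB Basics" },
  { _id: "course2", name: "Data Modeling" },
];
const enrollments = [
  { studentId: "student123", courseId: "course1", grade: "A", status: "completed" },
  { studentId: "student123", courseId: "course2", grade: null, status: "active" },
  { studentId: "student456", courseId: "course1", grade: "B", status: "completed" },
];

// Step 1: read the student's enrollments (db.enrollments.find({ studentId })).
// Step 2: resolve the course IDs (db.courses.find({ _id: { $in: courseIds } })).
function coursesForStudent(studentId) {
  const courseIds = new Set(
    enrollments.filter((e) => e.studentId === studentId).map((e) => e.courseId)
  );
  return courses.filter((c) => courseIds.has(c._id));
}

// The same junction collection answers the reverse question.
function studentsInCourse(courseId) {
  return enrollments.filter((e) => e.courseId === courseId).map((e) => e.studentId);
}

console.log(coursesForStudent("student123").map((c) => c.name)); // [ 'MongoDB Basics', 'Data Modeling' ]
```

With the array-of-references approach, by contrast, every enrollment change has to update both the student's courseIds and the course's studentIds.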
🔹 Denormalization Pattern
Duplicate frequently accessed data to avoid joins and improve read performance. Accept some data redundancy in exchange for faster queries and simpler application code.
// WITHOUT Denormalization (Multiple queries)
// Users collection
{ _id: 1, name: "John Doe", email: "john@example.com" }
// Posts collection
{ _id: 101, userId: 1, title: "My Post", content: "..." }
// Need 2 queries to display post with author name
// WITH Denormalization (Single query)
// Posts collection with embedded author info
{
_id: 101,
title: "My Post",
content: "...",
author: {
id: 1,
name: "John Doe", // Duplicated from users
avatar: "avatar.jpg" // Duplicated from users
},
createdAt: ISODate()
}
// Single query gets everything
db.posts.find({ _id: 101 })
// Trade-off: Update author name in multiple places
db.posts.updateMany(
{ "author.id": 1 },
{ $set: { "author.name": "John Smith" } }
)
Best For:
Read-heavy applications where data rarely changes
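The trade-off shows up on the write side: one change to the source of truth fans out to every duplicated copy. The sketch below models that with in-memory arrays; renameAuthor and the extra posts and author are hypothetical, and the loop plays the role of the updateMany call above.

```javascript
// Hypothetical in-memory stand-ins for the users and posts collections.
const users = [{ _id: 1, name: "John Doe", avatar: "avatar.jpg" }];
const posts = [
  { _id: 101, title: "My Post", author: { id: 1, name: "John Doe", avatar: "avatar.jpg" } },
  { _id: 102, title: "Second Post", author: { id: 1, name: "John Doe", avatar: "avatar.jpg" } },
  { _id: 103, title: "Guest Post", author: { id: 2, name: "Jane Roe", avatar: "jane.jpg" } },
];

// Update the user document, then every post that embeds a copy of the name.
// Returns how many posts actually changed.
function renameAuthor(authorId, newName) {
  const user = users.find((u) => u._id === authorId);
  if (user) user.name = newName;
  let modified = 0;
  for (const post of posts) {
    if (post.author.id === authorId && post.author.name !== newName) {
      post.author.name = newName;
      modified += 1;
    }
  }
  return modified;
}

console.log(renameAuthor(1, "John Smith")); // 2
```

Reads stay a single query, but each rename touches all of the author's posts, which is why the pattern suits data that rarely changes.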
🔹 Bucketing Pattern
Group time-series or sequential data into buckets to reduce document count and improve query performance. Ideal for IoT sensors, logs, and analytics data with high write volumes.
// WITHOUT Bucketing (One doc per reading)
// 1 million documents for 1 million readings
{ sensorId: "A1", temp: 22.5, time: ISODate("2024-01-01T10:00:00") }
{ sensorId: "A1", temp: 22.7, time: ISODate("2024-01-01T10:01:00") }
// ... millions more
// WITH Bucketing (Group by hour)
// 24 documents per sensor per day instead of 1,440 (one reading per minute)
{
_id: ObjectId("..."),
sensorId: "A1",
date: ISODate("2024-01-01"),
hour: 10,
readings: [
{ minute: 0, temp: 22.5 },
{ minute: 1, temp: 22.7 },
{ minute: 2, temp: 22.6 },
// ... 60 readings per hour
],
avgTemp: 22.6,
minTemp: 22.4,
maxTemp: 22.9,
count: 60
}
// Query hourly data efficiently
db.sensors.find({
sensorId: "A1",
date: ISODate("2024-01-01"),
hour: { $gte: 10, $lte: 12 }
})
// Benefits:
// - Fewer documents and smaller indexes (one index entry per bucket, not per reading)
// - Pre-calculated aggregates
// - Efficient range queries
Use Cases:
IoT sensors, server logs, financial ticks, analytics events
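The grouping step can be sketched as a pure function that turns one-document-per-reading data into the hourly bucket shape shown above. The helper name bucketByHour and the choice of UTC hours are assumptions; it keeps a running sum while grouping so avgTemp can be derived at the end.

```javascript
// Hypothetical helper: group per-reading documents into hourly buckets.
// Times are interpreted in UTC.
function bucketByHour(readings) {
  const buckets = new Map();
  for (const { sensorId, temp, time } of readings) {
    const date = new Date(Date.UTC(time.getUTCFullYear(), time.getUTCMonth(), time.getUTCDate()));
    const hour = time.getUTCHours();
    const key = `${sensorId}|${date.toISOString()}|${hour}`;
    let bucket = buckets.get(key);
    if (!bucket) {
      bucket = { sensorId, date, hour, readings: [], sum: 0, minTemp: Infinity, maxTemp: -Infinity, count: 0 };
      buckets.set(key, bucket);
    }
    bucket.readings.push({ minute: time.getUTCMinutes(), temp });
    bucket.sum += temp;
    bucket.minTemp = Math.min(bucket.minTemp, temp);
    bucket.maxTemp = Math.max(bucket.maxTemp, temp);
    bucket.count += 1;
  }
  // Replace the running sum with the pre-calculated average.
  return [...buckets.values()].map(({ sum, ...bucket }) => ({ ...bucket, avgTemp: sum / bucket.count }));
}

const hourly = bucketByHour([
  { sensorId: "A1", temp: 22.5, time: new Date("2024-01-01T10:00:00Z") },
  { sensorId: "A1", temp: 22.7, time: new Date("2024-01-01T10:01:00Z") },
  { sensorId: "A1", temp: 23.1, time: new Date("2024-01-01T11:00:00Z") },
]);
console.log(hourly.length); // 2
```

In a live collection the same bucket is usually built incrementally: an updateOne filtered on { sensorId, date, hour } with upsert: true, using $push for the reading, $inc for the count and a running sum, and $min/$max for the extremes.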
🔹 Schema Design Best Practices
Follow these proven guidelines to create efficient, scalable MongoDB schemas. Consider your application's access patterns, query frequency, and data relationships when making design decisions.
✅ Design Principles:
- Design for your queries: Structure data how you'll access it
- Embed for atomicity: Keep related data together for atomic updates
- Reference for flexibility: Separate data that grows unbounded
- Denormalize for reads: Duplicate data to avoid joins
- Consider document size: Stay well under the 16MB document limit
- Use arrays wisely: Limit array growth (< 1000 items)
📏 Document Size Guidelines:
- Maximum: 16MB per document
- Recommended: < 1MB for best performance
- Arrays: < 1000 elements (use references for more)
- Nesting: at most 100 levels (a hard limit); keep it much shallower in practice
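A quick pre-insert check against these guidelines can be sketched as follows. The helper names are hypothetical, and JSON byte length is only an approximation of the BSON size MongoDB enforces (BSON adds type and length bytes and encodes dates and ObjectIds differently), so treat the size figure as an estimate.

```javascript
// Thresholds from the guidelines above.
const MAX_BYTES = 16 * 1024 * 1024;
const RECOMMENDED_BYTES = 1024 * 1024;
const MAX_ARRAY_ITEMS = 1000;
const MAX_DEPTH = 100;

// Approximate size: UTF-8 length of the JSON form, not the exact BSON size.
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Each object or array adds one level of nesting.
function nestingDepth(value) {
  if (value === null || typeof value !== "object") return 0;
  let deepest = 0;
  for (const child of Object.values(value)) deepest = Math.max(deepest, nestingDepth(child));
  return deepest + 1;
}

// Length of the longest array anywhere in the document.
function largestArray(value) {
  if (value === null || typeof value !== "object") return 0;
  let largest = Array.isArray(value) ? value.length : 0;
  for (const child of Object.values(value)) largest = Math.max(largest, largestArray(child));
  return largest;
}

function checkDocument(doc) {
  const warnings = [];
  const size = approxSizeBytes(doc);
  if (size > MAX_BYTES) warnings.push("exceeds the 16MB maximum");
  else if (size > RECOMMENDED_BYTES) warnings.push("larger than the recommended 1MB");
  if (largestArray(doc) > MAX_ARRAY_ITEMS) warnings.push("has an array with more than 1000 elements");
  if (nestingDepth(doc) > MAX_DEPTH) warnings.push("is nested more than 100 levels deep");
  return warnings;
}

console.log(checkDocument({ tags: ["mongodb", "database", "nosql"] })); // []
```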
// Good schema example
{
_id: ObjectId("..."),
// Frequently accessed together - embedded
user: {
name: "Alice",
email: "alice@example.com"
},
// Small array - embedded
tags: ["mongodb", "database", "nosql"],
// Reference to large collection
categoryId: ObjectId("..."),
// Denormalized for display
categoryName: "Databases",
// Metadata
createdAt: ISODate(),
updatedAt: ISODate()
}