MongoDB Sharding

Horizontal scaling for massive datasets

🗂️ What is MongoDB Sharding?

Sharding distributes data across multiple servers (shards) so that a cluster can hold datasets and sustain throughput beyond what a single machine can handle. It scales horizontally by partitioning each sharded collection into chunks, which are spread across the shards.


// Enable sharding on a database
sh.enableSharding("myDatabase")

// Shard a collection
sh.shardCollection("myDatabase.users", { "userId": 1 })
                                    

Output:

{ "ok" : 1 }
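
To make the partitioning idea concrete, here is a minimal sketch in plain JavaScript of how a ranged shard key on userId could map a document to a chunk and its shard. The chunk boundaries, the shard assignments, and the findShard helper are illustrative assumptions, not MongoDB internals.

```javascript
// Hypothetical chunk table: each chunk covers userId values in [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "rs0" },
  { min: 1000, max: 2000, shard: "rs1" },
  { min: 2000, max: Infinity, shard: "rs0" },
];

// Return the shard whose chunk range contains the given shard key value.
function findShard(userId) {
  const chunk = chunks.find((c) => userId >= c.min && userId < c.max);
  return chunk.shard;
}

console.log(findShard(42));   // rs0
console.log(findShard(1500)); // rs1
```

Each chunk's lower bound is inclusive and its upper bound exclusive, so every key value falls into exactly one chunk.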

Key Sharding Components

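🗄️ Shards

Store subsets of the sharded data

// Add a shard (a replica set) to the cluster
sh.addShard("rs0/host:27017")

🧭 Config Servers

Store cluster metadata, such as which shard owns each chunk

// List the shards recorded in the config database
use config
db.shards.find()

🚪 Mongos Router

Routes client queries to the shards that hold the relevant data

// Start a mongos router (run from the OS shell, not mongosh)
mongos --configdb config/host:27019

🔑 Shard Key

Determines how documents are distributed across shards

// Example compound shard key
{ "country": 1, "userId": 1 }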
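
The router and the shard key work together: a query that constrains the shard key, or a leading prefix of a compound key, can be sent to a subset of shards, while other queries go to every shard. The sketch below illustrates that rule in plain JavaScript for the example key above. The usablePrefix and classifyQuery helpers are hypothetical, and the check is deliberately simplified (it only looks at which fields appear in the filter).

```javascript
// Shard key from the example above, as an ordered list of fields.
const shardKeyFields = ["country", "userId"];

// Return the leading shard key fields that the filter constrains.
function usablePrefix(filter) {
  const prefix = [];
  for (const field of shardKeyFields) {
    if (!(field in filter)) break; // a gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

// "targeted" if the filter covers at least the first shard key field,
// otherwise "broadcast" (the query goes to every shard).
function classifyQuery(filter) {
  return usablePrefix(filter).length > 0 ? "targeted" : "broadcast";
}

console.log(classifyQuery({ country: "DE", userId: 7 })); // targeted
console.log(classifyQuery({ country: "DE" }));            // targeted
console.log(classifyQuery({ userId: 7 }));                // broadcast
```

The third query is broadcast because userId alone is not a prefix of the compound key.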

🔹 Enabling Sharding

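In MongoDB versions before 6.0, you must enable sharding at the database level before sharding any of its collections; from 6.0 onward this step is optional. In either case, choose a shard key that suits how your data should be distributed.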

// Run these commands from mongosh while connected to a mongos router
use admin

// Enable sharding for database
sh.enableSharding("ecommerce")

// Shard a collection with hashed key
sh.shardCollection("ecommerce.products", { "_id": "hashed" })

// Shard with compound key
sh.shardCollection("ecommerce.orders", { "customerId": 1, "orderDate": 1 })

Output (from the first shardCollection call):

{
  "collectionsharded" : "ecommerce.products",
  "ok" : 1
}

🔹 Choosing a Shard Key

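The shard key determines how documents are distributed across shards and which queries mongos can route to a single shard. Prefer a key with high cardinality that appears in your most common queries. A monotonically increasing ranged key, such as a timestamp, sends all new inserts to the same chunk, whereas a hashed key spreads inserts evenly at the cost of efficient range queries.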

// Hashed shard key (even distribution)
sh.shardCollection("mydb.users", { "email": "hashed" })

// Range-based shard key (ordered data)
sh.shardCollection("mydb.logs", { "timestamp": 1 })

// Compound shard key (multiple fields)
sh.shardCollection("mydb.sales", { "region": 1, "date": 1 })

// Check how the collection's data is distributed across shards
// (sh.status() lists each sharded collection's shard key)
db.users.getShardDistribution()
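
The difference between ranged and hashed keys is easiest to see with monotonically increasing values. The following plain JavaScript sketch counts where 1000 sequential keys would land on four shards. The split points are made up, and the multiplicative hash is a toy stand-in chosen for illustration; it is not the hash function MongoDB uses for hashed shard keys.

```javascript
const SHARDS = 4;

// Ranged placement with fixed, illustrative split points: keys at or above
// the last split point all belong to the final chunk.
const splitPoints = [250, 500, 750];
function rangedShard(key) {
  const i = splitPoints.findIndex((p) => key < p);
  return i === -1 ? SHARDS - 1 : i;
}

// Toy multiplicative hash (NOT MongoDB's real hash function): the top two
// bits of the 32-bit product select one of four shards.
function hashedShard(key) {
  return (Math.imul(key, 2654435761) >>> 0) >>> 30;
}

// Count how many keys land on each shard under a placement function.
function distribute(keys, place) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[place(k)] += 1;
  return counts;
}

// Monotonically increasing keys, like new timestamps or auto-increment IDs.
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);

console.log("ranged:", distribute(newKeys, rangedShard)); // every insert hits the last shard
console.log("hashed:", distribute(newKeys, hashedShard)); // inserts spread over several shards
```

In a real cluster the last ranged chunk would eventually be split and migrated, but new inserts would still arrive at whichever shard holds the highest range.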

🔹 Managing Shards

Add, remove, and monitor shards to scale your cluster as your data grows and requirements change.

// Add a new shard
sh.addShard("rs1/mongodb1.example.net:27017")

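// Remove a shard: the first call starts draining its chunks to other shards;
// rerun the command to check progress until "state" is "completed"
use admin
db.runCommand({ removeShard: "rs1" })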

// Check shard status
sh.status()

// List all shards
use config
db.shards.find()

Output:

--- Sharding Status ---
  sharding version: { "_id" : 1 }
  shards:
    { "_id" : "rs0", "host" : "rs0/host:27017" }
    { "_id" : "rs1", "host" : "rs1/host:27017" }
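
Because removeShard only starts the draining process and has to be run again to see progress, removal is usually a polling loop. The sketch below shows one way to write it. The drainShard helper, its parameters, and the five-second pause are assumptions for illustration; the command runner and the wait function are passed in so the loop can be tried without a cluster.

```javascript
// Poll removeShard until draining finishes.
// In mongosh the two injected functions might be
//   (cmd) => db.adminCommand(cmd)   and   (ms) => sleep(ms)
function drainShard(shardName, runAdminCommand, wait, maxPolls = 100) {
  for (let i = 0; i < maxPolls; i++) {
    const res = runAdminCommand({ removeShard: shardName });
    if (res.ok !== 1) throw new Error("removeShard failed");
    if (res.state === "completed") return i + 1; // number of calls made
    wait(5000); // pause before asking again
  }
  throw new Error("shard still draining after " + maxPolls + " polls");
}

// Stand-in for a cluster: reports "started", "ongoing", then "completed".
const states = ["started", "ongoing", "completed"];
let call = 0;
const fakeCommand = () => ({ ok: 1, state: states[Math.min(call++, 2)] });

console.log(drainShard("rs1", fakeCommand, () => {})); // 3
```

The maxPolls limit keeps the loop from running forever if draining stalls, for example when the shard is the primary shard for a database that still has to be moved.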

🔹 Balancing and Chunks

MongoDB automatically balances data across shards by moving chunks. You can monitor and control this balancing process.

// Check balancer status
sh.getBalancerState()

// Enable/disable balancer
sh.startBalancer()
sh.stopBalancer()

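// Count the chunks belonging to a collection
// (the "ns" field exists in config.chunks only before MongoDB 5.0;
// later versions identify the collection by uuid instead)
use config
db.chunks.countDocuments({ "ns": "mydb.users" })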

// Set chunk size (in MB)
use config
db.settings.updateOne(
  { _id: "chunksize" },
  { $set: { value: 128 } },
  { upsert: true }
)
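
As a rough mental model of what the balancer does, the sketch below repeatedly moves one chunk from the fullest shard to the emptiest one until their chunk counts are close. This is a deliberately simplified illustration in plain JavaScript. The balance helper and its threshold are assumptions, and the real balancer uses its own version-dependent rules for deciding when and what to migrate.

```javascript
// Simplified balancing: move one chunk at a time from the shard with the
// most chunks to the shard with the fewest, until the gap is below
// `threshold` chunks.
function balance(chunkCounts, threshold = 2) {
  if (threshold < 2) throw new Error("threshold must be at least 2");
  const counts = { ...chunkCounts };
  const moves = [];
  while (true) {
    const names = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    const least = names[0];
    const most = names[names.length - 1];
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    moves.push({ from: most, to: least });
  }
  return { counts, moves };
}

const result = balance({ rs0: 10, rs1: 2, rs2: 0 });
console.log(result.counts);       // { rs0: 4, rs1: 4, rs2: 4 }
console.log(result.moves.length); // 6
```

Every move costs a migration, which is why the loop stops at a small tolerated gap instead of trying to reach a perfectly even split.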

🔹 Monitoring Sharded Clusters

Regular monitoring helps identify performance issues and ensures even data distribution across your sharded cluster.

// View detailed shard status
sh.status()

// Check collection distribution
db.users.getShardDistribution()

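// View held distributed locks, which include active migrations
// (config.locks is only populated on older MongoDB versions;
// sh.isBalancerRunning() reports whether a balancing round is in progress)
use config
db.locks.find({ "state": { $ne: 0 } })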

// Check balancer window
use config
db.settings.find({ _id: "balancer" })

Output:

Shard rs0 at rs0/host:27017
  data: 45.2GB docs: 1000000 chunks: 50
Shard rs1 at rs1/host:27017
  data: 44.8GB docs: 980000 chunks: 48
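
Figures like these are easier to judge as percentages. The sketch below, in plain JavaScript, computes each shard's share of the total data and flags shards that deviate from an even split. The summarize helper and its 10-point tolerance are hypothetical choices for illustration; the input numbers are copied from the sample output above.

```javascript
// Per-shard figures copied from the sample output above.
const shards = [
  { name: "rs0", dataGB: 45.2, docs: 1000000, chunks: 50 },
  { name: "rs1", dataGB: 44.8, docs: 980000, chunks: 48 },
];

// Return each shard's share of total data (in percent) and flag shards that
// deviate from an even split by more than `tolerancePct` percentage points.
function summarize(shardStats, tolerancePct = 10) {
  const total = shardStats.reduce((sum, s) => sum + s.dataGB, 0);
  const even = 100 / shardStats.length;
  return shardStats.map((s) => {
    const pct = (s.dataGB / total) * 100;
    return {
      name: s.name,
      pct: Number(pct.toFixed(1)),
      skewed: Math.abs(pct - even) > tolerancePct,
    };
  });
}

console.log(summarize(shards));
// rs0 holds 50.2% and rs1 holds 49.8%, so neither shard is flagged
```

A shard that is flagged repeatedly may point to a poorly chosen shard key or to a balancer that is disabled or restricted to a narrow window.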
