MongoDB Sharding
Horizontal scaling for massive datasets
🗂️ What is MongoDB Sharding?
Sharding distributes data across multiple servers to handle large datasets and high throughput. It enables horizontal scaling by partitioning data into smaller chunks across different machines.
// Enable sharding on a database
sh.enableSharding("myDatabase")
// Shard a collection
sh.shardCollection("myDatabase.users", { "userId": 1 })
Output:
{ "ok" : 1 }
Key Sharding Components
Shards
Store subsets of sharded data
// Add a shard
sh.addShard("rs0/host:27017")
Config Servers
Store cluster metadata
// View config
use config
db.shards.find()
Mongos Router
Routes queries to shards
// Connect via mongos
mongos --configdb config/host:27019
Shard Key
Determines data distribution
// Choose shard key
{ "country": 1, "userId": 1 }
🔹 Enabling Sharding
Before sharding collections, you must enable sharding at the database level and choose an appropriate shard key for data distribution.
// Connect to mongos
use admin
// Enable sharding for database
sh.enableSharding("ecommerce")
// Shard a collection with hashed key
sh.shardCollection("ecommerce.products", { "_id": "hashed" })
// Shard with compound key
sh.shardCollection("ecommerce.orders", { "customerId": 1, "orderDate": 1 })
Output:
{
"collectionsharded" : "ecommerce.products",
"ok" : 1
}
🔹 Choosing a Shard Key
The shard key determines how data is distributed. Choose wisely based on query patterns and data access requirements for optimal performance.
// Hashed shard key (even distribution)
sh.shardCollection("mydb.users", { "email": "hashed" })
// Range-based shard key (ordered data)
sh.shardCollection("mydb.logs", { "timestamp": 1 })
// Compound shard key (multiple fields)
sh.shardCollection("mydb.sales", { "region": 1, "date": 1 })
// Check shard key for collection
db.users.getShardDistribution()
🔹 Managing Shards
Add, remove, and monitor shards to scale your cluster as your data grows and requirements change.
// Add a new shard
sh.addShard("rs1/mongodb1.example.net:27017")
// Remove a shard (drains data first)
use admin
db.runCommand({ removeShard: "rs1" })
// Check shard status
sh.status()
// List all shards
use config
db.shards.find()
Output:
--- Sharding Status ---
sharding version: { "_id" : 1 }
shards:
{ "_id" : "rs0", "host" : "rs0/host:27017" }
{ "_id" : "rs1", "host" : "rs1/host:27017" }
🔹 Balancing and Chunks
MongoDB automatically balances data across shards by moving chunks. You can monitor and control this balancing process.
// Check balancer status
sh.getBalancerState()
// Enable/disable balancer
sh.startBalancer()
sh.stopBalancer()
// View chunk distribution
use config
db.chunks.find({ "ns": "mydb.users" }).count()
// Set chunk size (in MB)
use config
db.settings.updateOne(
{ _id: "chunksize" },
{ $set: { value: 128 } },
{ upsert: true }
)
🔹 Monitoring Sharded Clusters
Regular monitoring helps identify performance issues and ensures even data distribution across your sharded cluster.
// View detailed shard status
sh.status()
// Check collection distribution
db.users.getShardDistribution()
// View active migrations
use config
db.locks.find({ "state": { $ne: 0 } })
// Check balancer window
use config
db.settings.find({ _id: "balancer" })
Output:
Shard rs0 at rs0/host:27017 data: 45.2GB docs: 1000000 chunks: 50 Shard rs1 at rs1/host:27017 data: 44.8GB docs: 980000 chunks: 48